International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 5 - Nov 2013 A Modified Genetic Algorithm for Software Testing Anjali kapoor1, Mohit kumar2 #1 #2 Research Scholar, CSE, RIMT-IET (PTU), Mandi Gobindgarh, India Associate Professor, CSE, RIMT-IET (PTU), Mandi Gobindgarh, India Abstract The success of software depends largely on the quality as well as quantity of the testing. The testing phase can consume lots of resources, if not planned properly. Automatic test case generation can help in saving some of those precious resources and at the same time, can speed up the testing. Genetic Algorithms (GAs) are being widely used for such automatic generation, but none of these methods are handling the equality type of predicates in an effective way, as explained below in section 5. This paper proposes a new method which can fasten the automatic test case generation activity. Our initial study indicates a remarkable improvement in the time needed for automatic test case generation corresponding to equality predicates and at the same time, does not affect most of the other test case generation corresponding to non-equality predicates, as mentioned in section 6 below. Results have also indicated improvement in test coverage in improved method as compared to existing GA method. A comparison of the existing GA based techniques with the newly proposed technique clearly demonstrates that new method is overall more efficient and thus more desirable. This proposed concept can be made more sound and reliable by exploring it over much more complex and lengthier benchmarking programs. I. Testing Background A quality software has to undergo complex, labour intensive, time consuming and costly test process before establishing the confidence of user, developer and manager of being a reliable and high probability error free product. Testing process comprises of a set of activities in which first inputs are selected and then software under test is executed using these inputs for fault(s) identification. Next steps are of isolation and resolution of these fault(s). If a small amount of automation is done in testing process it can save significant resources attributed to testing. Generation of apt test cases is the crucial in effective & efficient testing. Test cases can be generated in several ways: automated, semi-automated or manual. While automated test cases generation saves a lot of resources and these don’t encompass human biases and are less intelligent. On the other side manual test cases, although consist human intelligence, take much efforts and time. Last decade experienced a number of research attempts in automation of test cases generation [Korel1990], [Wegener2001], [Sthamer1996] [Mcminn2004], specifically use of heuristic approaches towards fulfilling the objectives of ISSN: 2231-5381 this activity. Several soft computing based techniques like genetic algorithm [Lin2001] [Berndt2003] [Miller 2006], simulated annealing [Tracy1998], tabu search [Diaz2003,2007], ant colony optimization [Mahanti2006] were successfully employed in this direction. Test cases are generated keeping in view a predefined test objective. II. Automatic Generation of Test Cases and Genetic Algorithm Being an NP hard problem, test cases generation is a perfect case for employing Genetic Algorithm, which are non linear search space algorithms with the characteristic of being robust and adaptive. GA is population based search algorithm, whose success in achieving objectives largely depends on the definition of fitness function. Fitness function evaluates candidate solutions based on the some criterion which is used to adjudge the suitability of solution as compared to others. Two candidate solutions in a search space may differ in several ways, but they can exhibit similar characteristics with respect to some criteria. A better fitness function always exposes diversity in the solutions rather than similarity. Finally, this diversity leads to better solution in less number of iterations. This problem of early convergence of GA to the non optimal solution in test cases generation has been cited by several researchers [Wegener2001] [lin2001]. In identifying test cases, fitness of a candidate solution may be described by several alternates. It may depend on the percentage of the components covered in testing or reward/penalty of inclusion/exclusion of a desired path in program execution by solution (candidate test case) or some other criterion. III. The Methodology Gas can be employed to generate test cases by using both functional testing and structural testing techniques. Functional testing also called as black box testing uses functional knowledge of program and focuses on input and output domains of program to identify test cases. On the other side, structural testing also called as white box testing (more appropriately glass testing) considers program internal structure in order to generate test cases by using flow graph of the program. Structural testing is further divided into two categories; control flow based and data flow based. In control flow based structural testing only control of program is analyzed. Data flow testing also uses control flow graph paths or sub paths to ascertain the association between the definition and computation or predicate uses of a variable. This analysis can be static as well as dynamic. Static analysis does not http://www.ijettjournal.org Page 248 International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 5 - Nov 2013 require the actual execution of program unlike dynamic analysis. Another issue of testing is setting up a limit on the coverage of program components by test cases. A test data adequacy criterion is a set of rules that is used to determine whether or not sufficient testing has been performed. Several criteria have been fixed and ranked according to their strength and weakness by researchers [Rapps1982] [Rapps1985] [Franlkl1988]. All-paths testing criterion subsumes all other criteria and is the strongest criterion which we have used in our experiments. In symbolic execution, a testing path is identified from control flow graph of program. Now a valid test case is identified which should execute the particular path by satisfying all of the boolean expression included in that path. Concatenation of all such expressions (formally known as predicates) involved in that path is done next to generate a hybrid predicate. Special consideration to the variable dependence on program processing has to be given during the process of concatenation, as it affects the subsequent execution criterion of the remaining path. In our method, we have used symbolic execution technique of static structural testing and GA for the identification of test cases. GA generates population of candidate solutions and these are evaluated using a fitness function. A candidate solution is made up of all the input variables, to which values are assigned for constructing a test case. To evaluate a fitness of a candidate, all the constraints of a particular path are atomized and one by one each is evaluated using current candidate solution. An atomic predicate is that one, which contains only one operator if it is satisfied then no penalty is imposed to candidate solution, otherwise candidate solution is penalized with following values as shown in table 1. This method has been already used by several researchers [Korel1990] [Tracy 1998] [Wegenar 2001]. Atomized predicate Penalty to be imposed in case predicate is not satisfied A<B A–B A <= B A–B+ζ A>B B–A A >= B B–A+ζ A=B Abs(A – B) A ≠B ζ – abs(A – B) A and B are operands and ζ is a smallest constant of operands’ universal domains. In case integer it is 1 and in case real values it can be 0.1 or 0.01 depending on the accuracy we need in solution. IV. Forced Constraint Satisfaction: While evaluating fitness, two types of categories of atomic predicates (constraints) can be identified; easily satisfiable and hard to be satisfied atomic predicate. Former category includes inequality condition such as A<B, A<=B, A>B, A>=B and A≠B while later includes only equality condition ISSN: 2231-5381 based predicate A==B that is not easily satisfiable by GA. A predicate is said to be satisfied if it evaluates to be true. A predicate is easily satisfiable if there are enough combinations of inputs which can be easily identified by GA program in order to execute we propose a change in fitness function of GA. Fitness function in existing GA techniques doesn’t change the structure of solution but only evaluates candidate solution according to the procedure described in section 4. In our method, we are just concerned with atomic predicate involving equality condition. When equality atomic predicate fails to evaluate then our proposed fitness function doesn’t penalize candidate solution straight away as in the case of inequality predicates but it forcibly assigns value of one variable (operand) to another and charges candidate solution altogether with a small penalty of ζ, so that it can be evaluated in the next iteration of GA just to ensure that recent assignment has not violated already satisfied predicates. This is called forced constraint satisfaction. Now let us consider the effect of new fitness function on the generation probability of test cases which can satisfy equality predicates. If we employ, FCS method in above example then test cases generation, related to satisfaction of equality predicate is increased from a meager 0.2 to 1. Thus, it becomes very easy to generate test cases for path evaluation in less number of generations and same is corroborated by our experimental results. V. Results To prove our method, we have experimented on two most standard benchmark programs regularly used in testing research. These are triangle classifier and line-rectangle clipping programs. Former classifies triangle type based on the value of three sides entered as inputs. Later program identifies whether a line cuts a rectangle or lies completely outside or lies completely inside of the rectangle. In this program total eight inputs are entered; four for co-ordinates of rectangle and other four inputs are used to define the line. Control Flow graphs of two programs are manually constructed from respective source code and all paths are identified for the two programs. For each path a hybrid predicate is formed manually which in turn becomes an input to GA fitness function used to evaluate candidate solution. For the purpose of comparison, random test cases are also generated randomly. Test cases are generated using GA with and without FCS method. Test cases are generated from inputs by taking different domain; one very large of the size of the order of 1011 and one small with a size of order of 104. Preliminary experimentation results are shown in table 2 and table 3. In larger domain of inputs, less coverage of paths is achieved by random test generator as compared to small domain. Even in larger domain existing GA method fails to provide 100% coverage while our method provides cent percent coverage without failing a single time. These results clearly shows that our proposed change in fitness function of GA drastically reduced the required no of test cases generation with increased coverage and that is without the loss of generality and simplicity of GA program. http://www.ijettjournal.org Page 249 International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 5 - Nov 2013 GA Test generation GA Test generation without FCS with FCS Average Coverag Average Coverag Average Coverag no of Test e% no of Test e% no of Test e% cases cases cases Generate Generate Generate d per d per d per path path path 17147 42.85% 4649 87.14% 40 100% Triangle Classsifier line14159 rectangle clipping 53% 5203 78.82% 162 Re ctangle Clas sifier w ith Range be tw ee n -10^11 and +10^11 100000 120 100 %coverage in rando m test case gener ati on %coverage in GA test case gener ati on wi tho ut FCS %coverage in GA test case gener ati on wi th FCS Average Test Cases in Rand o m test case g eneratio n Average Test Cases in GA t est case g eneratio n wit ho ut FCS 10000 Average Test Cases in GA t est case g eneratio n wit h FCS 100% 80 01000 60 00100 40 00010 %Test Coverage Random Test Generation Average Test Cases Generated Generatio n Method 20 00001 0 1 2 %cover ag e in rand om test case gener ati on 10 0 100 %cover ag e in GA t est case generat ion without FCS 10 0 100 52 52 44 57 54 62 65 %cover ag e in GA t est case generat ion with FCS 10 0 100 10 0 3 10 0 100 100 10 0 10 0 100 10 0 Aver age Test Cases i n Rando m test case g eneratio n 1 2 64 3 0 00 0 3 00 00 30 0 00 30 00 0 3 00 00 3 00 00 30 00 0 3 0 00 0 37 51 74 74 Aver age Test Cases i n GA test case g eneratio n wit ho ut FCS 1 88 103 42 10199 12 0 41 1194 5 108 92 8 98 4 10 616 12193 23 0 147 2 33 18 8 Aver age Test Cases i n GA test case g eneratio n wit h FCS 1 99 99 69 103 2 35 60 12 49 64 14 8 177 2 15 157 0 4 0 5 0 6 0 7 0 8 9 0 0 10 11 12 13 14 15 16 17 0 10 0 100 10 0 10 0 100 100 10 0 54 10 0 100 10 0 10 0 100 100 10 0 10 0 100 10 0 10 0 100 100 10 0 10 5 34 67 171 36 16 0 28 0 29 2 35 Path No Table2. Comparison of test cases generation methods with Input domain Range from -1011 to +1011 Generation Method Random Test Generation GA Test generation without FCS Average Cove Average Cove no of Test rage no of Test rage cases % cases % Generated Generated per path per path Triangle 10493 Classsifier line-rectangle 13901 clipping 47.86 427 % 55% 2028 100% 99.65 % GA Test generation with FCS Average Cove no of Test rage cases % Generated per path 35 100% 145 100% Table3. Comparison of test cases generation methods with Input domain Range Input domain Range from -104 to +104 Triangle Class ifier w ith Range betw ee n -10^11 and +10^11 % Co verage in Rando m test case generation 100000 120 % Co verage in GA test case generation without F CS 100 10000 % Co verage in GA test case generation with F CS 80 1000 A verage Test C ases in R andom test case generation A verage Test C ases in GA test case generatio n witho ut FC S A verage Test C ases in GA test case generatio n with FC S 60 100 40 10 20 1 0 1 2 %Cover age in Random t est case generat ion 100 100 %Cover age in GA t est case gener at ion wi thout FCS 100 100 34 92 91 %Cover age in GA t est case gener at ion wi th FCS 100 100 100 3 100 100 100 100 0 4 0 5 0 6 7 0 100 93 100 Aver age Test Cases in Random t est case generat ion 1 11 30000 30000 30000 30000 19 Aver age Test Cases in GA test case gener ati on wit hout FCS 1 136 20574 3473 3757 3315 27 Aver age Test Cases in GA test case gener ati on wit h FCS 1 195 26 11 24 9 16 P a t h No ISSN: 2231-5381 Conclusion In this paper a modified approach for software testing has been proposed, which is based on genetic algorithm uses for automatic test case generation. The weakness of traditional GA algorithm has been removed by forcing the equality constraint with the help of a revised fitness function. The proposed approach has been evaluated and compared with the existing techniques for two of the widely accepted problems named as triangle classifier problem and the line-rectangle clipping problem. Our initial study indicates a significant improvement in automation of structural testing and the proposed method can be quite useful for the software managers in improving the quality of the software within limited time and resources. However, the method needs to be evaluated further for larger programs in order to make it more reliable. References: [1] [Rapps1982] S. Rapps and E. J. Weyuker. Data flow analysis techniques for test data selection. In Software Engi-eering 6th International Conference. IEEE Computer Society Press, 1982. [2] [Rapps1985] S. Rapps and W. J. Weyuker. Selecting software test data using data [3] flow information. IEEE Transaction on software engineering 11(4):367{375, April 1985. [4] [Franlkl1988] Frankl, P.G. and Weyuker, E. J. “An Applicable Family of Data Flow Testing Criteria”, IEEE Transaction On Software Engineering., Vol. 14, NO.10, pp. 1483-1498, 1988. [5] [Myers 1978] G. J. Myers. A controlled experiment in program testing and code walkthrough and inspection. [6] [Natafos1988] Ntafos, S. C. “A comparison of some structural testing strategies. IEEE transaction on Software Engineering 14:868-874, 1988. [7] [Beizer1990] Beizer B. Software testing techniques. 2nd ed., Dreamtech publication New Delhi. 1990. [8] [Watkins1990] Watkins AL. “The automatic generation of test data using genetic algorithms”. The http://www.ijettjournal.org Page 250 International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 5 - Nov 2013 fourth software quality conference, vol. 2, p. 300–09. 1995 [9] [Sthamer1996] Sthamer H, “The automatic generation of software test data using genetic algorithms”. Ph.D. thesis, University of lamorgan, Pontyprid,Wales, UK, April 1996. [10] [Jones1996] Jones B, Sthamer H, Eyres D. Automatic structural testing using genetic algorithms. Software Engineering Journal, 11(5), 299–306, 1996 [11] [Pargas1999] R, Harrold MJ, Peck R. Test-data generation using genetic algorithms. Journal of Software Testing, Verification and Reliability, 9(4), 263–82. 1999 [12] [Wegener2001] Wegener J, Baresel A, Sthamer H., “Evolutionary test environment for automatic structural testing”. Information and Software Technology 43, 841–54, 2001; [13] [Diaz 2003] Díaz E, Tuya J, Blanco R. “Automated software testing using a metaheuristic technique based on tabu search”. In: The 18th IEEE international conference on automated software engineering, p. 310–3, 2003 [14] [Lin 2001] Lin J-C,Yeh P-L. Automatic test data generation for path testing using GAs. Information Sciences 131, 47–64, 2001; [15] [Berndt2003] Berndt D., Fisher J., Johnson L., Pinglikar J., and Watkins A., “Breeding Software ISSN: 2231-5381 Test Cases with Genetic Algorithms”, IEEE Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS’03). [16] [Korel1990] B. Korel, “Automated software test data generation’, IEEE transaction on software engineering, 16(8), 870-879, 1990. [17] [Miller 2006] James Miller, Marek Reformat, Howard Zhang “Automatic test data generation using genetic algorithm and program dependence graphs”, Information and Software Technology, 48, 586–605, 2006. [18] [McMinn2004] McMinn P., “Search-based Software Test Data Generation: A Survey”, Software Testing, Verification and Reliability, 14(2), pp. 105-156, June 2004. [19] [Mahanti2006] Mahanti P K, Banerjee S, “Automated Testing in Software Engineering: Using Ant Colony and Self-Regulated Swarms”, Proceeding of Modelling and Simulation, 2006. [20] [Tracy1998] Tracey N., Clark J., Mander K., and McDermid J., “An automated framework for structural test-data generation” Proceedings of the International Conference on Automated Software Engineering, 285-288, 1998. http://www.ijettjournal.org Page 251