Test Case Optimization and Prioritization Using Data Mining Techniques: An Empirical Study

Shivani Singh
Computer Science and Engineering, Krishna Institute of Engineering and Technology, Ghaziabad - Meerut Highway (NH-58) (Affiliated to UPTU), India
Email: shoisingh@gmail.com

ABSTRACT

Software engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and it is a major process in the software development life cycle (SDLC). Testing is a major area within it and consists of a set of activities conducted with the aim of finding errors in software. In this paper we review previous work that applies data mining techniques to test case optimization and prioritization, including the models and algorithms that give the most accurate output while minimizing error.

Keywords: Test Case Prioritization, test generation, test cases, UML, data mining, tabu search algorithm, Test Case Optimization, genetic algorithm.

I. INTRODUCTION

Nowadays software engineering plays a vital role in project development, because most companies spend almost 50% of their budget on this area. Software engineering is a quality-oriented discipline, and quality can be achieved through very good testing techniques.

Figure 1: Software Testing module

Software consists of a set of instructions and programs, and each program has its own functionality depending on its type, while software testing consists of a set of activities conducted to find errors in software. Testing can be done either manually or automatically. Automated testing is the better way to test software because it consumes less time and gives more accurate results than manual testing. Test cases can also be generated from UML using object diagrams. UML (Unified Modeling Language) is used to create visual models of a software system.
We used a dynamic technique based on the "Tabu" search method [1] for test case generation, called TCGen (test case generator), which generates the most suitable test cases and provides the groundwork necessary for test case prioritization. It has been estimated that testing takes up 50 percent of software development [2]; thus, a subset of all possible test cases has to be determined that satisfies a particular criterion. Looking ahead, software is also modified in response to requests from the user. Users can submit a change request form according to their needs, for example because of changes within their organization or because of poor performance of the software.

Quality is proportional to the assurance provided by good testing techniques, so optimizing the time and cost of the testing process is the real challenge for test engineers. Regression testing in particular is a kind of testing that requires maximum effort, time and cost. Previous work on test case prioritization is based on the computation of a prioritization index, which determines the ordering of the test cases (e.g., by decreasing values of the index). For example, the coverage level achieved by each test case was used as a prioritization index [3]. Another example is a fault proneness index computed from a set of software metrics for the functions exercised by each test case [4].

Among the several approaches used for testing, we discuss specification-oriented approaches (black-box testing) and implementation-oriented approaches (white-box testing). Black-box testing generates the test cases from the program specification, and white-box testing generates the test cases from the code of the program under test.

There are two different types of prioritization model: system-based prioritization and model-based prioritization. System-based modeling is used to capture some aspects of system behavior, or it is state based, for example for a real-time system.
In model-based test prioritization [5,6], a system's model(s) is used to prioritize tests. It is further divided into two types. The first type is appropriate for prioritization after modifications that involve changes in the model and then in the source code. The second type of model-based test prioritization method [7] is appropriate for modifications that do not involve any changes in models (changes are made only in the source code). In this paper, we discuss UML 2.0 for modelling, which is based on the second type of model-based prioritization method. We also briefly discuss a clustering technique [8] that uses the genetic algorithm method of data mining to prioritize test cases by tabu search [9]. Our major goal is to minimize the time taken by test cases.

II. RELATED WORK

This section covers previous work on test case optimization techniques based on data mining and software testing models, regression techniques using UML diagrams, genetic algorithms, object-oriented prioritization and various other approaches for obtaining optimal results. Each of these research works has its own advantages and disadvantages.

Alessandra Cavarra, Thierry Jeron, Alan Hartman, ISSTA (2002) proposed an architecture for model-based testing using the Unified Modeling Language (UML), in which class, object and state diagrams are used to define the model. Different tools are available to generate test cases automatically, such as the AGEDIS (Automated Generation and Execution of Test Suites in Distributed Component-based Software) test generation tool. The main advantage of the AGEDIS test case generation tool is its ability to combine different test directives: coverage criteria, test purposes and test constraints. This allows the user to tune the selection of test cases with respect to the budget of the campaign [10].
A hierarchy of test suites can be constructed using the property that the larger the suite, the greater the coverage of the implementation. This hierarchy is particularly useful in regression testing.

L.C. Briand et al. (2008) proposed a method, supported by a prototype tool, that deals with the design-level regression test selection problem while avoiding human errors. The main objective is to ensure that regression testing is safe while minimizing regression effort. However, they show that certain changes may not be visible in the design and may require additional attention during coding or a special way of documenting them in the design. Another disadvantage is that test selection based on UML design information may not be as precise as selection based on detailed code analysis. The case study considered has only one case of imprecision in classifying a test case as needing retesting [11].

M. Prasanna et al. (2009) proposed that test case generation is the best way to test software. The test cases are derived by analyzing the dynamic behavior of the objects due to internal and external stimuli [12]. Test cases can be generated with the help of UML diagrams. The researchers use a model-based approach in which the crossover technique of the genetic algorithm is applied to the class diagram and the traversal is done by the depth-first search (DFS) algorithm. This tree-structure approach coupled with the genetic algorithm is shown to be capable of revealing 80% of faults at unit level and 88% of faults at integration level. They couple the genetic algorithm with mutation testing to check the effectiveness of the testing process, which shows 80.3% effectiveness. The results show that the methodology is useful for generating test cases after completion of the design phase, so that errors can be detected at an early stage in the software development life cycle.
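The combination of depth-first traversal and crossover described above can be illustrated with a minimal sketch. It is not the procedure of the cited work: the tree `TREE`, its node names, and the helpers `dfs_paths` and `one_point_crossover` are hypothetical and introduced here only for illustration. Each root-to-leaf path is treated as one candidate test sequence, and crossover swaps the tails of two sequences.

```python
from typing import Dict, List, Tuple

# Hypothetical tree derived from a design diagram: node -> child nodes.
TREE: Dict[str, List[str]] = {
    "login": ["browse", "logout"],
    "browse": ["add_to_cart", "search"],
    "add_to_cart": ["checkout"],
    "search": [],
    "checkout": [],
    "logout": [],
}


def dfs_paths(tree: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Return every root-to-leaf path, visiting children depth first."""
    paths: List[List[str]] = []

    def visit(node: str, prefix: List[str]) -> None:
        prefix = prefix + [node]
        children = tree.get(node, [])
        if not children:  # a leaf closes one candidate test sequence
            paths.append(prefix)
            return
        for child in children:
            visit(child, prefix)

    visit(root, [])
    return paths


def one_point_crossover(
    parent_a: List[str], parent_b: List[str], cut: int
) -> Tuple[List[str], List[str]]:
    """Swap the tails of two parent sequences after position `cut`."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


paths = dfs_paths(TREE, "login")
child_a, child_b = one_point_crossover(paths[0], paths[1], cut=2)
print(paths)
print(child_a, child_b)
```

In this toy case both parents share the prefix before the cut, so the children are again valid paths of the tree. In general a crossover can produce a sequence that is not a path of the model, so a real implementation would need a validity check or a repair step.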
Figure 2: Flow diagram of the genetic algorithm (boxes: Sequence Diagram, General Tree, Genetic algorithm, Tree structure, Testing sequence, Testing implementation, Testing condition)

A. Mosavi (2010) introduced the classification task of data mining as an effective option for identifying the most effective variables of multi-objective optimization (MOO) in multicriteria decision-making (MCDM) systems. The LADTree classification algorithm was used to analyze the effect of each design variable on the identified objectives. In the given example, the number of optimization variables was managed very effectively and reduced. The modified methodology is demonstrated successfully in the framework. The author believes that the process is simple and fast. Variables were reduced and organized using classification algorithms. The reduced set of variables obtained from preprocessing speeds up optimization, because it yields a smaller design space and a lower computational cost for the MOO process. Data mining tools have been found to be effective in this regard. It is evident that the growing complexity of MCDM systems could be handled by a preprocessing step that uses data mining classification tools.

Sangeeta Sabharwal, Ritu Sibal, Chayanika Sharma (2011) proposed another novel approach in which testing efficiency is optimized by applying the genetic algorithm to the test data. To handle requirement changes, a stack-based approach for assigning weights to the nodes of the object diagram is proposed [8]. The diagrams considered here are the activity diagram and the state chart diagram. The activity diagram is first converted into a CFG and the state chart diagram into an SDG, and then the stack-based approach is applied. The test paths are generated from the activity diagram and the state chart diagram. In this novel approach the researchers use the genetic algorithm, which is applied to the sequence diagram.
First the sequence diagram is generated, then a sequence dependency graph is generated from the sequence diagram, and the genetic algorithm is applied to it. In their study they found that the approach is significant for identifying the location of faults in the implementation and thus reduces the testing effort [13]. The proposed approach makes use of the IF model and the genetic algorithm to find the path to be tested first.

A.V.K. Shanthi et al. (2011) proposed another model-based approach that uses the concept of data mining, in which the evolutionary genetic algorithm technique is applied to the class diagram to generate the test cases [12]. The researchers report that the data mining technique generates optimal test cases automatically, which minimizes human effort and cost [12]. They show that, after applying the depth-first search algorithm, the evolutionary genetic algorithm yields more optimal valid test cases than the genetic crossover operator alone. One advantage is that the approach uses specification-based testing.

G. Mohan Kumar, A.V.K. Shanthi (2012) used some novel approaches to test the software at the initial stage itself, which makes it easier for software testers to test the software at a later stage [12]. In their study they use the tabu search algorithm to generate test cases automatically from the object diagram; here they take the sequence diagram. The experimental results show that this method has better performance. All possible test cases are generated and validated by prioritization. The generated test cases can be used as a test suite for path testing of the application. This approach can reveal all paths of the software to be developed and, because of the fitness criteria, produces test cases that are already valid, which avoids a separate test case validation step.
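To make the tabu search idea concrete, the following is a minimal sketch of tabu search used to find one test input that drives execution down a chosen branch. It is not the algorithm of the cited work: the branch condition `x*x - 10*x + 21 == 0`, the fitness function `branch_distance`, the move set `neighbours` and the parameter defaults are all assumptions made for illustration. The tabu list stores recently visited inputs so that the search does not cycle back to them.

```python
from collections import deque
from typing import Callable, Iterable


def branch_distance(x: int) -> int:
    """Hypothetical fitness: 0 exactly when `x*x - 10*x + 21 == 0` holds."""
    return abs(x * x - 10 * x + 21)


def neighbours(x: int) -> Iterable[int]:
    """Candidate moves: step the input one unit in either direction."""
    return (x - 1, x + 1)


def tabu_search(
    start: int,
    fitness: Callable[[int], int],
    moves: Callable[[int], Iterable[int]],
    max_iters: int = 100,
    tabu_size: int = 5,
) -> int:
    """Return the best input found; lower fitness is better."""
    current = best = start
    tabu = deque([start], maxlen=tabu_size)  # short-term memory of visited inputs
    for _ in range(max_iters):
        if fitness(best) == 0:  # target branch is covered, stop early
            break
        candidates = [c for c in moves(current) if c not in tabu]
        if not candidates:
            break
        # Move to the best non-tabu neighbour, even if it is worse than current.
        current = min(candidates, key=fitness)
        tabu.append(current)
        if fitness(current) < fitness(best):
            best = current
    return best


test_input = tabu_search(0, branch_distance, neighbours)
print(test_input, branch_distance(test_input))
```

Accepting the best non-tabu neighbour even when it is worse than the current input is what lets tabu search leave a local optimum; the bounded `deque` simply forgets the oldest entries once `tabu_size` is reached.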
Ranjit Swain, Vikas Panthi, Durga Prasad Mahapatra (2012) observed that different techniques are used to generate test cases for testing software, and that the function minimization technique is one of them. In this technique STUPEC [13] is used: a predicate is first selected, the predicate is then transformed, and then test cases are generated. The function minimization technique is used for finding the minimum of the predicate function. In this approach the test cases are generated step by step. The diagram used for generating the test cases is the state machine diagram. The approach achieves several kinds of coverage, such as state coverage, transition pair coverage and action coverage. The number of test cases needed to achieve transition path coverage is minimized by testing the borders determined by simple predicates. It is found that test cases are generated from the object diagram while minimizing cost and time [13]. The approach can also handle transitions with guards and achieves transition path coverage.

Chien-Li Shen and Eldon Y. Li (2013) proposed a research model that uses the Ant Colony Algorithm together with a test case prioritization technique to optimize test case prioritization. The expected contribution is to help software quality assurance stakeholders identify the higher-risk test cases, which enables project managers to control testing execution time and testing budget. To validate the prioritization produced by this model, the result will be compared, in each run of regression testing, with a set of test cases chosen by experienced test leaders, using the APFD (Average Percentage of Faults Detected) value. In addition, the parameters of the algorithm will be adjusted in order to find the most suitable values.
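The APFD value mentioned above is commonly computed as APFD = 1 - (TF1 + ... + TFm) / (n * m) + 1 / (2n), where n is the number of test cases, m is the number of faults, and TFi is the position in the ordering of the first test case that reveals fault i. The sketch below applies this formula to two orderings of the same suite. The test names, fault names and the `detects` mapping are invented for illustration, and the function assumes that there is at least one fault and that every fault is revealed by some test in the ordering.

```python
from typing import Dict, List, Set


def apfd(order: List[str], detects: Dict[str, Set[str]]) -> float:
    """Average Percentage of Faults Detected for one test ordering.

    Assumes every fault in `detects` is revealed by at least one test
    in `order` and that at least one fault exists.
    """
    faults = set().union(*detects.values())
    n, m = len(order), len(faults)
    first_position: Dict[str, int] = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, set()):
            # Keep only the earliest (1-based) position revealing each fault.
            first_position.setdefault(fault, position)
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix: which faults each test case reveals.
detects = {
    "t1": {"f1"},
    "t2": {"f2", "f3"},
    "t3": set(),
    "t4": {"f1", "f3"},
}

original = apfd(["t1", "t2", "t3", "t4"], detects)
prioritized = apfd(["t2", "t1", "t3", "t4"], detects)
print(round(original, 4), round(prioritized, 4))
```

Running the test that reveals two faults first raises the APFD from 17/24 to 19/24 in this example, which is the sense in which a higher APFD indicates earlier fault detection.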
V. Mary Sumalatha, G.S.V.P. Raju (2013) proposed test case generation from UML sequence diagrams using a genetic algorithm, from which the best test cases can be optimized. Moreover, this method of test case generation encourages developers to improve design quality and to find multiple test cases ready for execution. In future, it is possible to build an automatic tool using this approach. Such a tool would reduce the cost of software development and improve the quality of the software.

A. Pravin and Dr. S. Srinivasan (2013) proposed an algorithm that introduces a test case prioritization technique based on a calculated value. The proposed algorithm works for both requirements and testing [15]. The experiment was carried out on projects, and test case values were used to rate the test cases. The algorithm was compared with the random prioritization technique on two application projects, and the comparison shows the effectiveness of the algorithm in terms of early rate of fault detection. The authors report that they are working to see the effect of the proposed algorithm in combination with other techniques. Additionally, the proposed algorithm was tested on a limited data set. It can be validated on large projects with larger sets of use cases and test cases, so that it will be suitable for all types of projects with greater complexity and length.

III. CONCLUSION

In this paper we have reviewed different approaches, methods and data mining techniques used to generate test cases after the completion of the design phase, so that errors can be detected at an early stage in the software development life cycle, and to optimize and prioritize test cases. These techniques minimize cost, time, effort, errors and delay, maximize accuracy and relevancy, and give optimal results by generating test cases automatically.

REFERENCES

[1] A. Bertolino, “Software Testing: Guide to the software engineering body of knowledge”, IEEE Software, vol. 16, 1999, pp. 35-44.
[2] Clay E. Williams, “Software Testing from UML Specifications”, Second International Conference on Hitesh Tahbildar, Bichitra Kalita, “Automated Software Test Data Generation: Direction Of Research”, International Journal of Computer Science and Engineering Survey (IJCSES), vol. 2, no. 1, Feb 2011, pp. 99-120.
[3] Alessandra Cavarra, Thierry Jeron, Alan Hartman, “Using UML for Test Generation”, ISSTA, 2002.
[4] M. Prasanna, K.R. Chandran, “Automatic Test Case Generation for UML Object Diagrams Using Genetic Algorithm”, Int. J. Advance. Soft comput. Appl., vol. 1, no. 1, July 2009, pp. 19-32.
[5] L.C. Briand, Y. Labiche, S. He, “Automating Regression Test Selection based on UML designs”, Information and Software Technology, vol. 51, 2009, pp. 16-30.
[6] A.V.K. Shanthi, Dr. G. Mohan Kumar, “Automated Test Cases Generation for Object Oriented Software”, Indian Journal of Computer Science and Engineering (IJCSE), vol. 2, no. 4, Aug-Sep 2011, pp. 543-546.
[7] Sangeeta Sabharwal, Ritu Sibal, Chayanika Sharma, “Applying Genetic Algorithm for Prioritization of Test Case Scenarios Derived from UML Diagrams”, International Journal of Computer Science Issues, vol. 8, issue 3, no. 2, May 2011, pp. 433-444.
[8] Marlon Vieira, Johanne Leduc, Bill Hasling, Rajesh Subramanyan, Juergen Kazmeier, “Automation of GUI Testing Using a Model – Driver Approach”, Proceeding of International Workshop on Automation of Software Test, 2006, pp. 9-14.
[9] Manoj Kumar, Dr. Mohammad Husain, Gyanendra K. Gupta and Amarjeet Singh, “An Efficient Algorithm for Evaluated for Object Oriented Models”, International Journal of Computer Application, vol. 24, no. 8, 2011, pp. 11-15.
[10] Ranjit Swain, Vikas Panthi, Prafulla Kumar Behera, Durga Prasad Mahapatra, “Test Case Generation Based on State Machine Diagram”, International Journal of Computer Information Systems, vol. 4, no. 2, 2012, pp. 99-124.
[11] A.V.K. Shanthi, G. Mohan Kumar, “A Heuristic Approach for Automated Test Case Generation From Sequence Diagram Using Tabu Search Algorithm”, European Journal of Scientific Research, vol. 85, no. 4, Sep 2012, pp. 534-540.
[12] A.V.K. Shanthi, G. Mohan Kumar, “Automated Test Cases Generation from UML Sequence Diagram”, International Conference on Software and Computer Application, vol. 41, 2012, pp. 83-89.
[13] Philip Samuel, R. Mall, A.K. Bothra, “Automatic Test Case Generation Using UML State Diagram”, IET Software, 2008, pp. 79-93.
[14] Ashalatha Nayak, Debasis Samanta, “Automated Test Data Synthesis Using UML Sequence Diagram”, Journal of Object Technology, vol. 9, no. 2, 2010, pp. 75-104.
[15] A. Pravin, “An Efficient Algorithm for Reducing the Test Cases which is Used for Performing Regression Testing”, 2nd International Conference on Computational Techniques and Artificial Intelligence (ICCTAI'2013), March 17-18, 2013, Dubai (UAE).