Linköping University | Department of Computer and Information Science
Master thesis, 30 ECTS | Computer Science
2019 | LIU-IDA/LITH-EX-A--19/010--SE

Economics of Test Automation – Test case selection for automation

David Lindholm

Supervisor: Azeem Ahmad
Examiner: Kristian Sandahl
External supervisor: Christoffer Green

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet - or its future replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/.

© David Lindholm

Abstract

In this thesis a method for selecting test cases for test automation is developed and evaluated. Existing methods from the literature have been reviewed and modified, with the result being the proposed method: a decision tree containing 23 factors grouped into 8 decision points. The decision tree has been used and evaluated in an industrial setting. The economic benefits were calculated with return on investment, and the organisational benefits were measured in a survey at a software-producing company. The result was that automated tests selected with the decision tree provided economic benefits after 0.5 to 4 years. These tests were also found to lead to three organisational benefits: less human effort when testing, reduced cost and shorter release cycles.

Acknowledgments

First of all, I would like to thank my examiner Kristian Sandahl for the feedback during the project.
A special thanks to my supervisor Azeem Ahmad for his exceptional supervision throughout my thesis. Thanks to Sectra Imaging IT Solutions Ltd for providing the opportunity to realise this project. I would like to thank my project sponsor Magnus Ranlöf for the discussions about where to take the project. Thanks to my supervisor Christoffer Green for his continuous support and advice. Finally, I want to thank all of the people at Sectra who have participated in interviews and surveys during my thesis.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
   1.1 Aim
   1.2 Research Question
   1.3 Research Objectives
   1.4 Project Context
   1.5 Delimitations
2 Theory
   2.1 Software Testing
   2.2 How to Test Software
   2.3 Manual Testing
   2.4 Automated Testing
   2.5 Benefits and Limitations of Test Automation
   2.6 What to Automate
   2.7 Return on Investment
3 Method
   3.1 Qualitative Methods
   3.2 Quantitative Methods
4 Results
   4.1 Qualitative Results
   4.2 Quantitative Results
5 Discussion
   5.1 Results
   5.2 Method
   5.3 Internal Validity
   5.4 External Validity
   5.5 Reliability
   5.6 Ethical and Societal Aspects
6 Conclusion
Bibliography
7 Appendices
   7.A Interview Benefits from Test Automation
   7.B Checklist Survey
   7.C Checklist 1
   7.D Checklist 2
   7.E Decision Tree
   7.F Benefits from Automation Survey

List of Figures

1.1 Thesis aims
2.1 V-model
2.2 Decision Tree in viability analysis method
3.1 Overview of the research method
3.2 Overview of the method used for modifying the checklist
3.3 Decision Tree usage example
4.1 Decision Tree
4.2 ROI of automation project
4.3 ROI for individual test cases
List of Tables

2.1 Software testing process
2.2 Questions in viability analysis method
2.3 Checklist for deciding what to automate
2.4 Fixed costs of test automation
2.5 Variable costs of test automation
2.6 Benefits of test automation
2.7 Variables in Hoffman's ROI formula
2.8 Variables in Münch et al. ROI formula
3.1 Scores assigned to Likert scale points
3.2 Example of excluded and included factors to modified checklist
4.1 Results from interview of benefits from test automation
4.2 Results from modification interviews of checklist
4.3 Data used in ROI calculations
4.4 Results from survey evaluating organisational benefits of automated tests
5.1 Factors found in literature that are not included in decision tree

1 Introduction

With the adoption of agile project methodologies and rapid release cycles, software companies are releasing their products more frequently than ever before [1]. For instance, Firefox is released every 6 weeks, and some companies release even more often than that [2]. Mäntylä et al. argue that rapid releases bring challenges for testing [3]. To illustrate how rapid releases can affect quality, Porter et al. point out that quality can decrease because practitioners do not have time to test all platform configurations before releasing a product [4]. Testing the product to a sufficient degree is important, as phrased by Sawant, Bari, and Chawan: "Testing can be costly but not testing software can be even more costly" [5]. Charette shows several examples of how software failures can cost companies hundreds of millions of dollars, and in some cases have even led to bankruptcy [6]. Testing can be used to prevent this from happening, by serving as a tool for validation and verification of whether the software meets the goals and requirements of the customer [7]. It is clear that testing is needed, but how can software testing catch up with the short release cycles?
One way of testing the product more efficiently is with the use of test automation [3], [8]. However, many testers agree that test automation in its current state cannot fully replace manual testing, as both have different, albeit equally important, roles [9]–[14]. Test automation can help improve quality and efficiency and shorten the time to market [8], [15], but the question remains of what should be automated and what is better left for manual testing. Selecting test cases for automation is a challenge according to Amannejad et al., who state that there is a lack of research on how to select test cases for automation [16]. Kasurinen, Taipale, and Smolander define test automation strategy as "The observed method for selecting the test cases where automation is applied and the level of commitment to the application of test automation in the organizations" [11]. The authors conducted interviews with 55 industry specialists from 31 organizations in 2009 and found that many organizations need a clear test automation strategy [11].

In a literature review from 2016, Garousi and Mäntylä provide an extensive checklist, consisting of 43 factors divided into 5 categories, which can be used when deciding whether to implement test automation and when selecting tests for automation [17]. Two factors that hinder companies from adopting test automation are high implementation cost and maintenance effort [11]. These costs and the benefits of test automation are commonly estimated with return on investment (ROI) [17], [18].

This thesis attempts to find a method for selecting test cases for automation that results in economic and organisational benefits. Benefits of test automation were identified by reviewing the research made on this subject and by interviewing practitioners. Interviews with software engineers led to modifying the checklist from Garousi and Mäntylä into a decision tree. The decision tree was validated in an industrial case study, in which a set of test cases was selected by using the decision tree and later evaluated on their return on investment and their possibility to achieve the identified organisational benefits.

1.1 Aim

The primary aim of this thesis is to establish a method that will facilitate the selection of test cases for test automation at software-producing companies. Before attempting to construct this method, interviews were held to find out why test automation is needed and which benefits software-producing companies want to achieve with test automation. The secondary aim of this thesis is to identify what benefits have been presented in the scientific literature and to find out whether practitioners are in agreement with these.

Once the need for test automation has been identified, the construction of a tool to simplify test case selection for automation can begin. In this thesis, it was studied whether the checklist [17] provided by Garousi and Mäntylä could be used to select test cases for automation. First it needs to be verified that the checklist is a useful tool for achieving test automation in an industrial setting. If practitioners agree that the tool can be used, it is necessary to find out which modifications to the checklist are required for it to suit practitioners. To simplify the use of the checklist, the factors will be put into a decision tree. In the end the decision tree needs to be evaluated. The third and final aim of this thesis is to evaluate the created method for selecting test cases for test automation.
Can this method provide economic and organisational benefits in industry? The economic benefits are to be measured with return on investment, and the organisational benefits are to be evaluated against the benefits identified from the literature and from interviews with practitioners.

Figure 1.1: Thesis aims.

1.2 Research Question

Based on the above aims, the following research question has been formulated:

Can the checklist provided by Garousi and Mäntylä [17] be modified in such a way that it can be used to select test cases for test automation that result in economic and organisational benefits?

1.3 Research Objectives

As an aid to answering the research question, a number of research objectives were formulated. The first two research objectives aim to establish a basis for understanding why test automation is needed and how the economic benefits of test automation can be measured. The third research objective aims to verify that the proposed checklist is suitable for an industrial setting. The fourth and last research objective will aid the process of adapting the checklist to practitioners. When an adapted version of the checklist has been established, the data collection phase of the case study can be started. Data are collected by using the checklist for selecting test cases that can be automated, automating a subset of these test cases and evaluating their outcome against the research question. The research objectives were formulated as follows:

1. What do practitioners believe are the common benefits software-producing companies relate to test automation?
2. How can economic benefits be measured for test automation?
3. Is the checklist provided by Garousi and Mäntylä [17] applicable in an industrial setting to achieve test automation?
4. What modifications to the checklist provided by Garousi and Mäntylä [17] are required to make it applicable for practitioners?

1.4 Project Context

This project is carried out at Sectra Imaging IT Solutions Ltd, a subsidiary of Sectra Ltd [19]. Sectra Ltd was founded in 1978; 40 years later, in 2018, Sectra Ltd had 645 employees and a turnover of 1,266 million SEK [19]. Sectra Ltd is headquartered in Linköping, Sweden. Among other products, Sectra Imaging IT Solutions Ltd develops a picture archiving and communication system (PACS), a software system that aids in the storage, visualisation and manipulation of images for departments such as radiology, pathology, cardiology and orthopaedics. The medical products produced by Sectra Imaging IT Solutions Ltd are used in more than 1800 hospitals all over the world [20].

1.5 Delimitations

This thesis studies test automation at Sectra Imaging IT Solutions Ltd; the methods that are reviewed are chosen on the basis that they have to be suitable for the software development process used at Sectra Imaging IT Solutions Ltd. In the scope of this thesis, test automation is considered to be the software development process that results in software-performed test execution and result analysis. Automation at other levels, such as requirements analysis and test implementation, is not included in the scope of this thesis. Furthermore, the tests that are considered for automation in this project are tests at the higher levels of the test hierarchy: integration, system and acceptance tests are considered for automation, whereas unit tests are not. Sectra Imaging IT Solutions Ltd expressed a wish for a simple method for selecting test cases for automation.
For this reason, a checklist-based approach was studied in this thesis, which is also why systematic approaches (see section 2.6.1) are not considered. Similar thoughts have been expressed in earlier studies: it has been shown that systematic approaches are not commonly used in industrial settings even though they occur frequently in research. This was shown by Engström and Runeson, and Engström, Runeson, and Skoglund for regression test selection techniques [21], [22]. In another study, Runeson conducted a focus-group meeting and a survey with 17 and 15 participants respectively, which had a similar result: the participants stated that none of the companies used a systematic approach to select which unit tests to write; instead, tests were chosen based on the developer's intuition and experience [23].

2 Theory

The scientific theory used in this thesis is presented in this chapter. The chapter is divided into the following sections: Software Testing, How to Test Software, Manual Testing, Automated Testing, Benefits and Limitations of Test Automation, What to Automate and Return on Investment.

2.1 Software Testing

Software testing can be defined as "Evaluating software by observing its execution" [24]. However, this definition of software testing only covers the "what" of testing. The goals of software testing are to verify and validate that software works in a certain way and to find errors in the software [25]. Beizer describes the "why" of testing in his five levels of testing maturity [26]:

Level 0: There is no difference between testing and debugging.
Level 1: The purpose of testing is to show that software works.
Level 2: The purpose of testing is to show that software doesn't work.
Level 3: The purpose of testing is not to prove anything but to reduce the perceived risk of not working to an acceptable value.
Level 4: Testing is a mental discipline to develop low-risk software without much testing effort.

Testing can be used as a tool for validation and verification of whether software meets the goals and requirements of the customer [7]. Validation of software is used to ensure that the product works as expected and can help management in making decisions on when to release the product [7]. To achieve reliable testing, it needs to be done in such a way that the process and the results from it are repeatable and independent of who performs the tests [25].

2.2 How to Test Software

In this section the software testing process is defined, the different testing levels are described and software testing techniques are presented.

2.2.1 Software Testing Process

The software testing process can be described through four main activities: requirements analysis and specification, test implementation, test execution and test evaluation. These are shown in Table 2.1 [16], [27], [28].

Table 2.1: Software testing process

Requirements analysis and specification: Defining the testing goals and exit criteria, i.e. what testing should accomplish [27], [28]. The exit criteria can be that a set of tests has been executed, or that a certain code coverage [28] or requirements coverage [29] has been reached. In this activity the test activities should also be clarified, for example by defining a set of tests that should be executed and specifying the test activities [16], [27], [28].

Test implementation: In this activity, the tests are created. If manual testing is used, the implementation consists of writing manual test scripts [16], [28] or defining guidelines for exploratory testing. If test automation is used, the implementation activity is the production of automated test code [16], [28].

Test execution: The test cases are executed. In manual testing, testers carry out the steps defined in the test scripts [16] or perform exploratory testing. In automated testing, test code is executed, either by running the code manually or by using a test automation tool to run the code [16], [28].

Test evaluation: The result of the test execution is evaluated. In manual testing, the tester checks the outcome of the test and compares it with the expected result [16], [28]. In automated evaluation, a test tool is used to verify the outcome against the expected result [16], [28].
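To make the distinction between automated test execution and automated test evaluation concrete, below is a minimal sketch of an automated test written for Python's pytest framework. The function under test, apply_discount, is a hypothetical example: execution is the test runner calling the test function, and evaluation is the assert comparing the actual outcome with the expected result.

```python
# Minimal automated test sketch (pytest). The function under test,
# apply_discount, is a hypothetical example.

def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)

def test_apply_discount():
    # Test execution: the test runner calls this function.
    actual = apply_discount(200.0, 25.0)
    # Test evaluation: the actual outcome is compared with the expected result.
    assert actual == 150.0
```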
2.2.2 Testing Levels

Dalal and Chhillar state that testing should be started early and performed in all stages of software development [7]. The V-model [30], Figure 2.1, is a common way to represent the development and testing stages of software development.

Unit testing is the lowest level of testing; its goal is to verify that a small piece of code does what it should. What constitutes a unit depends on the programming language used; it can for example be a function, procedure or method [24]. Unit tests are often performed by developers shortly after writing the code. A great advantage of unit tests is that they can help find bugs at an earlier stage of software development, which reduces the cost of each bug [25] and makes unit tests very cost effective [5].

Integration testing tests how a set of units functions when combined through interfaces. Many bugs only occur when modules interact with each other: a module may work on its own and pass a unit test, but when integrated with other modules it may be used in a way that was not intended or anticipated by the developer.

System testing is used to verify that the software product as a whole works as expected. System testing is often performed by a testing team and has the goal of checking the software against its design and specifications [24].

Figure 2.1: V-model

Acceptance testing aims to make sure that the software meets the needs and requirements of the customer.

2.2.3 Testing Types

The V-model gives a good overview of the testing levels. At each testing level, tests of different types can be used to achieve the testing goals. In this section some testing types are presented; the selection has been made with consideration to what is relevant for this thesis. An overview of the testing types can be found in A comparative study of black box testing and white box testing techniques by Kumar, Singh, and Dwivedi [31].

Build verification testing is performed on new builds: a smaller set of regression tests is run with the aim of verifying that no major defects have been introduced in the main functionality of the software [31].

Smoke testing is done at an early stage of the testing process. Much like build verification testing, the idea is to verify that the software is performing well enough to spend further testing effort on it [31]. Kumar, Singh, and Dwivedi state that the tester quickly goes through different parts of the software to answer questions like "Can I launch the test item at all?" [31].

Sanity testing is another form of quick and broad testing [31].
In sanity testing the tester aims to verify that the logic in the software is functional and correct [31]. Much like smoke testing and build verification testing, sanity testing is a tool to check whether further testing should be performed [31].

Scenario testing assesses the product in terms of how it will be used by end users. Kaner states that "The scenario is a story about someone trying to accomplish something with the product under test" [32]. Scenario testing can be used to learn how the product will be used by both new users and expert users. It is also useful for verifying that the software delivers the features and possibilities that users need in their work [32].

2.2.4 Software Testing Techniques

Khan classifies software testing techniques by their purpose into correctness, performance, reliability and security testing [33]. In this section correctness testing will be discussed briefly. Correctness testing verifies the behaviour of software and is divided into black box, white box and grey box testing [33]. Grey box testing is simply a combination of the black box and white box testing techniques and will not be discussed further here.

Black box

In black box testing the tester does not consider the internal parts of the software under test (SUT); instead, the tester looks at the output provided by the SUT when given a certain input. Black box testing can be performed at all levels of testing defined above (unit, integration, system and acceptance) [34], although unit tests are commonly done with knowledge of the underlying code. Some black box testing methods are exploratory testing, smoke testing, stress testing, load testing, equivalence class testing, boundary value testing, model-based testing and use-case testing [5], [33], [34].

One advantage of using black box testing is that test cases can be defined independently of the implementation of the software, and for this reason they can be written in parallel with the development of the software [35]. Black box testing is efficient at finding defects [5], [33], [34]. However, relying only on black box testing is likely to result in some parts of the software not being tested [5]. Black box testing needs a clear specification of what the software should do [33], and implemented behaviour that is not defined in such a specification might not be caught [35].

White box

White box testing makes use of the underlying structure and paths to test the software under test [34]. While white box testing is often used in unit and integration testing, it can also be used in system testing [5], [34]. The following methods are used in white box testing: path testing, statement coverage, control structure testing and data flow testing [5], [33]. These techniques can help the tester find errors hidden in the code [33] and give the tester the possibility to test all logical decisions, loop boundaries and paths in a module [5], [34]. A disadvantage of white box testing techniques is that they require the tester to have developer skills and as such are often costly [5], [33], [34]. Another disadvantage is that required functionality that is missing from the implementation is not likely to be found by white box testing [35].
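As an illustration of two of the black box design techniques mentioned above, equivalence class testing and boundary value testing, below is a minimal sketch in Python. The function under test, validate_age, and its valid range of 0–120 are hypothetical examples.

```python
import pytest

# Hypothetical function under test: accepts ages in the range 0-120.
def validate_age(age: int) -> bool:
    return 0 <= age <= 120

# Boundary value testing: inputs at and just outside the boundaries of the
# valid equivalence class, plus a representative value from inside it.
@pytest.mark.parametrize("age, expected", [
    (-1, False),   # just below the lower boundary
    (0, True),     # lower boundary
    (60, True),    # representative of the valid equivalence class
    (120, True),   # upper boundary
    (121, False),  # just above the upper boundary
])
def test_validate_age(age, expected):
    assert validate_age(age) == expected
```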
2.3 Manual Testing

In manual testing a human performs the tests by interacting with the software and evaluating the results. Manual testing can be carried out by following scripted instructions or by testing the software in an ad hoc, exploratory manner.

Scripted tests rely on predefined test cases that describe what input should be given to the software and which output is expected [34], [36]. The result of the test is a comparison between the actual output and the expected output. Input can be created using test case design techniques from black box testing, such as boundary value testing, or it can be based on requirements documents, release notes or defect reports [34], [36]. Fewster and Graham classify scripted tests into vague manual scripts and detailed manual scripts. Both define input and expected output, but in a vague script the test input and expected output are described in general terms, whereas in a detailed script they are defined precisely [29]. An advantage of scripted tests is that they can be carried out by any tester, are easily repeated and can therefore be used in regression testing [34]. However, if the scripted test is of the vague type, it may have different outcomes depending on the tester's choice of test input and execution [29].

Scripted testing is inflexible, and in many cases it might be hard to define test cases beforehand; exploratory testing can assist in finding more test cases or be used as an alternative testing approach [34], [37]. Exploratory testing is a type of black box testing. Bach defines it as: "Exploratory testing is simultaneous learning, test design, and test execution" [37]. In exploratory testing the tester starts with a goal and defines new tests along the way while testing the software. Depending on where the tester wants to place the test on the spectrum between scripted and exploratory testing, the tester decides if and how much it should be guided by written goals and tactics [37]. Exploratory testing can help testers diversify testing, evaluate or learn about new functionality, and is fast at finding the most important bugs [37].

Although manual testing has many benefits, a few of which are mentioned above, sometimes it can be more efficient to perform tests with the aid of a computer. The following section looks into the technique of using software to test software.

2.4 Automated Testing

Dustin, Garrett, and Gauf provide an inclusive, high-level definition of automated software testing: "The application and implementation of software technology throughout the entire software testing lifecycle (STL) with the goal of improving STL efficiencies and effectiveness" [38]. This definition states that test automation can take place in all stages of the software testing process; that is, requirements analysis, implementation, execution and evaluation can all be automated with certain methods [16] (see section 2.6.1). But as already mentioned in the delimitations, section 1.5, the term test automation in this thesis refers to the software development process that results in automation of the test execution and test evaluation activities of the testing process (see section 2.2.1).

Software engineers in test automation have a varied range of tasks, such as planning and implementing test scenarios, developing test automation frameworks, preparing and configuring the infrastructure to run the tests and presenting the test result reports [39]. Typically, a tool for Continuous Integration (CI) is used for running the tests and displaying test reports: the developer pushes (sends) a code change to the code repository, and the CI tool automatically builds the code, runs smoke tests and provides build and test results [40].
Kasurinen, Taipale, and Smolander stress that test automation is commonly used for repetitive tasks; from their survey of 31 organisation managers they concluded that the respondents considered unit testing and regression testing to be the two most efficient application areas of test automation tools [11]. Dustin, Garrett, and Gauf state that test automation is typically used for the following testing types: unit tests, regression tests, functional tests, security tests, performance tests, stress tests, concurrency tests and code coverage verifications [38].

2.5 Benefits and Limitations of Test Automation

Rafi et al. conducted a systematic literature review and practitioner survey in 2012, in which benefits and limitations of test automation were identified from research and later verified by practitioners [9]. In the following two sections the benefits and drawbacks of test automation are explained, with a starting point taken from "Benefits and Limitations of Automated Software Testing: Systematic Literature Review and Practitioner Survey" [9]. The references from that paper have been checked and are briefly summarized under each factor; where newer references were found, these have been added. References that support a factor are shown next to the factor's heading. Note that the reader is advised to read the whole section, since a reference sometimes occurs under more than one factor but is only described once. The goal has not been to perform a systematic literature review; the aim of this chapter is to give the reader a solid introduction to the benefits and limitations that come with test automation.

2.5.1 Benefits of Test Automation

Rafi et al. presented 9 benefits from their literature study, and their survey shows that practitioners are in agreement with research for 8 of the 9 benefits. In the following section, research related to these 8 factors is summarized. The ninth factor, left out here, was "increased fault detection" [9].

Improved product quality [11], [41]

Rafi et al. define quality as a low defect level in the SUT [9]. Malekzadeh and Ainon present a technique for automated test case generation which, according to the authors, can be used to reveal ambiguities in the specification of the SUT [41]. The method was only validated on a non-industrial example [41]. Through surveys and interviews in 30 organisational units, Kasurinen, Taipale, and Smolander found that test automation can provide quality improvements from increased test coverage and reduced testing time [11].

Test coverage [11], [18], [38], [42]–[49]

According to Hoffman and Dustin, Garrett, and Gauf, test coverage can be increased with automated tests [18], [38]. They explain that an increase could be due to automated tests being able to cover more combinations of data and paths than manual testing [18], [38].

Saglietti and Pinte created a multi-objective optimization model that optimizes test case generation in unit and integration tests [42]. The objectives of the model are to maximize code coverage and minimize the number of tests [42]. From experimental verification on industrial software, the authors concluded that coverage could be improved [42].

Using the programming language Sulu, Tan and Edwards performed unit test case generation on non-industrial software [43].
The result was 90% statement coverage and high mutation coverage [43]. Alshraideh carried out a similar study, in which partly automatic generation of unit tests was made for JavaScript code [44]. Non-industrial experiments in their research showed that coverage can be increased with the tool [44]. The authors argue, but do not provide data to support the statement, that this type of tool can lead to a reduction in testing cost [44].

In 2008, Burnim and Sen presented heuristic search strategies for generating test input with symbolic execution [45]. The authors validated their method on the two open source projects Grep 2.2 and Vim 5.7, which had 15K lines of code (LOC) and 150K LOC, respectively [45]. The authors were able to increase coverage with their method and argue that the method can be used on real-world software systems [45].

Geetha Devasena, Gopu, and Valarmathi proposed a hybrid optimization method for generating tests for conditional branches; the method uses both a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) [46]. The aim of the method is to achieve a certain branch coverage. The authors validated the method on benchmarking samples and compared the result to methods that only use GA or PSO [46]. The hybrid method could reduce the execution time by around 50% compared to when only GA or PSO was used [46]. The authors suggest that this type of method can minimize testing effort, time and cost [46].

Nagowah and Kora-Ramiah created a tool under the name Control Ripper and Test Case Player (CRaTCP) that can be used on web applications to generate and execute test cases [47]. The tool looks for fields and buttons where the user can give input to the web application; it considers the given constraints for these input fields and generates test cases to cover the possible inputs [47]. A tester can later execute the test cases on the web application; the tool does not provide automatic test evaluation [47]. The authors have not provided any validation data for the tool; they state that the ambition with the tool is to achieve complete test coverage by executing the generated test cases [47].

Reduced testing time [11], [16], [46], [47], [50]–[54]

In 1999, du Bousquet and Zuanon presented their testing environment for synchronous reactive systems [50]. The tool automatically generates test data, and with a user-provided test oracle it also provides automatic test execution and evaluation [50]. The authors have validated their tool and argue that the solution can result in cost reductions and reduced testing time [50].

Wissink and Amaro argue that a keyword-based approach to test automation can lead to reduced testing time [51]. In the keyword-based approach, test cases are defined as a set of actions, or keywords; a test automation engineer develops tests that can perform these actions and arranges for a tool in which the tests can be executed [51]. The authors have not validated this method; instead, they refer to a white paper (http://www.sdtcorp.com/cs_gtnprogram.html) where promising results for the technique have been presented [51]. Haugset and Hanssen have used Robot Framework, a keyword-based testing tool, for implementing regression tests [52]. The authors report that automated testing can decrease the testing effort in regression tests and reduce costs due to finding bugs earlier in the development process [52].
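To illustrate the keyword-based approach described above, here is a minimal sketch in Python of how keywords could map to implemented actions. The keyword names and the login scenario are hypothetical examples, not taken from the cited tools; real tools such as Robot Framework provide far richer keyword libraries and reporting.

```python
# Minimal keyword-driven testing sketch. Keyword implementations are
# written by a test automation engineer; the test case itself is a plain
# list of keywords that a non-developer could write and maintain.

def open_app(state, name):
    state["app"] = name

def enter_text(state, field, value):
    state[field] = value

def verify_equals(state, field, expected):
    assert state[field] == expected, f"{field}: {state[field]!r} != {expected!r}"

KEYWORDS = {"open app": open_app, "enter text": enter_text, "verify": verify_equals}

# A test case expressed as keywords (hypothetical login scenario).
test_case = [
    ("open app", "login-screen"),
    ("enter text", "username", "alice"),
    ("verify", "username", "alice"),
]

state = {}
for keyword, *args in test_case:
    KEYWORDS[keyword](state, *args)  # dispatch each action to its implementation
print("test passed")
```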
Amannejad et al. showed in an industrial setting that test automation can save time in the test processes of test design, test execution and test evaluation [16]; see section 2.6.1.

Reliability [29], [48], [55]

Reliability here means that tests produce the same result when repeated [9]. Test automation can result in more reliable testing [29]: automated tests will be executed several times, and the execution is always performed in the same way [29]. In 2018, Banerjee and Yu investigated how test automation, made possible with a robotic arm, could be used to test face recognition software [48]. Banerjee and Yu report that test automation resulted in reliable tests; the authors also argue that test coverage was increased with test automation [48].

Increase in confidence [52]

From interviews, the authors of "Automated Acceptance Testing: A Literature Review and an Industrial Case Study" found that automated testing can increase confidence in the perceived quality of the SUT [52].

Reusability of tests [11], [14], [54], [56], [57]

In 2009 a tool was developed for generating test cases in Java [56]. The tool has not been validated in any industrial setting, but the authors argue that reusable test cases can be created with it [56]. Kasurinen, Taipale, and Smolander state that test automation requires an initial investment, but that the increased reusability that comes with automation can lead to a payoff in the long term [11]. According to Obele and Kim, a software test automation tool can improve test reusability [54]. The authors present their tool and state that, in their experience, automated software testing can free testers from mundane activities and minimize human effort, cost, time and human errors [54]. Flenström et al. have provided and validated an optimization model for prioritizing test cases based on the possibility to reuse code [57]; see section 2.6.1.

Less human effort [14], [46], [49], [52], [54]

Haugset and Hanssen and Berner, Weber, and Keller report that with automated regression testing, testers have more time for other test activities [14], [52]. The authors also state that test automation makes it easier to test complex interfaces and that it can enable a higher test frequency compared to manual testing [52].

Reduction in cost [46], [50], [52]–[56]

Test automation can find bugs earlier in the development process [52]. Due to the possibility of running the automated tests frequently, the tests can find simple bugs early by being executed directly after the code has been produced. Bugs found earlier in the development process are cheaper to fix than bugs that are found late [38].

Shan and Zhu provide a solution for test case generation called data mutation [55]. The data mutation method, which is inspired by mutation testing methods, uses mutation operators on input data to generate test cases [55]. The authors validated their method on CAMLE, a modelling language and environment developed by the authors [55]. The authors state that the method can provide several benefits, these being reduced costs, good coverage and increased reliability [55].

2.5.2 Limitations of Test Automation

Rafi et al. identified 7 limitations from research, and their survey shows that practitioners agree with research for 6 of the 7 limitations. In the section below, research related to these 6 factors is summarized.
The seventh factor, left out here, was "failure to achieve expected goals" [9].

Automation cannot replace manual testing [11]–[14]

From empirical observations, Kasurinen, Taipale, and Smolander, Bach, and Pettichord report that some tasks are better suited for manual testing, while others are preferably automated [11]–[13]. Berner, Weber, and Keller state that manual testing is likely to detect new defects and argue that automated tests are suitable for revalidation of the SUT [14].

Difficulty in maintenance of test automation [11], [14], [58]

Kasurinen, Taipale, and Smolander state that test automation will lead to an increased maintenance effort due to changes in the SUT or in the product infrastructure [11]. Similar thoughts are expressed by Berner, Weber, and Keller, who argue that testware has to be maintained at each new release of the SUT [14]. Liu argues that test automation is sensitive to changes in the SUT; the author presents a testing language with the aim of simplifying maintenance of automated tests [58].

The process of test automation needs time to mature [14], [59]

Bashir and Banuri used a model-based technique to generate test data [59]. The authors argue that test automation can result in time and cost savings, but that it takes time to achieve these goals [59]. Similar thoughts, expressed by Berner, Weber, and Keller, are described in section 2.5.2 [14].

False expectations [13], [14]

From their observations, Berner, Weber, and Keller found that test automation failed to deliver on the expectation of exposing known defects [14]. The authors also note that automation does not deliver a short return on investment, as some practitioners had expected [14]. Pettichord reports that, in his experience, there are several false expectations of test automation, such as the expectation that test automation can be achieved as a side project, or that it can accomplish all benefits at the same time, such as combining a wish for increased test coverage with time savings [13].

Inappropriate test automation strategy [14], [60]

Berner, Weber, and Keller and Persson and Yilmaztürk argue that choosing the right test automation strategy is vital for success in test automation [14], [60]. The authors of "Observations and Lessons Learned from Automated Testing" explain four commonly occurring problems with test automation strategies: "misplaced or forgotten test types", "wrong expectations", "missing diversification" and "tool usage is restricted to test execution" [14]. Persson and Yilmaztürk argue that the test strategy has to consider the different needs of manual and automated testing [60].

Lack of skilled people [13], [49], [60], [61]

Rafi et al. describe this factor as automation requiring many different types of skills [9]. Pettichord says that, in his experience, it can be hard to maintain automated tests that are developed by inexperienced developers [13]. In an observational study, Fecko and Lott report that test automation demands several skills: test tool knowledge, proficiency in development, software design skills and expertise in the SUT [61]. The authors argue that testers commonly have good tool knowledge and expertise in the SUT, but that there is a lack of software development and design skills among testers [61].

In 2018, Gafurov, Hurum, and Markman proposed a test automation solution based on a keyword-driven testing language [49].
The aim is to decrease the cost of test implementation by having automation engineers (expensive personnel with development skills) implement test steps and letting test analysts (less expensive personnel, non-developers) organize and combine test steps with input data [49]. The authors validated their method in an industrial setting and argue that this approach can decrease the test implementation cost [49]. It was also found that automated testing resulted in a decrease in manual test effort and an improved test coverage [49].

Persson and Yilmaztürk explain that if test automation is implemented by personnel without the right competence, it can result in higher costs and even failure [60]. The authors recommend that the automation project should have expert knowledge available, but not all who automate have to be experts [60]. This competence has to remain within the company after the automation project has been realized [60]. The automation project should comprise a mix of knowledge spanning testing, development, project management and other skills in related areas, e.g. databases and hardware [60].

2.6 What to Automate

As mentioned in the introduction, there is a lack of research on which tests to automate [16]. In this section, systematic and checklist-based approaches for selecting which test cases to automate are described.

2.6.1 Systematic Approaches

In research, optimization and simulation methods have been used to help practitioners in deciding which tests to automate. This section describes a few examples, but the aim is not to cover these methods in depth; rather, it is to provide a short overview of other solutions to the problem. The reason for not investigating these methods further is described in the delimitations, section 1.5. It is also worth mentioning that, according to Amannejad et al., research on systematic approaches for deciding what to automate is still at an early stage [16].

Optimization-based approaches

In 2014, Amannejad et al. formulated an optimization problem of what to automate and verified their approach in an industrial setting [16]. The authors used a matrix that indicates which stages of a use case should be performed with manual testing methods or automated testing methods [16]. The optimization problem was solved by searching through solutions in the matrix with a genetic algorithm, and the goal function, i.e. the evaluation of the solutions, was based on the return on investment [16]. Data were collected from software tools, and when such data were not available, interviews or estimation models were used [16]. To estimate manual test effort, the authors customized an existing test execution effort estimation model, and to estimate the maintenance cost of the automated test code, the maintenance estimation principle of COCOMO was used [16]. The optimal value of the goal function for the problem provided an ROI of 367% [16]. The authors found that the test artifacts had to be used more than 2, 3 and 8 times to provide a positive ROI in the activities of test design, test execution and test evaluation, respectively [16]. Amannejad et al. state that the highest ROI was gained from automation in test execution, thereafter automation in test design and lastly automation in test evaluation [16]. The result was also presented in time savings, where it was found that for test design, test execution and test evaluation, savings of 85, 275 and 21 (8-hour) working days, respectively, could be made [16].
Ramler and Wolfmaier constructed a constrained linear optimization problem of what to automate [62]. The problem was constrained by a fixed budget and a minimum number of automated and manual test executions [62]. The goal function was constructed by creating functions for risk mitigation; the objective was to maximize the combined risk mitigation of manual and automated testing [62]. The authors argue that manual and automated tests fulfil different purposes: automated tests are suitable for mitigating regression risks, while manual testing can be used to explore new functionality. This is the reason behind the choice of the constraints and goal function [62]. Ramler and Wolfmaier note that a drawback of this model is that it is simplified and ignores important factors such as the maintenance cost of automated tests and the growing test effort over time in iterative development [62]. The article is widely cited; Google Scholar reports that it has been cited 113 times [63]. However, the authors did not evaluate their model empirically, and no empirical evaluation was found among the citing works.

In 2018, Flenström et al. proposed a method for helping decision makers prioritize which test cases to automate first [57]. The aim of the study is to reduce test effort by prioritizing automation of test cases that can reuse code from test cases that have been automated previously [57]. Reuse of test automation code is made possible by comparing the proposed set of manual test cases for automation with the manual test cases that have already been automated; if the steps are similar, it is likely that code can be reused [57]. In this optimization problem, the goal function is formulated to measure the manual test effort [57]. The manual test effort decreases when manual tests are replaced with automated tests; the objective is to find the ordered set of test cases to be automated that minimizes the manual test effort [57]. The method was empirically validated in a case study consisting of four projects at a company in the vehicular embedded systems domain [57]. The result was that if reuse of test automation code is considered, the manual test effort can be decreased by up to 12% with the usage of an optimization model for similarity-based reuse of test steps [57].

Simulation-based approach

Sahaf et al. constructed a System Dynamics (SD) simulation model with the aim of answering the problem of what to automate in all stages of the software testing process [64]. The SD simulation model was created as a general model and later adapted and validated in an industrial case study [64]. The authors state that there is uncertainty in the concrete results of the simulations made in the study, due to uncertainties in the input data to the SD simulation model [64]. It is concluded that the usefulness of the proposed SD simulation model has been shown in the study, but that more research is needed to validate the proposed model [64].
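To make the flavour of these optimization formulations concrete, below is a toy sketch in Python of budget-constrained test selection in the spirit of Ramler and Wolfmaier's model: pick the subset of candidate tests that maximizes total risk mitigation within a fixed automation budget. The candidate tests, their costs and their risk mitigation values are invented for illustration, and the brute-force search stands in for a proper optimization solver.

```python
# Toy sketch of budget-constrained test selection. All names and numbers
# are invented; brute force is fine for a handful of candidates.
from itertools import combinations

candidates = {          # test name: (automation cost, risk mitigation value)
    "login":    (5, 8),
    "checkout": (8, 9),
    "search":   (3, 4),
    "export":   (6, 3),
}
BUDGET = 12

best_value, best_subset = 0, ()
for r in range(len(candidates) + 1):
    for subset in combinations(candidates, r):
        cost = sum(candidates[t][0] for t in subset)
        value = sum(candidates[t][1] for t in subset)
        if cost <= BUDGET and value > best_value:
            best_value, best_subset = value, subset

# With these numbers: automate 'checkout' and 'search' (cost 11, value 13).
print(f"automate {best_subset} for total risk mitigation {best_value}")
```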
2.6.2 Checklist-Based Approaches

In 2015, Garousi and Mäntylä published a multi-vocal literature review with the aim of supporting decision making on when and what to automate [17]. One of the results in the paper is a checklist (see the factors in Table 4.2), in which the authors used coding to identify 43 factors from 78 sources [17]. These factors can be evaluated to find out whether test automation is suitable for a company and, if so, which tests can be automated. The factors are grouped into five categories: 1) software-under-test-related factors, 2) test-related factors, 3) test-tool-related factors, 4) human and organizational factors and 5) cross-cutting and other factors [17]. The authors assigned an area weight to each factor, which is the frequency with which the factor appears in their sources [17]. In the paper, the authors clearly state that the area weight cannot be viewed as a prioritization made for practitioners; rather, the checklist needs to be evaluated and prioritized in the context in which it will be used [17].

In 2006, Oliveira, Gouveia, and Filho proposed a viability analysis method, which uses 9 questions, Table 2.2, together with a decision tree, Figure 2.2, when deciding whether or not to automate a given manual test [65]. The questions are answered with "High", "Medium" or "Low", represented in the decision tree as "H", "M" and "L".

Table 2.2: Questions in viability analysis method [65]

1. Frequency: How many times is this test supposed to be executed?
2. Reuse: Can this test or parts of it be reused in other tests?
3. Relevance: How would you describe the importance of this test case?
4. Automation effort: Does this test take a lot of effort to be deployed?
5. Resources: How many members of your team should be allocated, or how expensive is the equipment needed, during this test's manual execution?
6. Manual complexity: Is this test difficult to execute manually? Does it have any embedded confidential information?
7. Automation tool: How would you describe the reliability of the automation tool to be used?
8. Porting: How portable is this test?
9. Execution effort: Does this test require a lot of effort to be executed manually?

Figure 2.2: Decision Tree in viability analysis method [65]

Oliveira, Gouveia, and Filho trained their model on 500 manual tests and validated it on 200 tests [65]. The model has been recommended in one paper, by Assad et al. [66], and used and evaluated with positive results in another, by Kadry [67]; however, no usage in an industrial setting has been found.
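As an illustration of how such a question-based decision could be mechanised, below is a minimal sketch in Python. The scoring rule, mapping High/Medium/Low answers to points and recommending automation above a threshold, is an invented simplification for illustration; it is not the actual decision tree published by Oliveira, Gouveia, and Filho.

```python
# Minimal sketch of a question-based automation decision. The point values
# and threshold below are invented simplifications of the viability
# analysis idea, not the published decision tree.

QUESTIONS = [
    "Frequency", "Reuse", "Relevance", "Automation effort", "Resources",
    "Manual complexity", "Automation tool", "Porting", "Execution effort",
]
POINTS = {"H": 2, "M": 1, "L": 0}  # High / Medium / Low answers

def should_automate(answers: dict, threshold: int = 12) -> bool:
    """Recommend automation if the summed answer score reaches the threshold."""
    score = sum(POINTS[answers[q]] for q in QUESTIONS)
    return score >= threshold

# Example: a frequently executed, highly relevant, effort-heavy manual test.
answers = {q: "M" for q in QUESTIONS}
answers.update({"Frequency": "H", "Relevance": "H", "Execution effort": "H"})
print(should_automate(answers))  # True: 6 * 1 + 3 * 2 = 12 reaches the threshold
```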
In the book Implementing Automated Software Testing, Dustin, Garrett, and Gauf present a checklist, shown in Table 2.3, consisting of 12 factors which aim to answer whether or not a specific test case should be automated [38]. The authors argue that a test case is a good candidate for automation if all questions are answered with "yes" [38].

Table 2.3: Checklist for deciding what to automate, Dustin, Garrett, and Gauf [38]. Each criterion is answered Yes or No.

- Is the test executed more than once?
- Is the test run on a regular basis, i.e., often reused, such as part of regression or build testing?
- Does the test cover the most critical feature paths?
- Is the test impossible or prohibitively expensive to perform manually, such as concurrency, soak/endurance testing, performance and memory leak detection testing?
- Are there timing-critical components that are a must to automate?
- Does the test cover the most complex area (often the most error-prone area)?
- Does the test require many data combinations using the same test steps (i.e., multiple data inputs for the same feature)?
- Are the expected results constant, i.e., do they not change or vary with each test? Even if the results vary, is there a percentage tolerance that could be measured as expected results?
- Is the test very time-consuming, such as expected results analysis of hundreds of outputs?
- Is the test run on a stable application, i.e., the features of the application are not in constant flux?
- Does the test need to be verified on multiple software and hardware configurations?
- Does the ROI, as discussed in Chapter 3 of Implementing Automated Software Testing [38], look promising and meet any organizational ROI criteria?

Dustin, Garrett, and Gauf also provide 6 guidelines for when and what to automate, in which the authors, among other things, recommend practitioners to consider time and budget constraints when deciding what to automate, and specifically recommend automating repetitive tasks [38]. The authors state that the checklist has been used in various projects [38], referring to one of the authors' previous books, Effective Software Testing: 50 Specific Ways to Improve Your Testing by E. Dustin; unfortunately, they do not elaborate further on these results.

2.6.3 Other Approaches and Advice

Graham and Fewster have provided a short text about which tests should be automated first in their book Experiences of Test Automation: Case Studies of Software Test Automation from 2012. Graham and Fewster recommend testers to consider the following factors when choosing what to automate [68]:

• the most important tests,
• a set of breadth tests (sample each system area overall),
• tests for the most important functions,
• tests that are easiest to automate,
• tests that will give the quickest payback,
• tests that are run the most often.

The authors argue that a high return on investment can be achieved by selecting tests from different product areas and automating the most important tests first [68]. Both Graham and Fewster and Dustin, Garrett, and Gauf bring up the importance of not rushing into automation and attempting to achieve too much at an early stage, due to the learning curve of automation [68] and limited experience with the automation tool and other factors related to automation [38]. Another recommendation that both of the previously mentioned authors give is to automate tests with repetitive tasks, which can free up time for testers to do other work [38], [68].

2.7 Return on Investment

Return on investment (ROI) is the ratio between the benefits and the costs of a given investment. ROI shows how much profit is generated from each monetary unit spent on an investment.

2.7.1 Return on Investment for Test Automation

Before the test automation process is started, it is advised to calculate the ROI, in order to be sure that the savings and benefits are greater than the costs of automation [29], [38], [68]. Münch et al. define an ROI formula for test automation [69] as:

ROI = Gain / Investment = (Benefit − Costs) / Investment

Equation 2.1: ROI formula for test automation by Münch et al.

The factors that can be included in the gains and costs of the formula can be divided into intangible and tangible factors.
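As a worked example of Equation 2.1 (with invented numbers, measured in hours of effort): suppose automating a test suite costs 200 hours in total, and the automated tests replace manual test execution that would have taken 260 hours over the period considered.

```python
# Worked example of Equation 2.1 with invented numbers (hours of effort).
benefit = 260       # manual test effort saved by the automated tests
costs = 200         # implementing and maintaining the automated tests
investment = costs  # here the investment is the cost of automation

roi = (benefit - costs) / investment
print(f"ROI = {roi:.0%}")  # ROI = 30%
```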
If automated tests find simple bugs, such as platform/browser-specific bugs, the developer is notified about this and has the possibility to fix these bugs before a tester performs a manual test. With fewer simple bugs to report, testers have more time to perform manual testing to find bugs that are harder to identify [8], [10], [14]. Test automation can also reduce testing time by providing efficient testing tools and by automating test activities in the software testing process (section 2.2.1) (for more information about reduced testing time, see section 2.5.1) [11], [16], [46], [47], [50]–[54]. The other possibility is that development, maintenance or analysis of data from automated testing becomes time consuming and allows for less manual testing [12].

Automated testing is good for performing the same test on multiple platforms or configurations. Time can be saved by verifying several platforms at once with a script instead of doing it manually [38]. Performing the same test on many platforms/configurations can be a monotonous task, which can lead to a tester getting tired and missing potential problems [38].

Motivation of testers might increase or decrease due to automation. Some testers will enjoy test automation and embrace it, while others might be sceptical and see it as leaving less time for manual tests [18].

As shown in section 2.5.1, test coverage can be improved with automated tests [11], [18], [38], [42]–[49]. It is also possible that test coverage decreases with automation, if a simple automated test is implemented instead of an exploratory testing session that would cover more functionality [18]. Persson and Yilmaztürk report that weak knowledge of the existing test coverage can be a problem when implementing test automation [60]. The authors report that they experienced trouble with establishing test automation due to having to specify automation coverage in manual test coverage measurements [60].

Quality in testing can also change [18]. Test automation allows for using different types of tests, such as stress testing, load testing, a higher frequency of regression testing, testing multiple platforms at once, etc. These tests might not be possible to perform with manual testing, but manual testing can be preferred in other cases. For example, GUI testing is often done manually, due to the ability to have a human verify that the application “looks good” and the fact that the GUI might change rapidly and cause a high cost of maintenance if the tests were to be automated [12].

Release cycles can also be shortened [8], [14]. By running automated tests frequently, a more stable version of the software can be obtained in between release testing periods [8], [15]. This can be a part of allowing the release cycle to be shortened.

2.7.3 Tangible Factors

Hoffman divides the tangible factors of test automation into fixed costs, Table 2.4, and variable costs, Table 2.5 [18]. Some of the factors of cost and benefit, Table 2.6, from test automation will be described briefly in this section.

Test automation might result in the need for a hardware upgrade, since it can be resource consuming to run a big test suite several times per day.

Table 2.4: Fixed costs of test automation
Factor | Definition | Reference(s)
Hardware | Hardware for running automated tests. | [18], [69]
Automation software | Testware software licenses and support. | [18], [69]
Software training and setup | Initial configuration of tool(s), training of staff, initial test suite implementation. | [18], [69]
Another cost factor is the software used for test automation, which includes software for creating automated tests, a continuous integration server, software for analysing test results, etc. In some cases, companies choose to use an open source solution such as Selenium, which reduces the cost of software. But even in those cases, there is still a cost for configuring the tool and training the staff in the tool.

Table 2.5: Variable costs of test automation
Factor | Definition | Reference(s)
Test design & implementation | Designing and implementing tests for automation. | [18], [69]
Test maintenance | Maintenance of tests that are broken due to e.g. new functionality making the tests outdated. | [18], [69]

The largest cost of test automation is test design and implementation. Writing automated tests requires development skills and knowledge of the test tool used [12]. Test cases also need to be documented, maintained and tested themselves [12]. Product changes may result in broken tests, or in new tests needing to be developed to cover new features.

Table 2.6: Benefits of test automation
Factor | Definition | Reference(s)
Failure cost | Failures found by automated tests. | [69]
Greater regression test coverage | Being able to run regression tests more frequently increases the coverage from these tests. | [38]
Test execution savings | Automated tests run faster than tests performed by a human, resulting in the possibility to run more tests per time unit. | [18]

Some bugs are more likely to be found by computers and automated tests than by manual tests performed by humans [70]. Manual tests are good for analysing what appears on the screen, but automated tests can better examine the data that lie behind the screen. That can enable, for example, testing for memory leaks and monitoring unexpected system calls [18]. The benefit from finding these bugs, which without automated tests would end up in production, can result in a great saving for the company [69]. Being able to run an automated regression test suite frequently will increase the test coverage and might reduce the number of bugs that would (or would not) be found at a later stage of the development process.

Another factor to consider is savings in test execution. In simplified terms, one could say that a manual tester can execute tests for 8 hours per day, while a test automation engineer would create and maintain tests for 8 hours per day and then let those tests run for an additional 16 hours per day [71].

Often the benefits of automated testing are determined by comparing the costs of automated tests with the costs of manual tests [18]. Graham states that the benefits of test automation can be measured by considering the equivalent manual test effort, that is, the time that it would take to execute the tests without automation [72].

2.7.4 ROI using Equivalent Manual Test Effort

Hoffman defines an ROI formula for test automation that can be used in the situation where manual testing has been used in the project and one is considering investing in test automation [18]. This ROI formula differs from the others described in this section in that it considers the incremental costs and benefits of automation. For this reason, it can be used to calculate the return on investment from one small project or even isolated tests, whereas the other formulas described in this section consider the ROI of the whole automation investment.
The advantage is that initial investments, which may have been made a long time ago, such as tool acquisition and staff training, can be disregarded from the ROI calculations.

ROI(in time t) = ∆(Benefits from automation over manual) / ∆(Costs of automation over manual) = ∆Ba / ∆Ca

2.2: ROI formula for test automation using EMTE by Hoffman

Table 2.7: Variables in Hoffman’s ROI formula.
Var | Definition
n1 | Number of automated-only test executions.
n2 | Number of manual test executions.
N | Average number of runs for automated tests before maintenance is needed.
Ba | The benefits from automated testing.
Ca | The costs of automated testing.
∆Ba | The incremental benefits from automated over manual testing.
∆Ba (in time t) | Σ(improvement in fixed costs of automated testing × (t / Useful Life)) + Σ(variable costs of running manual tests n2 times during time t) − Σ(variable costs of running automated tests n1 times during time t)
∆Ca | The incremental costs of automated over manual testing.
∆Ca (in time t) | Σ(increased fixed costs of automated testing × (t / Useful Life)) + Σ(variable costs of creating automated tests) − Σ(variable costs of creating manual tests) + Σ(variable costs of maintaining automated tests) × (n1 / N)

Note that many of the values used in Hoffman’s equation need to be determined for both manual and automated testing. For example, in the equation for the incremental benefits from automated over manual testing (∆Ba), Table 2.7, the manual and automated tests used in the equation are assumed to cover the same test cases. That is, the benefits are defined as the equivalent manual test effort (EMTE) minus the automated test effort [72].

Schwaber and Gilpin use a definition of the cost of test automation that is similar to Hoffman’s, namely the following one [73]:

Cost of test automation = Cost of tool(s) + Labor costs of script creation + Labor costs of script maintenance

2.3: Cost formula for test automation by Schwaber and Gilpin

But both these definitions leave out the cost of training the employees to use the automation tool. The ROI formula by Münch et al. takes this factor into consideration [69]; the variables are defined in Table 2.8. Similar to the formula by Hoffman, the benefit in this formula is defined as the saved cost of executing automated tests compared to executing all tests manually.

ROIn = Benefit / Investment = (Gain − Costs) / Investment

2.4: ROI formula for test automation that includes tool cost by Münch et al.

Table 2.8: Variables in Münch et al. ROI formula.
Var | Definition
n | Number of testing cycles.
Gain | Costs of executing all tests purely manually (EMTE).
Costs | Cost of executing the manual and automated tests for all the cycles and the cost of the automation investment.
Investment | Cost of buying the automation tool, training the employees in the tool and an initial test automation suite implementation.

3 Method

The research method, shown in Fig 3.1, started by reviewing the literature to select tools for the method and for the evaluation of the results in the thesis. The literature study aimed to cover the following areas: how test cases can be selected for test automation, what benefits exist for test automation, how interviews should be conducted, how a checklist can be evaluated, and information that could give the reader a short introduction to manual and automated testing practices. As for the evaluation of the result, it was studied how return on investment can be measured for test automation.
When a sufficient literature base had been established, the interview process was started. The aim of the interviews was divided into three parts, corresponding to the research objectives: to identify the benefits that Sectra Imaging IT Solutions Ltd want to achieve with test automation; to verify that the proposed checklist is applicable to the current situation; and, if so, to find out what modifications are needed to establish a checklist that is suitable for the company. The data from the interviews were processed and the result was a decision tree. This tree was to be used when deciding which test cases to automate. When a set of test cases had been found, the automation process could start. During the automation process the effort spent was strictly noted; this would provide data that were to be used when performing return on investment calculations. The return on investment calculations were used to answer the research question: Can the checklist provided by Garousi and Mäntylä [17] be modified in such a way that it can be used to select test cases for test automation that result in economic and organisational benefits?

Figure 3.1: Overview of the research method.

Case Study

The method used in this thesis is best defined as a case study. Robson and McCartan define the term case study as: “Case study is a strategy for doing research which involves an empirical investigation of a particular contemporary phenomenon within its real life context using multiple sources of evidence” [74]. Runeson and Höst describe the stages of case studies as [75]:

1. Case study design: Defining the objectives and planning the case study.
2. Preparation for data collection: Evaluating which resources are available and scheduling data collection activities.
3. Collecting evidence
4. Analysis of collected data
5. Reporting

Runeson and Höst define four purposes of research: exploratory, descriptive, explanatory and improving [75]. The purpose of this case study was of the improving type, the aspect that this thesis aims to improve being the test case selection for test automation at Sectra Imaging IT Solutions Ltd. Case studies can use both qualitative methods, such as interviews or focus groups, and quantitative methods, such as surveys [75]. Runeson and Höst state that it is preferred to perform case studies with mixed methods, that is, to use both qualitative and quantitative methods [75]. This thesis makes use of a mixed method; the qualitative methods used in this thesis are presented first, and at the end of the method chapter the quantitative methods are described.

3.1 Qualitative Methods

In this section the qualitative methods used in this thesis are explained. Interviews were used in several areas of this thesis: to identify benefits from test automation, to evaluate and modify the checklist, and to validate the checklist. This section starts by describing how interviews are conducted, before explaining the method for the actual interviews.

Interview structure

Interviews can be categorised as structured or unstructured and formalised or informalised, depending on how standardised the questions are [76]. In a structured interview, the interviewer reads out each question and notes the response on a standardised schedule [76]. A semi-structured interview is more flexible; the interviewer may change the questions and their order depending on the outcome of the ongoing interview [76].
When the interview will be used to gather data for a quantitative analysis, standardised questions are often preferred over more flexible forms of interviews; on the other hand, non-standardised questions are useful for qualitative analysis [76]. Runeson and Höst state that the interview questions can be structured in three manners, following the funnel, pyramid or time-glass model [75]. Using the funnel model, the initial questions are open and get more specific as the interview proceeds [75]. The pyramid model follows the opposite structure of the funnel model, moving from concrete to open questions, and the time-glass model starts with open questions, then goes to concrete questions and later uses open questions again [75].

Saunders, Lewis, and Thornhill suggest that it is commonly preferred to participate in an interview rather than to fill out a questionnaire [76]. This can be because the respondent does not have to write down the answers themselves, because the respondent can get feedback during the interview, and because, on some occasions, the respondent might not feel enough trust to give out information through a questionnaire [76]. Another situation where it is preferred to conduct an interview instead of a questionnaire is when the questions are complex and the respondent might need help to interpret them. To get an accurate answer from the respondent, the interviewer should ask open-ended questions, prepare follow-up questions and be as neutral as possible when asking questions [77]. Turner III recommends the interviewer to conduct a pilot test of the interview, to give the interviewer the possibility to refine the interview design before performing the actual interview [77]. After the interview has been conducted it should be transcribed before it can be analysed; it can also be helpful to ask the respondents to review the transcript, to give them the possibility of correcting the interpretation and changing or rephrasing their answers [75].

3.1.1 Benefits from Test Automation

In order to be able to answer the first research objective, What do practitioners believe are the common benefits software producing companies relate to test automation?, two interviews were held. The interview questions were designed from Benefits and Limitations of Automated Software Testing: Systematic Literature Review and Practitioner Survey [9], discussed in section 2.5.1. The respondents were asked if they agree that test automation can have an impact on the benefits defined in [9]; the questions were formulated to be as neutral as possible and can be found in section 7.A. After these initial questions the respondents were asked if they thought there were any other benefits of test automation. The interviews were semi-structured, as follow-up questions were asked throughout the interviews, and the questions were organised following the funnel model. The interviews were carried out with two respondents, one of them being Vice President of Product Development with 3.5 years of experience in that role, the other being a CI/CD Engineer with 10 years of experience.

3.1.2 Checklist Evaluation & Modification

Checklists can be used as a decision and memory aid when performing a task. A strong reason for using a checklist when taking a decision or performing a task is that it can make the outcome more predictable and reliable, independently of who is performing the task [78].
Kramer and Drews define four types of checklists: 1) laundry list, 2) criteria of merit list, 3) sequential checklist and 4) flowchart/diagnostic checklist [78]. A laundry list is used to remember steps or items, a criteria of merit list is used to rate and rank items, a sequential checklist defines steps where the order of execution is important, and a flowchart or diagnostic checklist is used to make decisions based on the current situation [78].

Stufflebeam describes how to create and evaluate checklists. The first five steps describe tasks for creating a checklist; since the checklist used in this thesis already exists, the focus here is on the subsequent steps of how to review and validate a checklist. After the checklist has been created, an initial review should be made by asking potential users to judge and give feedback about the checklist [79]. When feedback has been received, the checklist needs to be revised. After the initial review, the checklist developer can ask potential users of the checklist to grade the categories of the checklist using a given scale, such as a Likert scale [79]. The last steps of evaluating a checklist consist of giving the checklist to users, asking them to use it in their work and provide feedback, and redesigning the checklist based on the outcome of the evaluation. Other methods that can be used for evaluating a checklist are the Delphi technique [80], interviewing experts and asking about their opinion of the checklist [81], and evaluating a checklist through a survey [81]–[84].

In an industrial multi-case study, Usman et al. investigated how checklists for effort estimation in agile teams can be developed. This study resulted in a method that can be used to develop and validate checklists. The method was implemented at three software companies, and the researchers used semi-structured interviews, workshops, questionnaires, metrics and checklist usage data to carry out the steps in the proposed method [85]. The method consists of the following five activities [85]:

1. Understand estimation context. In this step the creator of the checklist should study how the current work process is done and, while doing so, collect factors that can be used in the checklist.

2. Develop and present initial checklist. The identified factors from the previous step are put together into a checklist. The checklist is presented to the agile teams and modified in an iterative manner until consensus is reached.

3. Validate checklist statically. At this stage the checklist is used for the first time by the teams. In the context of the case study [85], which aimed to study effort estimation, the teams used the checklist to estimate work performed in previous sprints. During this stage small changes from team members are allowed; if any large changes are suggested, they should be discussed with the whole team before being implemented.

4. Validate checklist dynamically. Now the checklist is ready to be used in the everyday work. The checklist can still be modified when needed; it should not be seen as a static document.

5. Transfer and follow up. When the checklist has been used and validated dynamically, it should be reviewed and the results of its usage can be concluded. These results should be communicated to the management, to allow it to become a standardised tool. After a reasonable time period a second follow-up should be made to study the usefulness of the checklist.
Figure 3.2: Overview of the method used for modifying the checklist

Checklist evaluation

In this thesis the checklist from Garousi and Mäntylä was used. It contains factors that can be used when deciding whether to automate and that can help with identifying which test cases to automate [17]. Garousi and Mäntylä recommend that this checklist is evaluated at the company that wishes to use it; some factors might not be applicable, and the level of importance can vary between companies [17]. The checklist was evaluated in several stages of this project. First it was used in an interview setting, with the respondent being a test automation engineer with 6 years of experience. In this interview the respondent answered all factors in the checklist with plus and minus signs, as instructed by the authors of the checklist. Analysis of the answers could indicate whether the respondent thought that the checklist was useful at the company. After modifications to the checklist had been made, the checklist was evaluated in two additional ways. First, the checklist was evaluated by studying the results from the interviews that aimed to modify the checklist, verifying that the respondents thought that a reasonable number of factors were important at the company. Secondly, the checklist was evaluated when it was used to select test cases, verifying that the checklist could be used.

Checklist modification

A survey was created to modify the checklist. To measure the level of agreement from the respondents, a Likert scale was used. In the original Likert scale the following responses were used (as cited in [86]): 1. strongly approve, 2. approve, 3. undecided, 4. disapprove and 5. strongly disapprove. According to Li, Likert scales are a popular choice in research because they are easy to construct, provide numerical results and have good reliability [87]. Likert scales can be created with different numbers of scale points. A large number of scale points can confuse respondents and increase the measurement error [87]. It is common for Likert scales to have 5 or 7 scale points [88]. In a 5-point scale, the points can be labelled as, for example [87]: 1. strongly disagree, 2. disagree, 3. neither disagree nor agree, 4. agree, 5. strongly agree.

One way of evaluating the result and obtaining the central tendency from Likert scale data is by using the mean value [86]. The mean value is calculated by assigning scores to each scale point, summing the scores from all respondents and dividing the sum by the number of respondents.

In the survey, the respondents looked at each factor in the checklist by Garousi and Mäntylä [17] and were asked to select a response to the statement “The following factor is important when deciding if the given situation favours test automation”; the response had to be selected from the following Likert scale points: “Disagree Strongly”, “Disagree Slightly”, “Agree Slightly”, “Agree Strongly” or “Do not know”. The respondents also had the option to write down factors of their own that they thought were important when evaluating tests to automate. The last part of the questionnaire asked the respondents to state some product areas that they thought were suitable for automation, to motivate their answer and to connect it to one or several of the checklist questions. The checklist was reviewed in an interview setting, and the survey was used as an aid to carry out these semi-structured interviews, which were conducted following the pyramid model.
The interviews were carried out with two respondents, one tester and one developer with 11 and 3 years of experience in their roles, respectively. One respondent was active in interface-rich desktop and web products, whereas the other respondent was solely active in web-related products.

Scores to Likert scale points

The Likert scale points were assigned scores as shown in Table 3.1. The survey was reviewed in semi-structured interviews with a total of two respondents; thus each factor could get a score between 0 and 8. For each factor in the survey the mean value of the score was calculated. The mean value for each factor could be a number between 0 and 4, where 0 meant that both respondents answered “Do not know” and 4 that both respondents answered “Agree strongly”.

Table 3.1: Scores assigned to Likert scale points
Likert scale point | Score
Do not know | 0
Disagree strongly | 1
Disagree slightly | 2
Agree slightly | 3
Agree strongly | 4

Inclusion criteria

For a factor to be included in the modified checklist, the mean value had to be greater than or equal to 3. If the mean value for a factor was below 3, it was excluded from the modified checklist.

Example of exclusion and inclusion

In this section a few examples are shown of when factors were excluded from and added to the checklist. The examples are shown in Table 3.2. The first two rows in Table 3.2 show two factors that were excluded from the checklist and the reasoning that the respondents provided. The two last rows of Table 3.2 show the two factors that the respondents wanted to add to the checklist.

Table 3.2: Examples of factors from the checklist provided by Garousi and Mäntylä that were excluded from and included in the modified checklist. R1 stands for respondent one, R2 for respondent two and I for interviewer.

Factor 13: Tests require large amounts of data. (Not added to the modified checklist.)
R1: “It depends on how the data is needed. I guess I would say disagree. A test doesn’t need to have large amounts of data to be automated. If it’s necessary to import much data, then only that part can be automated and the rest tested manually. I don’t think the two are clearly related.”

Factor 42: We make several releases of our product. (Not added to the modified checklist.)
R2: “Several releases of our product, that’s a yes.” ... “Perhaps it’s worth considering, but it’s not the most important.”
R1: “No, I don’t think that releases are... Even if there is only one release per year, there can still be a lot of iterations that require many builds. Of course, it’s a factor that has influence, but I wouldn’t say that it’s the biggest factor.”

Factor 44: The product being tested is highly customizable, i.e. has many configurations. (Added to the modified checklist.)
R1 (talking about factor 4, “SUT is a generic system, i.e. not tailor made or heavily customized system”): “No, I don’t think that affects automation. Because here it says ‘customized’; if it had said ‘customizable’ I would have thought that it’s a factor worth considering.”

Factor 45: Developers have low knowledge in the product being tested, i.e. the product has not been developed on for a long period of time. (Added to the modified checklist.)
R2 (talking about a product): “We don’t sell the product anymore, but existing customers will continue using it. And people at the company have relatively low knowledge in this area, because it has not been developed on for ages. That’s also one aspect, how well developers know the product.”
I: “Yes, perhaps this is a factor that you would like to add?”
R2: “Yes, exactly.”
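To make the scoring and inclusion rule above concrete, the following minimal sketch (written in C#, the language later used for the automated tests in this thesis) maps the Likert responses to the scores of Table 3.1 and applies the inclusion criterion. The type and member names are illustrative assumptions and are not part of the survey material.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal sketch of the scoring in Table 3.1 and the inclusion criterion.
// The names (LikertResponse, FactorScoring, ...) are illustrative only.
enum LikertResponse { DoNotKnow, DisagreeStrongly, DisagreeSlightly, AgreeSlightly, AgreeStrongly }

static class FactorScoring
{
    // Scores assigned to the Likert scale points as in Table 3.1.
    static int Score(LikertResponse r) => r switch
    {
        LikertResponse.DoNotKnow        => 0,
        LikertResponse.DisagreeStrongly => 1,
        LikertResponse.DisagreeSlightly => 2,
        LikertResponse.AgreeSlightly    => 3,
        LikertResponse.AgreeStrongly    => 4,
        _ => throw new ArgumentOutOfRangeException(nameof(r)),
    };

    // A factor is kept in the modified checklist when the mean score
    // over all respondents is greater than or equal to 3.
    public static bool IncludeFactor(IEnumerable<LikertResponse> responses) =>
        responses.Average(Score) >= 3.0;

    static void Main()
    {
        // One respondent agreeing slightly (3) and one agreeing strongly (4)
        // gives a mean of 3.5, so the factor is included.
        var responses = new[] { LikertResponse.AgreeSlightly, LikertResponse.AgreeStrongly };
        Console.WriteLine(IncludeFactor(responses)); // True
    }
}
```

With two respondents this is equivalent to requiring a summed score of at least 6 out of 8, which is the threshold effectively applied in Table 4.2.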
Decision Tree representation of Checklist

The factors from the second checklist were reformulated and grouped together into a decision tree. The idea of using a decision tree was taken from Oliveira, Gouveia, and Filho [65], as described in section 2.6. In the decision tree (see appendix 7.E), similar factors were grouped together and made into decision points. The order of the factors in the decision tree was decided using the mean values of the factors, meaning that factors with a higher mean value were placed higher up, that is, closer to the start point of the decision tree. The tester was to use the tree by starting at a decision point, evaluating all the factors belonging to that point and deciding for each factor whether the given situation and test case are favourable for automation. The decision tree was used in this thesis to decide which tests were to be automated.

Example of how to use the decision tree

Figure 3.3 shows how the decision tree can be used on a test case; the steps shown in the figure are explained below. For example, say that a tester is thinking about automating a test case testing a user login functionality. First the tester considers decision point 1, the factors in group F1. The test is deterministic, either the user gets logged in or he does not, so agree is chosen for this factor. The test result does not require human judgement, so again agree is chosen. The factors in F1 would be answered as shown in Fig 3.3, and the total result from F1 is agree. Since F1 resulted in agree, the tester moves in the right direction in the tree. The next factors to consider, in decision point 2, are the factors in the F2 group. The login functionality should be tested often, so 2.1 and 2.5 are answered with agree. But the test is not likely to reveal defects, since it can be assumed that the login functionality has been thoroughly tested in the past, so factor 2.2 is answered with disagree. The factors in F2 would be answered as shown in Fig 3.3 and the total result from F2 would be agree. The tester again moves in the right direction of the tree, coming to an endpoint; this endpoint states that, according to the tree, the test case should be automated.

Figure 3.3: Decision Tree with the path from the example marked out.

3.1.3 Validation of Decision Tree by Usage on Regression Test Cases

A step in the verification of the decision tree was to study when and why the result from the tree was to not automate a test case. For this purpose, a group interview was conducted. The group interview was held with two respondents. The respondents were developers in a web-related product with 3 and 7 years of experience, respectively. In the interview, the author’s usage of the decision tree for 11 regression test cases was verified and changed when needed. The interviewer started the discussions by presenting the test case; the two respondents then discussed the usage of the decision tree for the test case, without much involvement from the interviewer. The test cases were randomly selected from the regression suite, in no specific order. If the result from the decision tree was to not automate a test case, the respondents were asked to motivate why the test was not suitable for automation. The interview can be classified as unstructured, as the respondents had the opportunity to freely discuss the usage of the tree without rigid guidelines.
The decision tree was also used on one test case for an interface-rich desktop product. The result of the usage of the decision tree on this product was verified in an informal meeting with one tester with 11 years of experience in his role. In total, the usage of the decision tree was verified on 12 test cases.

3.1.4 Automated Tests Implementation

As a part of gathering data for the ROI calculations, a set of manual test cases was automated. The reasoning was that it is hard to estimate how much time it takes to automate a test case, and such estimations are likely to be inaccurate. The decision tree was used on several regression test cases from areas identified in previous interviews (see section 3.1.2). The tests that were considered for automation by the tree were discussed in informal meetings with testers and developers. The tests were coded in C# using the test automation tool Gauge (https://gauge.org/) and Selenium (https://www.seleniumhq.org/) for browser automation. After the tests had been implemented, they were reviewed in two steps, first by a developer/tester and later by a test automation engineer. This was done to make sure that the tests had been implemented correctly according to code standards at the company, and to verify that the tests perform the necessary steps as described in the manual test cases.

3.2 Quantitative Methods

In this section the quantitative methods used in this thesis are explained. The first quantitative method described is return on investment (ROI). The second quantitative method is surveys; in this thesis a survey was used to validate the decision tree. In the survey section, it is first described how surveys are conducted and afterwards how this survey was conducted.

3.2.1 ROI

The return on investment was calculated for the tests described in section 3.1.4. To calculate the return on the automation investment, the formula by Hoffman, described in section 2.7.4, was used. The formula is presented below:

ROI(in time t) = ∆(Benefits from automation over manual) / ∆(Costs of automation over manual) = ∆Ba / ∆Ca

3.1: ROI formula for test automation by Hoffman [18]

∆Ba is the incremental benefit from automated over manual testing and is defined as:

∆Ba (in time t) = Σ(variable costs of maintaining manual tests) × (n2 / N2) + Σ(variable costs of running manual tests n2 times during time t) − Σ(variable costs of running automated tests n1 times during time t)

Note that the first term, the variable costs of maintaining manual tests, and N2 are not included in the original formula from Hoffman. N2 is defined as the average number of runs for manual tests before maintenance is needed. This variable was added to get a more realistic estimation of the costs of manual testing; at Sectra Imaging IT Solutions Ltd the manual test scripts are frequently updated, and this cost needs to be accounted for in the return on investment. One could argue that not having to maintain manual tests is an improvement in the fixed costs of automated testing, but it is easier to follow and understand the calculations if this is added as a new variable.
∆Ca is the incremental cost of automated over manual testing and is defined as:

∆Ca (in time t) = Σ(variable costs of creating automated tests) − Σ(variable costs of creating manual tests) + Σ(variable costs of maintaining automated tests) × (n1 / N)

Note that the improvement in fixed costs of automated testing in ∆Ba and the increased fixed costs of automated testing in ∆Ca, both included in the original formula and shown in section 2.7.4, were set to zero in the calculations, since none of these costs could be identified for this project.

The primary reason for choosing this formula is that it is suitable for calculating ROI for small projects and isolated tests. In this thesis three test cases were automated in two product areas, and the ROI was calculated for each test case and for the whole project. Another reason for using this formula is that initial costs of test automation, such as tool costs and staff training, do not need to be considered in it. Although these costs are relevant for calculating an overall return on investment for test automation, they are not reasonable to consider when calculating the ROI for a smaller project. A small numerical sketch of this calculation is given at the end of this chapter.

3.2.2 Validation of Benefits from Test Automation

Surveys are commonly used to describe or explain a phenomenon, the main advantage being the possibility to analyse data from many participants [76], [89]. Dillman defines three data variables that can be captured in surveys: opinion, behavioural and attribute variables [90]. Opinion variables capture respondents’ beliefs, behavioural variables allow the actions of the respondents to be studied, and attribute variables are used to study attributes of the respondents themselves [90]. It is often preferred to have survey questions that are closed and standardised, as this allows for easier analysis of the collected data [76], [89]. The guidelines for surveys are well summarised by Saunders, Lewis, and Thornhill, and similar thoughts are described by Kelley et al., these being [76], [89]:

• Careful design of individual questions,
• Clear and pleasing layout of the questionnaire,
• Lucid explanation of the purpose of the questionnaire,
• Pilot testing,
• Carefully planned and executed administration.

When collecting data from surveys it is of special importance to record the reasons why some respondents choose not to participate, to ensure that the result has not been biased by non-responders [89], [91]. Furthermore, information about how the survey was administered, how respondents were approached, and the response rate should be recorded by the researcher [89].

The organisational benefits that could potentially be achieved with the implementation of the automated tests were reviewed with a survey. The respondents were asked whether the automated tests implemented in this thesis could result in the organisational benefits that had been found (presented in table 4.1). The manual test cases that were the basis for the automation, together with the implemented code, were sent to the respondents, so that they could review these documents when answering the survey. The survey can be found in section 7.F. The question asked for each benefit was “The implementation of the automated test does to some degree allow for:”. The formulation is relatively weak, since it can be difficult to see the benefits from only one automated test; the benefits are likely to be more prominent when there exists a sizable automated test suite. The evaluation of the survey was the same as that used in the interviews for modifying the checklist (see section 3.1.2): the Likert scale points were assigned values from 0 to 4, and if the mean value was greater than or equal to 3, agreement was considered to be found.
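To close the method chapter, the adapted ROI formula of section 3.2.1 can be made concrete with a minimal C# sketch. The method names and all figures below are hypothetical illustrations only; they are not the data used in this thesis (reported in table 4.3), and the sketch assumes that the variable cost of executing an automated test is negligible.

```csharp
using System;

// Minimal sketch of the adapted Hoffman ROI formula from section 3.2.1.
// All effort figures are in hours and purely hypothetical; the values
// actually used in the thesis are reported in table 4.3.
class RoiSketch
{
    // Incremental benefit of automation over manual testing during time t:
    // avoided manual maintenance + avoided manual runs - automated run costs.
    static double DeltaBa(double maintainManual, double n2, double n2RunsPerMaintenance,
                          double runManual, double runAutomated, double n1) =>
        maintainManual * (n2 / n2RunsPerMaintenance)
        + runManual * n2
        - runAutomated * n1;

    // Incremental cost of automation over manual testing during time t:
    // creating the automated test - creating the equivalent manual test
    // + automated maintenance, paid once every N automated runs.
    static double DeltaCa(double createAutomated, double createManual,
                          double maintainAutomated, double n1, double nRunsPerMaintenance) =>
        createAutomated - createManual
        + maintainAutomated * (n1 / nRunsPerMaintenance);

    static void Main()
    {
        // Hypothetical test, run once per 6-month release cycle: 20 h to
        // automate, 5 h to write manually, 3 h per manual run, 1 h of manual
        // upkeep per cycle, 2 h of automated upkeep every fourth run.
        for (int cycles = 1; cycles <= 6; cycles++)
        {
            double ba = DeltaBa(maintainManual: 1, n2: cycles, n2RunsPerMaintenance: 1,
                                runManual: 3, runAutomated: 0, n1: cycles);
            double ca = DeltaCa(createAutomated: 20, createManual: 5,
                                maintainAutomated: 2, n1: cycles, nRunsPerMaintenance: 4);
            // ROI > 1 means the automation investment has paid for itself.
            Console.WriteLine($"After {cycles} cycles: ROI = {ba / ca:F2}");
        }
    }
}
```

With these invented numbers the ROI passes 1 after roughly five release cycles; the actual break-even points for the thesis’s tests are reported in section 4.2.1.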
4 Results

4.1 Qualitative Results

In this section the qualitative results are presented; qualitative data come in words, pictures and diagrams [91]. The qualitative results presented in this chapter have been collected through interviews, with the aim of addressing research objectives one, three and four, that is, to find which benefits practitioners want to achieve with test automation, and to evaluate and modify the checklist provided by Garousi and Mäntylä.

4.1.1 Benefits of Test Automation

To answer the first research objective, What do practitioners believe are the common benefits software producing companies relate to test automation?, two interviews were held. In table 4.1 the benefits identified from the interviews are presented; there was no disagreement between the respondents on which benefits test automation has.

Table 4.1: Results from the interviews on benefits from test automation.
Benefit of test automation | Definition by Rafi et al. [9] | Agreement from respondents
Improved product quality | Quality in terms of fewer defects present in the software product. | Yes
Increased test coverage | High coverage of code (e.g. statement, branch, path) is achieved through automation. | Yes
Reduced testing time | Time required for testing, i.e. the ability to run more tests within a timeframe. | No
Increased test reliability | AST is more reliable when repeating tests, as variance in outcomes can be due to the manual tester running the tests in a different way, but it cannot make use of the knowledge of the tester. | Yes
Increase in confidence | Increase of confidence in the quality of the system (e.g. as perceived by developers). | Yes
Reusability of tests | When tests are designed with maintenance in mind they can be repeated frequently; a high degree of repetition of test cases leads to benefits, not a single execution of an automated test case. | Yes
Less human effort | Automation reduces human effort that can be used for other activities (in particular ones that lead to defect prevention). | Yes
Reduction in cost | With a high degree of automation costs are saved. | No
Shorter release cycles | Test automation is a prerequisite for continuous integration and will allow for shorter release cycles [3], [14], [92]. | Yes

Excerpts from the interviews are shown for the following two factors; the excerpts were chosen to show how the respondents reasoned when disregarding a benefit. The two benefits are shown in table 4.1.

Reduced testing time

When asked if automated testing will reduce the time spent on testing, one respondent answered: “I think that if automated tests cover a large part of the regression, integration tests and so on, then less time could be spent on manual testing. But the manual testing that remains will be more qualitative. If we consider testing to verify quality, I don’t think the total test time will be changed, but there will be a shift in how much manual testing is carried out. More time will be spent on developing automated tests.” The other respondent agreed and expressed similar thoughts.

Reduction in cost

The respondents were asked: Does test automation affect the costs of testing? The first respondent answered: “It will not lower the cost of testing, due to automation being an investment.
Initially it might increase, but with time I think we will get to a similar level, but with an increase in value. If we achieve more value, the cost per unit of value will decrease.” The second respondent answered: “Yes, it does. Sometimes you might think that it is only an additional cost. I would say yes, it is an additional cost. But test automation will allow for faster development of the products and comes with increased quality and confidence in the product.”

4.1.2 Checklist Evaluation

The checklist was evaluated in three different phases of the thesis. First, before any modifications had been made to the checklist, an interview was carried out, see section 3.1.2. In this interview the checklist was used at the company, and from an analysis of the answers the conclusion could be drawn that the respondent thought most factors in the checklist were suitable at the company. Secondly, from the results of the interviews with the aim to modify the checklist, conclusions can be drawn about whether the checklist is applicable to the company. As shown in Table 4.2, 15 out of 43 factors were removed from the checklist. This means that a clear majority (65%) of the factors in the checklist are considered important when making decisions regarding test automation at the company. The last evaluation of the checklist, by then a decision tree, was when it was used on regression test cases, see section 4.1.4. At this point, the decision tree was used for 12 test cases, and the result was that 8 of the test cases should be automated, 3 test cases should not be automated and one test case should be partly automated. The factors in the decision tree could be used to select test cases for automation. Hence the third research objective, Is the checklist provided by Garousi and Mäntylä [17] applicable in an industrial setting to achieve test automation?, can be answered positively: the checklist is applicable at the company.

4.1.3 Modifications in Checklist

In this section the modifications to the checklist are presented. The result shown in table 4.2 answers the fourth research objective, What modifications are required to the checklist provided by Garousi and Mäntylä [17] to make it applicable to Sectra Imaging IT Solutions Ltd? The modifications consisted of 15 factors being removed from the checklist; some of the removed factors, and the reasoning why, are shown in table 3.2. 28 factors were included in the modified checklist, and two factors, 44 and 45, were added to the modified checklist by the interview respondents. Note that factor 11 (“Tests are Unit tests”) got a mean value that should have included it in the modified checklist, but due to the delimitations, section 1.5, it was not included. Table 4.2 presents the modifications made to the checklist.

Table 4.2: Results from the checklist modification interviews.
Factor Id | Factor from checklist by Garousi and Mäntylä | Mean value score | Included in modified checklist
1 | SUT or the targeted components will experience major modifications in the future. | 2.5 | No
2 | The interface through which the tests are conducted is unlikely to change. | 2 | No
3 | SUT is an application with a long life cycle. | 3.5 | Yes
4 | SUT is a generic system, i.e. not tailor made or heavily customized system. | 1.5 | No
5 | SUT is tightly integrated into other products, i.e. not independent. | 2.5 | No
6 | SUT is complex. | 2.5 | No
7 | SUT is mission critical. | 4 | Yes
8 | Frequent regression testing is beneficial or essential. | 3.5 | Yes
9 | Tests are performance and load tests. | 3.5 | Yes
10 | Tests are smoke and build verification tests. | 3 | Yes
11 | Tests are Unit tests. | 3.5 | No
12 | There are a large number of tests that are similar to each other. | 2.5 | No
13 | Tests require large amounts of data. | 2.5 | No
14 | Humans are likely to make errors when performing and evaluating these tests, e.g. tests require vigilance in execution. | 3.5 | Yes
15 | Computers are likely to make errors when performing and evaluating these tests, e.g. test execution is not deterministic. | 4 | Yes
16 | Tests can be reused as part of other tests. | 2.5 | No
17 | Tests need to be run in several hardware and software environments and configurations. | 2.5 | No
18 | The lifetime of the tests is high. | 3 | Yes
19 | The number of builds is high. | 3 | Yes
20 | Tests are likely to reveal defects, i.e. high risk areas. | 3.5 | Yes
21 | Tests cover the most important features, i.e. high importance areas. | 4 | Yes
22 | Test results are deterministic. | 4 | Yes
23 | Test results require human judgement. | 4 | Yes
24 | Automated comparison will be fragile, leading to many false positives. | 4 | Yes
25 | Tests are unstable, e.g. due to timing. We must perform the test repeatedly, and if it passes above a threshold we consider that the test passes. | 4 | Yes
26 | Tests are unstable, e.g. due to timing. The results cannot be trusted at all. | 4 | Yes
27 | We have experimented with the test automation tool we plan to use and the results are positive. | 3 | Yes
28 | A suitable test tool is available that fits our purpose. | 3 | Yes
29 | We have decided on which tool to use. | 2.5 | No
30 | We can afford the costs of the tool. | 3 | Yes
31 | Our test engineers have adequate skills for test automation. | 2.5 | No
32 | We can afford to train our test engineers for test automation. | 3 | Yes
33 | We have expertise in the test automation approach and tool we have chosen. | 2.5 | No
34 | We are currently under a tight schedule and/or budget pressure. | 2 | No
35 | We have organizational and top management support for test automation. | 3 | Yes
36 | There is a large change resistance against software test automation. | 2 | No
37 | We have the ability to influence or control the changes to SUT. | 3.5 | Yes
38 | There are economic benefits of test automation. | 3 | Yes
39 | Tests are easy and straightforward to automate. | 3 | Yes
40 | Test results are easy to analyse automatically. | 4 | Yes
41 | Test automation will require a lot of maintenance effort. | 3 | Yes
42 | Our software development process requires test automation to function efficiently, for example agile methods. | 1.5 | No
43 | We make several releases of our products. | 3 | Yes
44 | The product being tested is highly customizable, i.e. has many configurations. | N/A | Yes
45 | Developers have low knowledge in the product being tested, i.e. the product has not been developed on for a long period of time. | N/A | Yes

The factors that remained after the modifications were divided into two groups, one consisting of factors to consider before starting an automation process and the other consisting of factors to consider when deciding whether to automate a test. The factors in the second part of this checklist were reformulated, edited and put into a decision tree (see appendix 7.E). The decision tree, see Fig 4.1, was to be used when evaluating a test case for automation.
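As an illustration of how such a tree can be evaluated, the following C# sketch walks a test case through a sequence of decision points. The simple majority rule per decision point is an assumption, since the thesis does not spell out how the answers to the factors within a point are combined; the type names and factor labels paraphrase the login example of section 3.1.2 and do not come from the tree in appendix 7.E.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of walking a test case through the decision tree.
// The real tree branches at every decision point; this walk only follows
// the path towards "automate" and stops at the first decision point whose
// factors do not favour automation.
record DecisionPoint(string Name, string[] Factors);

class DecisionTreeSketch
{
    // agrees[factor] is true when the tester answered "agree" for that factor.
    static string Evaluate(IEnumerable<DecisionPoint> points,
                           IReadOnlyDictionary<string, bool> agrees)
    {
        foreach (var point in points)
        {
            int agreeing = point.Factors.Count(f => agrees[f]);
            if (agreeing * 2 <= point.Factors.Length) // majority disagree
                return $"Do not automate (stopped at {point.Name})";
        }
        return "Automate";
    }

    static void Main()
    {
        var tree = new[]
        {
            new DecisionPoint("F1", new[] { "1.1 result is deterministic", "1.2 no human judgement needed" }),
            new DecisionPoint("F2", new[] { "2.1 tested often", "2.2 likely to reveal defects", "2.5 run for many builds" }),
        };
        // The login test case: deterministic, no human judgement, run often,
        // but unlikely to reveal new defects.
        var answers = new Dictionary<string, bool>
        {
            ["1.1 result is deterministic"] = true,
            ["1.2 no human judgement needed"] = true,
            ["2.1 tested often"] = true,
            ["2.2 likely to reveal defects"] = false,
            ["2.5 run for many builds"] = true,
        };
        Console.WriteLine(Evaluate(tree, answers)); // Automate
    }
}
```

Under these assumptions, both F1 and F2 end in agree and the login test case reaches the “automate” endpoint, mirroring the worked example in Fig 3.3.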
4.1.4 Usage of Decision Tree on Regression Test Cases

Of the 12 test cases described in section 3.1.3, the result from the decision tree was to fully automate eight test cases and to not automate three test cases. For one of the test cases the tree was used on several subsets of the steps in the test case, where each subset tested a different area of the product. This test case contained 19 steps; with usage of the tree it was concluded that 7 steps could be automated and 12 steps were recommended to not be automated, that is, roughly 40% of the test case was recommended for automation. The test cases were classified based on test type: of the twelve test cases, one was considered to be a smoke test, eight were sanity tests, two were scenario tests and one test case was mixed and contained steps relating to smoke, sanity and scenario testing.

4.1.5 Automated Tests

To gather data for the ROI calculations, three manual test cases were automated. The tests were selected by using the decision tree on several regression test cases from areas identified in the interviews for modifying the checklist (see section 3.1.2). The tests were then discussed in informal meetings with testers and developers. With this strategy three manual test cases were chosen for automation. Two of the test cases were considered to be sanity tests and the third was considered to be a scenario test. Two of the test cases were in a web product, one sanity and one scenario test; the third test case, a sanity test, was found in an interface-rich desktop product. The test cases contained between 8 and 24 steps each, which were to be automated. The tests were written in C#, and the test automation framework that the tests were coded in used Gauge and Selenium (see section 3.1.4). The data from the automated tests used in the ROI calculations, such as the time to automate the tests, is presented in table 4.3 in section 4.2.1.
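As an indication of what such a test might look like, the sketch below shows the general shape of a Gauge step implementation driving a browser through Selenium. The thesis does not reproduce its test code, so the step texts, element ids and URL check are invented for illustration; only the choice of C#, Gauge and Selenium comes from the thesis.

```csharp
using System;
using Gauge.CSharp.Lib.Attribute;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Hypothetical sketch of a Gauge + Selenium step class; none of the
// identifiers below come from the thesis's actual test suite.
public class LoginSteps
{
    // In a real suite the driver would typically be shared between step
    // classes; a fresh ChromeDriver keeps the sketch self-contained.
    private readonly IWebDriver _driver = new ChromeDriver();

    [Step("Log in as <username> with password <password>")]
    public void LogIn(string username, string password)
    {
        _driver.FindElement(By.Id("username")).SendKeys(username);
        _driver.FindElement(By.Id("password")).SendKeys(password);
        _driver.FindElement(By.Id("login-button")).Click();
    }

    [Step("The start page is shown")]
    public void AssertStartPage()
    {
        if (!_driver.Url.Contains("/start"))
            throw new Exception($"Expected the start page, got {_driver.Url}");
    }
}
```

In Gauge, steps like these are referenced from plain-text specification files, which keeps the test scenarios readable for the testers who wrote the original manual test cases.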
4.2 Quantitative Results

In this section the quantitative results are presented. Quantitative data are numbers and classes [91]. The quantitative results found in this thesis come from return on investment calculations and from a survey measuring organisational benefits of test automation.

4.2.1 ROI

This section answers the second research objective, How can economic benefits be measured for test automation? In the literature study, see section 2.7, it was found that return on investment is the most common way to measure the economic benefits of test automation. In this thesis a formula created by Hoffman was used; for a description of the formula, see section 3.2.1. The return on the investment is dependent on how much time the tests can run. For the whole project it was found that a positive ROI is achieved after 3.3 release cycles or 1.7 years, as a release cycle at Sectra Imaging IT Solutions Ltd is 6 months. The ROI of the automation project is shown in figure 4.2.

Figure 4.2: ROI of the automation project, plotted against releases (6 months each); the project total passes ROI = 1 after 1.7 years.

The return on investment differed greatly between the two products; the result is presented in figure 4.3. For the interface-rich desktop product, a positive ROI was achieved from the first usage of the tests, whereas for the web product a positive ROI was only found after around 4 years: the ROI for test A turned positive at 3.7 years and for test B at 3.85 years.

Figure 4.3: ROI for the individual test cases (Project Total, Web Product A, Web Product B and Interface Rich Desktop Product), plotted against releases (6 months each); ROI = 1 is passed after 0.5 years for the desktop product, 1.7 years for the project total, and 3.7 and 3.85 years for web products A and B.

The reason that the ROI differed so much is that the maintenance need and costs were estimated to be much higher in the web product than in the interface-rich desktop product; the maintenance cost was estimated to be more than 10 times higher in the web product. The variables used in the formula are presented in table 4.3.

Table 4.3: Data used in ROI calculations
Data | Interface Rich Desktop Product | Web Product Test A | Web Product Test B | Unit | Estimated/Logged
How much time does it take to create the automated test | 16.69 | 25.36 | 19.38 | hours | logged
How much time does it take to create an equivalent manual test | 8 | 6 | 4 | hours | estimated
How much time does it take to run an equivalent manual test | 3 | 3.33 | 3 | hours | logged
How long can the automated test run without maintenance | 24 | 6 | 6 | months | estimated
How much time does it take to maintain the automated test | 2 | 6 | 6 | hours | estimated
How long can the manual test run without maintenance | 6 | 6 | 6 | months | logged
How much time does it take to maintain the manual test | 2 | 2 | 3 | hours | estimated

4.2.2 Validation of Benefits from Test Automation

As a part of answering the research question, Can the checklist provided by Garousi and Mäntylä [17] be modified in such a way that it can be used to select test cases for test automation that result in economic and organisational benefits?, a survey was sent out to three respondents to study whether the automated tests could lead to organisational benefits. The result from the survey is presented in table 4.4. The table shows that the test named “Web Product A” can lead to six benefits, “Web Product B” to seven benefits and “Interface Rich Desktop Product” to four benefits.

Table 4.4: Results from the survey evaluating organisational benefits of the automated tests. For each product, “Agree” means that the mean value was greater than or equal to 3.
The implementation of the automated test does to some degree allow for: | Web Product A (Agree / Mean) | Web Product B (Agree / Mean) | Interface Rich Desktop Product (Agree / Mean)
Improved product quality | No / 2.7 | No / 2.7 | No / 1.7
Increased test coverage | No / 1.7 | No / 1.7 | Yes / 3
Reduced testing time | Yes / 4 | Yes / 4 | No / 2.3
Increased test reliability | No / 2.3 | Yes / 3.3 | No / 2
Increase in confidence | Yes / 3.3 | Yes / 3.7 | No / 2.3
Reusability of tests | Yes / 3.7 | Yes / 3.7 | No / 2.7
Less human effort | Yes / 4 | Yes / 3.7 | Yes / 3.7
Reduction in cost | Yes / 3.3 | Yes / 4 | Yes / 3.7
Shorter release cycles | Yes / 3 | Yes / 3 | Yes / 3

5 Discussion

The discussion is divided into six subsections. First, the results and the method are discussed; afterwards, internal and external validity, reliability, and ethical and societal aspects are considered.

5.1 Results

In this section the results found in this thesis are analysed and discussed. The chapter is organised by the same section names as used in the results chapter.

5.1.1 Benefits of Test Automation

From the literature study, 8 benefits of test automation were identified, see section 2.5. Out of these 8 benefits, the 2 respondents in the interviews agreed with 6, and one benefit was added by the respondents (shorter release cycles).
The two benefits where the respondents did not agree with the literature were reduced testing time and reduction in cost. When discussing reduced testing time with the respondents, it was clear that some ideas from the literature were brought up. The respondents agreed that the time for test execution could be reduced with automation, as suggested by Amannejad et al. [16]. The respondents also thought that test automation can find bugs earlier in the development process, which [53] and [52] argue will reduce testing cost. Nonetheless, the respondents thought that, when looking at the big picture, test automation will not result in reduced testing time and cost, but rather in a shift in cost and testing activities.

Benefits of test automation were also identified in other interviews throughout this thesis. Three benefits that were mentioned in other interviews, but not in the ones about benefits of test automation, were: “testing low risk areas”, “testing product areas where the company has a low level of knowledge” and “not having to perform time consuming and difficult setup when regression testing”. One reason why different benefits were mentioned can be the variety of roles that the respondents had. The benefits that have been found have been expressed by testers, developers, CI/CD engineers and management, which provides a broad set of perspectives on the subject.

An unexpected result is that 2 of the 3 tests that were automated in this thesis were thought to lead to reduced testing time, and all 3 tests were thought to lead to a reduction in cost. This result comes from the survey that was sent out to 3 respondents (see section 4.2.2). Note that these are exactly the benefits that the respondents in the “Benefits of test automation” interviews thought were not likely to come with test automation. One reason for the discrepancy between the survey and the interviews can be that different respondents were asked to participate in the survey and the interviews; the discrepancy can simply come from disagreement about which benefits test automation provides. It is also important to note that in the survey the respondents were asked to identify benefits from specific automated tests, whereas in the interviews the respondents were talking about the general and overall scope of test automation, which could very well influence the answers.

Table 5.1: Factors found in the literature that are not included in the decision tree.
Factor | Definition
Reuse | Can this test or parts of it be reused in other tests? [65]
Porting, i.e. can the test run on several environments? | How portable is this test? [65]
Large input | Does the test require many data combinations using the same test steps (i.e., multiple data inputs for the same feature)? [38]
Large output | Is the test very time-consuming, such as expected results analysis of hundreds of outputs? [38]
Test Portability | Does the test need to be verified on multiple software and hardware configurations? [38]

5.1.2 Modifications in Checklist

After the inclusion process had been completed, the remaining factors were divided into two checklists. The first checklist, see appendix 7.C, contained factors to consider before starting the automation process, answering the question “Are we at a position where test automation is possible?”.
The second checklist, see appendix 7.D, contained factors to consider when deciding whether or not to automate a test, answering the question "Is this test suitable for automation?". The factors from the second checklist were used in the decision tree. The reason for choosing a decision tree is that factors can be prioritized depending on how close to the top of the tree they are placed, and on which previous decision points have been passed to reach the current factors. The factors were prioritized with the scores from the interviews in mind. The majority of the tree follows the scores from the interviews, but in some cases exceptions were made to get a valid grouping of factors and structure in the tree. Another advantage of using a decision tree is that the tester gets a clear result and it is easy to summarize the outcome of the factors considered.

When comparing the decision tree in this thesis to the checklists from the literature, many similarities are found. Some of the factors covered in the literature that are not present in the decision tree are shown in table 5.1. The main reason these factors were not added to the decision tree is that most of them are not included in the checklist by Garousi and Mäntylä [17] and hence were not considered when modifying the checklist. It is worth noting that the responders had the possibility to add factors if they thought it was needed. The factor "Large input" is included in the checklist by Garousi and Mäntylä, but in the interviews the responders thought that it was not of high importance when making decisions about test automation.

5.1.3 Usage of Decision Tree on Regression Test Cases

The decision tree was used on regression test cases: 11 tests of the web related product and 1 test of the interface rich desktop product. The reason for not using the decision tree on more test cases is that verifying that the usage of the decision tree was correct was a time-consuming process: first the author of this thesis used the decision tree on a test case, and the result then had to be verified by testers or developers at the company.

The advantage of using the tree on regression tests is that these tests are likely to be the tests that are automated first. Ramler and Wolfmaier state that automated tests best address the regression risk, meaning the risk that new defects are introduced after changes in the software [62]. In a survey it was found that 50% of the 36 industry respondents performed equal amounts of manual and automated regression testing and only 30% performed manual regression testing [21]. In the same study the authors mention that the respondents did not use any systematic approach to select test cases for regression testing; instead, judgment and experience were used [21]. For this reason it can be argued that it was a good choice to evaluate the decision tree on regression tests, since there seems to be a need for a systematic approach to select which regression tests to run and possibly also which to automate. However, a disadvantage of using the decision tree on regression tests is that many of these at Sectra Imaging IT Solutions Ltd are sanity tests, which are generally suitable for automation. One reason for evaluating the decision tree on the test cases was to find out why the tree would result in a recommendation not to automate a test case.
It is possible that if the tree had been used on another set of test cases, it would have recommended that more test cases should not be automated, which would have given more data for analysing this situation. In the interviews only 3 test cases had the result that they should not be automated, which might be too small a data set for drawing justifiable conclusions.

5.1.4 ROI

First it is important to note that the ROI calculations assume that manual tests are executed once every six months. The test cases that were included in the ROI calculations are regression tests, and at Sectra Imaging IT Solutions Ltd these typically run once per release cycle. Automated tests are commonly run much more frequently; it is reasonable to assume that the automated tests would run once per day, but this benefit is not present in the ROI calculations. If the frequency of regression testing were to be increased, a positive ROI would have been reached in a shorter time period for the automated tests.

The result of the ROI calculations, see section 4.2.1, is that the automation project would achieve a positive ROI after 1.7 years or 3.3 release cycles. This is equivalent to 3.3 test executions. This result is not far from what was achieved in Visual GUI Testing: Automating High-level Software Testing in Industrial Practice [93]. In this study the author performed automation of visual GUI testing in three projects; the mean ROI value for the three projects was 2.3 executions [93]. Similar results are reported by Amannejad et al. in 2014, where the authors found that when only automating test execution, a positive ROI was reached after 3 test executions [16].

The individual tests have a big difference in maintenance cost in the ROI calculations; however, this is not unexpected considering the differences between the products. One of the products being tested is an interface product, which is very stable, and changes are not frequently made to this product, whereas the other is a web product, where the tests are performed in several areas of the GUI and changes occur much more frequently. (A minimal sketch of how such a break-even point can be computed from the data in table 4.3 is given below.)

5.1.5 Validation of Benefits from Test Automation

The result from the survey is shown in table 4.4. None of the tests is thought to increase product quality, where quality is defined as fewer defects present in the software. In the "Benefits of test automation" interviews the responders considered this to be a benefit of test automation; the responders thought that test automation can find smaller defects, typically found in regression testing. It is not unexpected that the automated tests were not considered to increase quality. The thought that automated tests are useful for verification rather than finding defects is present in the literature [14]. Kaner argues that bugs found from automated tests are found when developing the tests, rather than when running the tests [94]. This was the only benefit that was not found for any of the tests. The other benefits were found in some tests but not in all. Overall, fewer benefits were found for the tests of the Interface Rich Desktop Product than for the Web Product. This result can be explained by the fact that testing the Interface Rich Desktop Product is easier, both manually and automatically. For instance, the reliability will not increase when testing this product automatically, since there already exists a high level of trust in the manual testers performing these tests.
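Returning to the ROI calculations in section 5.1.4: the following is a minimal sketch of how a break-even point can be derived from the variables in table 4.3. It is not the exact formula by Hoffman [18] used in the thesis; it assumes a simple ratio of cumulative manual testing cost to cumulative automation cost, with break-even when the ratio exceeds 1.

```python
# Hedged sketch: cumulative-cost ratio per release cycle (1 cycle = 6 months).
# Hours for the interface rich desktop product are taken from table 4.3;
# the cost model itself is an illustrative assumption, not Hoffman's formula.

CREATE_AUTO = 16.69      # hours to create the automated test (logged)
CREATE_MANUAL = 8.0      # hours to create an equivalent manual test (estimated)
RUN_MANUAL = 3.0         # hours per manual execution, once per release (logged)
MAINT_AUTO = 2.0         # hours per automated-test maintenance event (estimated)
AUTO_MAINT_EVERY = 4     # releases between automated maintenance (24 months / 6)
MAINT_MANUAL = 2.0       # hours per manual-test maintenance event (estimated)
MANUAL_MAINT_EVERY = 1   # releases between manual maintenance (6 months / 6)

def roi(releases: int) -> float:
    """Cumulative manual testing cost divided by cumulative automation cost."""
    manual = (CREATE_MANUAL + releases * RUN_MANUAL
              + (releases // MANUAL_MAINT_EVERY) * MAINT_MANUAL)
    automated = CREATE_AUTO + (releases // AUTO_MAINT_EVERY) * MAINT_AUTO
    return manual / automated

# Break-even: the first release cycle where automation is the cheaper option.
break_even = next(n for n in range(1, 23) if roi(n) > 1.0)
print(f"ROI exceeds 1 after {break_even} release(s), i.e. {break_even * 0.5} years")
```

Under these assumptions the sketch reaches break-even after 2 release cycles (1 year); the break-even points reported in the thesis differ, presumably because Hoffman's formula combines the variables differently.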
To summarise the survey results in table 4.4: three benefits were found to be present in all three automated tests, these being less human effort, reduction in cost and shorter release cycles.

5.2 Method

In this section the methods used in this thesis are discussed. The chapter is organised by the methods used, and inside each subsection the different usages of the method are discussed.

5.2.1 Interview

The interview questions, which were handed out to responders, and also the survey used in the thesis, were written in English. This could potentially have been a problem, since English was not the responders' native language. The reason for having the questions in English was to avoid having to translate technical terms found in the literature. Some technical terms that are not commonly used at Sectra Imaging IT Solutions Ltd were changed to simplify for the responders; these terms were identified in informal meetings with the supervisor, where the questions and layout of the survey/interviews were discussed. The responders had a good level of English, and having the questions in English did not seem to be a problem in the interviews. The interviews were held in Swedish, which was the native language of both the responders and the interviewer. All interviews were transcribed, and the transcriptions were sent to the responders, as recommended by Runeson and Höst [75]. The texts used in the report that come from interviews were sent to the responders for their approval, partly to make sure that the translation was correct, but also to ensure that the responders approved that the text can be published in the report. There were no non-responders to the interviews or the survey in this thesis.

The organisational benefits of test automation were found mainly in one study, namely Rafi et al. [9]. There exists a risk that there are other benefits of test automation that have not been identified by Rafi et al. This risk has been mitigated by reviewing several other sources of benefits of automation, such as books and studies published after 2012 (the publishing year of the SLR by Rafi et al.). There were only two interviews held to verify the benefits from the perspective of practitioners, and both interview responders were selected from the same company, Sectra Imaging IT Solutions Ltd. The conclusion from these interviews is that practitioners at Sectra Imaging IT Solutions Ltd believe there primarily exist 7 benefits of test automation (presented in table 4.1), but no general conclusions about benefits from test automation can be drawn from the interviews. Nevertheless, it is reasonable to assume that there will be similar results in other companies, since the benefits have been identified in several published papers.

When answering the third research objective, one interview that was used had a different purpose than the objective. In the interview (see section 3.1.2) the checklist was used to verify that test automation was possible at the company, while the research objective was to evaluate whether the checklist was applicable at the company. The answers from the interview were analysed, and the conclusion could be drawn that most questions were applicable. But there still exists a possibility that the result would have been different if the purpose of the interview had been to verify the usefulness of the checklist; perhaps some questions could have been removed directly from the checklist before conducting the second set of interviews.
In the end the result would likely have been the same, since more interviews with the aim of reviewing the checklist were carried out. Furthermore, several actions were taken to evaluate the applicability of the checklist, as described in section 4.1.2.

When conducting the interviews for modifying the checklist, it was clear that the questions were complex and not always easy to answer right away. For this reason, it can be argued that it was suitable to conduct interviews instead of handing out a questionnaire to the respondents. In an interview setting the respondents have the possibility to ask the interviewer what a question is aiming at, and it can be discussed in the interview. Before the interviews were conducted, a small review had been made with one worker at the company. However, there was no complete pilot test of the interview, and when a factor could have several definitions it was decided that the interviewer and the respondent should come to an agreement on a definition in the interview. As a result, different definitions were used for a few of the factors in the two interviews, which led to varied prioritizing of those factors by the respondents. In some cases this was noted in the coding of the interview, and actions could be taken to prevent a factor from being left out of the checklist for this reason. It is possible that some factors might have been dropped from the modified checklist due to differences in definitions. This does not have to be a disadvantage: if the respondents had difficulties understanding the purpose of a factor, it is likely that a tester using the checklist would have the same problem. If the checklist contains ambiguous factors, it might give different results when used by different individuals. The goal is that the checklist always gives the same answer, independent of who is using it. One example of a factor that was dropped due to differences in definitions is factor 12.

Even though the factors were discussed with the respondents and the data was analysed, it would have been better if a pilot test of the interview had been carried out. Turner III states that a pilot test is necessary to refine the interview design before conducting the actual interviews [77]. If a pilot test had been used before carrying out the real interviews, ambiguous factors could have been removed or reformulated to be easier to understand. A more serious problem with these interviews was that both respondents had trouble interpreting the context question. In the questionnaire, the context question "The following factor is important when deciding if the given situation favours test automation" was asked for each factor in the checklist. But the factors in the checklist contained statements as well, and for the respondents it was difficult to keep track of which statement was being discussed. One respondent openly stated that it was hard to follow the interview design. The other respondent selected a Likert scale point for a factor in the questionnaire that showed the opposite opinion to the one he had stated when discussing the factor. The interviewer asked this respondent if he was sure his answer was correct, which it was not, and the answer could be changed. This problem could easily have been prevented with a pilot test. If the context question had been better formulated, the interviews would have been easier to participate in and would have taken less time to conduct.

5.2.2 Survey

In this thesis one survey was used to validate the decision tree.
The aim was to investigate whether the automated tests implemented with the aid of the decision tree could provide organisational benefits for the company. The survey that was used can be found in appendix 7.F. The survey was sent out to three responders, as described in section 3.2.2. Two of the responders thought that it was difficult to evaluate whether tests from product areas other than their own could provide benefits. This was to some extent expected, since it can be troublesome to read the implementation code. The mitigation was that the survey conductor explicitly stated that the responders could contact the conductor for guidance, which one of the responders did. This could reduce the accuracy of the survey result, but it was not possible to conduct the survey with responders that have experience in both of the product areas. It would have been better to get actual data to measure the benefits. For example, increased test coverage and reduced testing time are benefits that can be measured, but that data could not be collected at the time of the thesis. Also, at this stage it is complicated to measure whether the tests result in an increase in confidence or shorter release cycles; these benefits, if they exist, are more likely to be found after more tests have been implemented and used for a longer time period.

5.2.3 Automated Tests Implementation

The automated tests for the interface rich desktop product were considerably easier to implement than the ones for the web product. This was expected before starting the implementation and was largely due to properties of the products. In the web product there was a large set of possible configurations, and the tests that were automated needed to be configured in many different ways. For a developer that is new to this product it can be troublesome to understand all configurations, and the steps needed to perform them are not always clear. Furthermore, the tests in the web product interact with the GUI, and it was difficult to understand and get an overview of all the functionality needed for the tests. These factors, a considerable number of configurations and features, meant that the developer had to ask for help many times throughout the implementation phase. For the interface rich desktop product, on the other hand, the functionality was clearly defined and the tests did not need any special configurations. For this product it was obvious what the tests should achieve and how; the implementation of these tests was uncomplicated and straightforward.

5.2.4 ROI

A factor that needs to be considered for the ROI calculations is maturation effects [91]. As experience was gained throughout the project, it is reasonable to assume that the first test that was implemented, web product A, took longer to implement than the other tests due to a learning curve in the beginning; this is also seen in the data. However, an attempt was made to mitigate this risk: parts of the learning time were logged and could be shared among the test cases. The resulting time difference, whether due to a learning curve or not, is not that big and will not affect the ROI to any significant degree. The risk could have been avoided if each test had been implemented by a different developer, but then more tests would have needed to be implemented to ensure that the implementation time is not dependent on individual experience and skills.
A related factor is that the developer who implemented the tests is a junior developer, which might result in a higher cost of implementation than if the tests had been implemented by a senior developer.

5.3 Internal Validity

The data used for the return on investment calculations are to a large extent based on estimations from developers and testers at Sectra Imaging IT Solutions Ltd. The validity concern lies in that the data can be estimated with a hidden agenda: it is possible that the estimations were made to influence the ROI of automation in a positive or negative manner. To minimize this risk, existing data from issue tracking and project management tools was used when possible. Several estimations were collected from different sources, and an average was used when calculating the ROI. Hidden agendas might also be found in the interview and survey answers. All interviews and the survey were conducted with at least two responders, but due to time constraints only a few responders were interviewed. The interviews and surveys were created and analysed with published literature in mind, which can increase the reliability of the results.

As mentioned in section 5.2.4, maturation effects can have influenced the result of the return on investment from test automation. It can also be argued that maturation effects play a role in the answers and estimations of the responders; for instance, the effort of manually testing a test case will be estimated lower by an experienced tester than by a junior tester.

5.4 External Validity

As for external validity, the data collected for modifying, using and evaluating the decision tree comes solely from Sectra Imaging IT Solutions Ltd. Practitioners from other companies have not been consulted to verify whether this tool can be useful in other settings. For this reason, there is no data to support that the tool can be useful for practitioners other than those at Sectra Imaging IT Solutions Ltd. Reasonably, the decision tree can be used in other companies, since it builds on general research and the factors included are not specific to Sectra Imaging IT Solutions Ltd.

5.5 Reliability

The tools used for collecting data in this thesis are presented in as transparent a way as possible, with the integrity of participants in mind. Interview and survey questions can be found as appendices. The data used for calculating return on investment is shown in table 4.3. Parts of the method are complicated to replicate, such as the implementation of the automated tests, since only a limited amount of information about the products and the specific tests is published for reasons of confidentiality.

5.6 Ethical and Societal Aspects

Runeson and Höst bring up a few key factors for ethical considerations; these include informed consent, handling of sensitive results, feedback and confidentiality [75]. These factors have been considered with all responders and stakeholders for this thesis. Transcriptions of interviews, and excerpts from interviews used in the report, have been sent to all responders asking for their consent. Responders, as well as products mentioned in the report, have been anonymised to ensure the integrity of the responders. The responders were informed about these actions and the purpose of the interview before deciding to participate. Manual testers might have concerns about test automation. Some may wonder if test automation is likely to replace their work.
This does not seem likely in the near future; most testers agree that both manual and automated testing are necessary parts of software testing [9]–[14]. Some tests are not suitable for automation, such as usability testing. However, test automation will without doubt change the work of testers. One of the benefits that exist for test automation is "Less human effort" (see section 2.5.1). In the interviews about the benefits from test automation (see section 4.1.1) it was stated that test automation will lead to a shift in activity for manual testers. The responders thought that manual testers will have more time for qualitative testing instead of so-called must-do regression; this opinion is also stated by Berner, Weber, and Keller [14]. Moreover, the responders stated that test automation will be implemented with help from developers, which will result in changes in work for both developers and testers. Test automation will seemingly change the work of manual testers for the better, allowing for more creative tests and qualitative testing.

6 Conclusion

The aim of this thesis has been to provide a method for selecting test cases for automation. The research question has been whether the checklist provided by Garousi and Mäntylä [17] can be modified in such a way that it can be used to select test cases for test automation that result in economic and organisational benefits. The checklist was modified into a decision tree, and the result from the evaluation suggests that it can result in economic and organisational benefits. Three test cases were selected by using the decision tree, and the return on investment of automating these test cases shows that economic benefits are found after 0.5 to 4 years. Three organisational benefits were found to be related to this automation, these being: less human effort when testing, reduction in cost and allowing for shorter release cycles. These results have been shown to be present at one company, Sectra Imaging IT Solutions Ltd. Whether the result is replicable at other companies cannot be concluded from this thesis, but it is reasonable to assume that similar results can be found in other settings.

To aid the research process four research objectives were defined. The first one was: what do practitioners believe are the common benefits software producing companies relate to test automation? Practitioners at Sectra Imaging IT Solutions Ltd thought that test automation can lead to seven benefits: improved product quality, increased test coverage, increased test reliability, increase in confidence, reusability of tests, less human effort and shorter release cycles.

The second research objective was: how can economic benefits be measured for test automation? The answer to this research objective was found in the literature study, and it was concluded that the most common way to measure economic benefits of test automation is with return on investment formulas. In this thesis a formula by Hoffman [18] was used.

The third research objective was whether the checklist provided by Garousi and Mäntylä [17] was applicable in an industrial setting to achieve test automation. The checklist was considered to be applicable at Sectra Imaging IT Solutions Ltd. Practitioners at Sectra Imaging IT Solutions Ltd considered that 65% of the questions in the checklist were important when making decisions related to test automation.
The fourth and last research objective was: what modifications to the checklist provided by Garousi and Mäntylä [17] are required to make it applicable for practitioners? The checklist was modified in two steps. The first step was to identify which factors were necessary to consider when making decisions related to test automation. In this step, 15 factors were removed from the original checklist and two new factors were added. In the second step the remaining factors were grouped and sorted into a decision tree, the reason being that this prioritisation and organisation allows for easier use and gives the users of the decision tree a definitive answer to whether a test case should be automated or not.

More research is needed in the area of test case selection for automation, especially on methods that are simple to implement in test automation strategies. This thesis shows that practitioners can achieve economic and organisational benefits with checklist-based methods in test case selection for automation.

Bibliography

[1] F. Khomh, T. Dhaliwal, Y. Zou, and B. Adams, "Do faster releases improve software quality? An empirical case study of Mozilla Firefox", Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR), Zurich, Switzerland, Jun. 2012, pp. 179–188.
[2] M. Mäntylä, F. Khomh, B. Adams, E. Engström, and K. Petersen, "On rapid releases and software testing", Proceedings of the 29th International Conference on Software Maintenance (ICSM), IEEE, Sep. 2013.
[3] M. Mäntylä, B. Adams, F. Khomh, E. Engström, and K. Petersen, "On rapid releases and software testing: A case study and a semi-systematic literature review", Empirical Software Engineering, vol. 20, no. 5, pp. 1384–1425, Oct. 2015.
[4] A. Porter, C. Yilmaz, A. Memon, A. Krishna, D. Schmidt, and A. Gokhale, "Techniques and processes for improving the quality and performance of open-source software", Software Process: Improvement and Practice, vol. 11, no. 2, pp. 163–176, 2006.
[5] A. A. Sawant, P. H. Bari, and P. M. Chawan, "Software testing techniques and strategies", International Journal of Engineering Research and Applications, vol. 54, no. 3, pp. 980–986, May 2012.
[6] R. Charette, "Why software fails", IEEE Spectrum, vol. 42, no. 9, pp. 42–49, 2005.
[7] S. Dalal and R. S. Chhillar, "Software testing - three p's paradigm and limitations", International Journal of Computer Applications, vol. 54, no. 12, pp. 49–54, Sep. 2012.
[8] D. Kumar and K. Mishra, "The impacts of test automation on software's cost, quality and time to market", Proceedings of the 7th International Conference on Communication, Computing and Virtualization (ICCCV), vol. 79, 2016, pp. 8–15.
[9] D. Rafi, K. Moses, K. Petersen, and M. Mäntylä, "Benefits and limitations of automated software testing: Systematic literature review and practitioner survey", Proceedings of the 7th International Workshop on Automation of Software Test, Jun. 2012, pp. 36–42.
[10] Test automation is still testing, but don't go at it alone, Nov. 2018 (accessed November 23, 2018). [Online]. Available: https://blog.testproject.io/2018/11/13/testautomation-is-still-testing/.
[11] J. Kasurinen, O. Taipale, and K. Smolander, "Software test automation in practice: Empirical observations", Advances in Software Engineering, vol. 2010, p. 18, Nov. 2009, Article ID: 620836.
[12] J. Bach, "Test automation snake oil", Proceedings of the 14th International Conference and Exposition on Testing Computer Software (TCS'99), 1999.
[13] B. Pettichord, "Seven steps to test automation success", Proceedings of the STAR West Software Testing Conference, Nov. 1999.
[14] S. Berner, R. Weber, and R. Keller, "Observations and lessons learned from automated testing", Proceedings of the 27th International Conference on Software Engineering (ICSE), May 2005, pp. 571–579.
[15] Accelerating time to market through next-gen test automation, Apr. 2018 (accessed November 23, 2018). [Online]. Available: https://www.cigniti.com/blog/accelerating-time-to-market-through-next-generation-test-automation/.
[16] Y. Amannejad, V. Garousi, R. Irving, and Z. Sahaf, "A search-based approach for cost-effective software test automation decision support and an industrial case study", IEEE Seventh International Conference on Software Testing, Verification and Validation Workshops, vol. 3, 2014, pp. 302–311.
[17] V. Garousi and M. V. Mäntylä, "When and what to automate in software testing? A multi-vocal literature review", Information and Software Technology, vol. 76, pp. 92–117, Aug. 2016.
[18] D. Hoffman, Cost benefits analysis of test automation, STAR West, October 1999.
[19] Retriever Business - a business database for Swedish companies, (accessed February 24, 2019). [Online]. Available: https://www.retriever-info.com/?e=3.
[20] Sectra's history - the road to world-leading products, (accessed February 24, 2019). [Online]. Available: https://www.sectra.com/investor/about/history.html.
[21] E. Engström and P. Runeson, "A qualitative survey of regression testing practices", Jun. 2010, pp. 3–16.
[22] E. Engström, P. Runeson, and M. Skoglund, "A systematic review on regression test selection techniques", Information and Software Technology, vol. 52, no. 1, pp. 14–30, Jan. 2010.
[23] P. Runeson, "A survey of unit testing practices", IEEE Software, vol. 23, no. 4, pp. 22–29, Jun. 2006.
[24] P. Ammann and J. Offutt, Introduction to Software Testing. New York: Cambridge University Press, 2008.
[25] S. Quadri and S. Farooq, "Software testing - goals, principles, and limitations", International Journal of Computer Applications, vol. 6, no. 9, 2010.
[26] B. Beizer, Software Testing Techniques, 2nd ed. United States of America: International Thomson Computer Press, 1990, ISBN: 1850328803.
[27] P. Ammann and J. Offutt, Introduction to Software Testing. New York: Cambridge University Press, 2008, ISBN: 978-0-521-88038-1.
[28] D. G. E. van Veenendaal, I. Evans, and R. Black, Foundations of Software Testing: ISTQB Certification. London: Cengage Learning EMEA, 2012, ISBN: 978-1-408-04405-6.
[29] M. Fewster and D. Graham, Software Test Automation: Effective Use of Test Execution Tools. New York: Addison-Wesley, 1999, ISBN: 0-201-33140-3.
[30] P. Rook, "Controlling software projects", IEEE Software Engineering Journal, vol. 1, no. 1, pp. 7–16, Jan. 1986.
[31] M. Kumar, S. Singh, and R. Dwivedi, "A comparative study of black box testing and white box testing techniques", International Journal of Advance Research in Computer Science and Management Studies, vol. 3, no. 10, pp. 32–44, Oct. 2015.
[32] C. Kaner, "Cem Kaner on scenario testing: The power of 'what-if...' and nine ways to fuel your imagination", Better Software, vol. 5, no. 5, pp. 16–22, Oct. 2003.
[33] M. E. Khan, "Different forms of software testing techniques for finding errors", International Journal of Computer Science Issues, vol. 7, no. 3, pp. 11–16, May 2010.
[34] L. Copeland, A Practitioner's Guide to Software Test Design. London: Artech House, 2003.
[35] P. C. Jorgensen, Software Testing: A Craftsman's Approach, 4th ed. Auerbach Publications, 2014.
[36] J. Itkonen, M. V. Mäntylä, and C. Lassenius, "How do testers do it? An exploratory study on manual testing practices", 3rd International Symposium on Empirical Software Engineering and Measurement, pp. 494–497, Oct. 2009.
[37] J. Bach, Exploratory testing explained, v.1.3, 2003 (accessed September 12, 2018). [Online]. Available: http://www.satisfice.com/articles/et-article.pdf.
[38] E. Dustin, T. Garrett, and B. Gauf, Implementing Automated Software Testing. Massachusetts: Addison-Wesley, 2009.
[39] A software engineer in test must have the heart of a developer, Nov. 2018 (accessed November 23, 2018). [Online]. Available: https://blog.testproject.io/2018/11/06/the-software-engineer-in-test/.
[40] Key guidelines to continuous integration and Jenkins CI server, May 2017 (accessed November 23, 2018). [Online]. Available: https://blog.testproject.io/2017/05/11/jenkins-ci/.
[41] M. Malekzadeh and R. Ainon, "An automatic test case generator for testing safety-critical software systems", Proceedings of the 2nd International Conference on Computer and Automation Engineering (ICCAE), Feb. 2010.
[42] F. Saglietti and F. Pinte, "Automated unit and integration testing for component-based software systems", Proceedings of the International Workshop on Security and Dependability for Resource Constrained Embedded Systems, 2010.
[43] R. Tan and S. Edwards, "Evaluating automated unit testing in Sulu", Proceedings of the International Conference on Software Testing, Verification, and Validation, 2008.
[44] M. Alshraideh, "A complete automation of unit testing for JavaScript programs", Journal of Computer Science, vol. 4, no. 12, pp. 1012–1019, 2008.
[45] J. Burnim and K. Sen, "Heuristics for scalable dynamic test generation", Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering, Sep. 2008, pp. 443–446.
[46] M. Geetha Devasena, G. Gopu, and M. Valarmathi, "Automated and optimized software test suite generation technique for structural testing", International Journal of Software Engineering, vol. 26, no. 1, pp. 1–13, 2016.
[47] L. Nagowah and K. Kora-Ramiah, "Automated complete test case coverage for web based applications", Proceedings of the International Conference on Infocom Technologies and Unmanned Systems (ICTUS), 2017.
[48] D. Banerjee and K. Yu, "Robotic arm-based face recognition software test automation", IEEE Access, vol. 6, pp. 37858–37868, Jul. 2018.
[49] D. Gafurov, A. Hurum, and M. Markman, "Achieving test automation with testers without coding skills: An industrial report", Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 749–756.
[50] L. du Bousquet and N. Zuanon, "An overview of Lutess: A specification-based tool for testing synchronous software", Proceedings of the 14th IEEE International Conference on Automated Software Engineering, Oct. 1999.
[51] T. Wissink and C. Amaro, "Successful test automation for software maintenance", Proceedings of the 22nd IEEE International Conference on Software Maintenance (ICSM'06), 2006.
[52] B. Haugset and G. Hanssen, "Automated acceptance testing: A literature review and an industrial case study", Proceedings of the Agile Conference, Aug. 2008.
[53] S. Stresnjak and Z. Hocenski, "Usage of Robot Framework in automation of functional test regression", Proceedings of the 6th International Conference on Software Engineering Advances (ICSEA), Oct. 2011.
[54] B. Obele and D. Kim, "On an embedded software design architecture for improving the testability of in-vehicle multimedia software", Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops, 2014, pp. 349–352.
[55] L. Shan and H. Zhu, "Generating structurally complex test cases by data mutation: A case study of testing an automated modelling tool", The Computer Journal, vol. 52, no. 5, pp. 571–588, Aug. 2009.
[56] J. Al Dallal, "Automation of object-oriented framework application testing", Proceedings of the 5th IEEE GCC Conference & Exhibition, Mar. 2009.
[57] D. Flemström, P. Potena, D. Sundmark, W. Afzal, and M. Bohlin, "Similarity-based prioritization of test case automation", Software Quality Journal, vol. 26, no. 4, pp. 1421–1449, Dec. 2018.
[58] C. Liu, "Platform-independent and tool-neutral test descriptions for automated software testing", Proceedings of the 2000 International Conference on Software Engineering (ICSE), Jun. 2000.
[59] M. Bashir and S. Banuri, "Automated model based software test data generation system", Proceedings of the 4th International Conference on Emerging Technologies, Oct. 2008.
[60] C. Persson and N. Yilmaztürk, "Establishment of automated regression testing at ABB: Industrial experience report on 'avoiding the pitfalls'", Proceedings of the 19th International Conference on Automated Software Engineering (ASE'04), Oct. 2004.
[61] M. Fecko and C. Lott, "Lessons learned from automating tests for an operations support system", Software: Practice and Experience, vol. 32, no. 15, pp. 1485–1506, 2002.
[62] R. Ramler and K. Wolfmaier, "Economic perspectives in test automation: Balancing automated and manual testing with opportunity cost", Proceedings of the 2006 International Workshop on Automation of Software Test, vol. 3, Jan. 2006, pp. 85–91.
[63] Google Scholar citations for "Economic perspectives in test automation: Balancing automated and manual testing with opportunity cost" by R. Ramler and K. Wolfmaier, 2006, (accessed November 28, 2018). [Online]. Available: https://scholar.google.at/scholar?oi=bibs&hl=en&cites=3114206012469503695&as_sdt=5.
[64] Z. Sahaf, V. Garousi, D. Pfahl, R. Irving, and Y. Amannejad, "When to automate software testing? Decision support based on system dynamics: An industrial case study", Journal of Software: Evolution and Process, vol. 28, no. 4, pp. 272–285, Apr. 2016.
[65] J. Oliveira, C. Gouveia, and R. Filho, "A way of improving test automation cost-effectiveness", Proceedings of the 1st Annual Conference of the Association for Software Testing (CAST) 2006, Indianapolis, USA, 2006.
[66] R. Assad, T. Katter, F. Ferraz, L. Ferreira, and S. Lemos Meira, "Security quality assurance on web-based application through security requirements tests: Elaboration, execution and automation", Proceedings of the Fifth International Conference on Software Engineering Advances (ICSEA), Aug. 2010.
[67] S. Kadry, "A new proposed technique to improve software regression testing cost", International Journal of Security and its Applications, vol. 5, no. 3, Nov. 2011.
[68] D. Graham and M. Fewster, Experiences of Test Automation: Case Studies of Software Test Automation. Crawfordsville, Indiana: Addison-Wesley, 2012, ISBN: 0-321-75406-9.
[69] S. Münch, P. Brandstetter, K. Clevermann, O. Kieckhoefel, and E. Schäfer, "The return on investment (ROI) of test automation", Pharmaceutical Engineering, vol. 32, 2012.
[70] B. Marick, "When should a test be automated?", 1999 (accessed November 5, 2018). [Online]. Available: https://www.stickyminds.com/sites/default/files/article/file/2014/When%20Should%20a%20Test%20Be%20Automated.pdf.
[71] P. Grossman, Automated testing ROI: Fact or fiction? A customer's perspective: What real QA organizations have found, White paper, 2009.
[72] D. Graham, ROI of test automation: Benefit and cost, Professionaltester.com, November 2010.
[73] C. Schwaber and M. Gilpin, Evaluating automated functional testing tools, Forrester Research, February 2005.
[74] C. Robson and K. McCartan, Real World Research: A Resource for Users of Social Research Methods in Applied Settings, 4th ed. United Kingdom: John Wiley and Sons Ltd, 2016, ISBN: 9781118745236.
[75] P. Runeson and M. Höst, "Guidelines for conducting and reporting case study research in software engineering", Empirical Software Engineering, vol. 14, no. 2, pp. 131–164, Dec. 2008.
[76] M. Saunders, P. Lewis, and A. Thornhill, Research Methods for Business Students, 5th ed. Italy: Pearson Education, 2009.
[77] D. W. Turner III, "Qualitative interview design: A practical guide for novice investigators", The Qualitative Report, vol. 15, no. 3, pp. 754–760, 2010.
[78] H. S. Kramer and F. A. Drews, "Checking the lists: A systematic review of electronic checklist use in health care", Journal of Biomedical Informatics, vol. 71, pp. 6–12, 2017.
[79] D. L. Stufflebeam, Guidelines for developing evaluation checklists: The checklists development checklist (CDC), 2000 (accessed October 4, 2018). [Online]. Available: https://wmich.edu/sites/default/files/attachments/u350/2014/guidelines_cdc.pdf.
[80] D. H. Goh, A. Chua, E. Khoo, E. Mak, and M. Ng, "A checklist for evaluating open source digital library software", Online Information Review, vol. 30, no. 4, pp. 360–379, Jul. 2006.
[81] B. M. Gillespie, E. Harbeck, J. Lavin, T. Gardiner, T. K. Wither, and A. P. Marshall, "Using normalisation process theory to evaluate the implementation of a complex intervention to embed the surgical safety checklist", BMC Health Services Research, vol. 18, no. 170, 2018.
[82] W. Martz, "Validating an evaluation checklist using a mixed method design", Evaluation and Program Planning, vol. 33, pp. 215–222, 2010.
[83] S. M. Linares and A. C. D. Romero, "Developing a multidimensional checklist for evaluating language-learning websites coherent with the communicative approach: A path for the knowing-how-to-do enhancement", Interdisciplinary Journal of e-Skills and Lifelong Learning, vol. 12, pp. 57–93, 2016.
[84] N. Aggarwal, N. Dhaliwal, and B. Joshi, "To evaluate the use of surgical safety checklist in a tertiary referral obstetrics center of northern India", Obstetrics and Gynecology International Journal, vol. 9, no. 2, pp. 133–136, 2018.
[85] M. Usman, K. Petersen, J. Börstler, and P. Neto, "Developing and using checklists to improve software effort estimation: A multi-case study", Journal of Systems and Software, vol. 146, pp. 286–309, Dec. 2018.
[86] H. Boone Jr. and D. Boone, "Analyzing Likert data", Journal of Extension, vol. 50, no. 2, Apr. 2012.
[87] Q. Li, "A novel Likert scale based on fuzzy sets theory", Expert Systems with Applications, vol. 40, no. 5, pp. 1609–1618, Apr. 2013.
[88] R. Cummins and E. Gullone, "Why we should not use 5-point Likert scales: The case for subjective quality of life measurement", Proceedings of the Second International Conference on Quality of Life in Cities, 2000, pp. 74–93.
[89] K. Kelley, B. Clark, V. Brown, and J. Sitzia, "Good practice in the conduct and reporting of survey research", International Journal for Quality in Health Care, vol. 15, no. 3, pp. 261–266, 2003.
[90] D. Dillman, Mail and Internet Surveys: The Tailored Design Method, 2nd ed. Hoboken, New Jersey: John Wiley and Sons Inc., 2007, ISBN: 9780470038567.
[91] B. Kitchenham, S. Pfleeger, L. Pickard, P. Jones, D. Hoaglin, K. El Emam, and J. Rosenberg, "Preliminary guidelines for empirical research in software engineering", IEEE Transactions on Software Engineering, vol. 28, no. 8, pp. 721–734, 2002.
[92] J. Humble and D. Farley, Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation, 1st ed. Crawfordsville, Indiana: Pearson Education Inc., 2010, ISBN: 0-321-60191-2.
[93] E. Alégroth, "Visual GUI testing: Automating high-level software testing in industrial practice", Ph.D. dissertation, Chalmers University of Technology, Sweden, 2015.
[94] C. Kaner, "Improving the maintainability of automated test suites", Proceedings of the 10th International Conference Software Quality Week 1997, 1997.

7 Appendices

7.A Interview Benefits from Test Automation

The questions that were used in the interview for identifying which benefits industry representatives want to achieve with test automation. See section 3.1.1 for the method used in the interviews and section 4.1.1 for the result.

Interview: What changes does test automation come with?

Name: ________________________________
Role: ________________________________
Experience in role (years): ________________

Consider the following questions; motivate your answer on why/how test automation affects the factor discussed.

1. Can test automation result in a change of product quality? Quality is defined as a low defect level in the product.
2. Will test automation result in changes in test coverage?
3. Will test automation result in less/more testing time?
4. Does automated testing affect the reliability of the testing?
5. Can test automation result in a change of product confidence? That is, will e.g. developers feel more/less confident in the product quality with test automation.
6. Does test automation affect the reusability of testing?
7. Will test automation make a difference in the human effort of testing?
8. Does test automation affect the costs of testing?
9. Can you identify any other benefits from test automation than the previously mentioned?
10. What benefits do you aim to achieve with test automation?

7.B Checklist Survey

Survey: Evaluate checklist for deciding what to automate

Name: ____________________________
Team: ____________________________
Role: ____________________________
Experience (years) in role: ____________________________
Product area: ____________________________
Date: ____________________________

For each numbered statement below, the question asked is: "The following factor is important when deciding if the given situation favors test automation." Response options: Disagree Strongly / Disagree Slightly / Agree Slightly / Agree Strongly / Do not know.

Category: SUT-related factors

Area: Maturity of SUT
1. SUT or the targeted components will experience major modifications in the future.
2. The interface through which the tests are conducted is unlikely to change.

Area: Other SUT aspects
3. SUT is an application with a long life cycle.
4. SUT is a generic system, i.e. not a tailor made or heavily customized system.
5. SUT is tightly integrated into other products, i.e. not independent.
6. SUT is complex.
7. SUT is mission critical.
Category: Test-related factors

Area: Need for regression testing
8. Frequent regression testing is beneficial or essential.

Area: Test type
9. Tests are performance and load tests.
10. Tests are smoke and build verification tests.
11. Tests are unit tests.
12. There are a large number of tests that are similar to each other.
13. Tests require large amounts of data.
14. Humans are likely to make errors when performing and evaluating these tests, e.g. tests require vigilance in execution.
15. Computers are likely to make errors when performing and evaluating these tests, e.g. test execution is not deterministic.

Area: Test reuse/repeatability
16. Tests can be reused as part of other tests.
17. Tests need to be run in several hardware and software environments and configurations.
18. The lifetime of the tests is high.
19. The number of builds is high.

Area: Test importance
20. Tests are likely to reveal defects, i.e. high risk areas.
21. Tests cover the most important features, i.e. high importance areas.

Area: Test oracle
22. Test results are deterministic.
23. Test results require human judgement.
24. Automated comparison will be fragile, leading to many false positives.

Area: Test stability
25. Tests are instable, e.g. due to timing. We must perform the test repeatedly and if it passes above a threshold we consider that the test passes.
26. Tests are instable, e.g. due to timing. The results cannot be trusted at all.

Category: Test-tool-related factors

Area: Automation (test) tool
27. We have experimented with the test automation tool we plan to use and the results are positive.
28. A suitable test tool is available that fits our purpose.
29. We have decided on which tool to use.
30. We can afford the costs of the tool.

Category: Human and organizational factors

Area: Skills level of testers
31. Our test engineers have adequate skills for test automation.
32. We can afford to train our test engineers for test automation.
33. We have expertise in the test automation approach and tool we have chosen.

Area: Other human and organizational factors
34. We are currently under a tight schedule and/or budget pressure.
35. We have organizational and top management support for test automation.
36. There is a large change resistance against software test automation.
37. We have the ability to influence or control the changes to SUT.

Category: Cross-cutting and other factors

Area: Economic factors
38. There are economic benefits of test automation.

Area: Automatability of testing
39. Tests are easy and straightforward to automate.
40. Test results are easy to analyze automatically.
41. Test automation will require a lot of maintenance effort.

Area: Development process
42. Our software development process requires test automation to function efficiently, for example agile methods.
43. We make several releases of our products.
Other factors that you think are important when deciding if the given situation favors test automation (rows 44–48; state the factor and its importance):

44. Factor: ________ Importance: ________
45. Factor: ________ Importance: ________
46. Factor: ________ Importance: ________
47. Factor: ________ Importance: ________
48. Factor: ________ Importance: ________

Based on the questions above, name a few areas in your product that could benefit from automation and relate them to one of the questions.

Product area | Q. No(s). | Short motivation
1. | |
2. | |
3. | |
4. | |
5. | |
6. | |

7.C Checklist 1

The question numbers relate to the When and What to automate checklist; to see the questions with their numbers, see table 4.2.

Consider these questions before the automation process is started:

27. We have experimented with the test automation tool we plan to use and the results are positive.
28. A suitable test tool is available that fits our purpose.
30. We can afford the costs of the tool.
32. We can afford to train our test engineers for test automation.
35. We have organizational and top management support for test automation.
37. We have the ability to influence or control the changes to SUT.

7.D Checklist 2

The question numbers relate to the When and What to automate checklist; to see the questions with their numbers, see table 4.2.

Consider these questions when deciding whether to automate a test:

3. SUT is an application with a long life cycle.
7. SUT is mission critical.
8. Frequent regression testing is beneficial or essential.
9. Tests are performance and load tests.
10. Tests are smoke and build verification tests.
14. Humans are likely to make errors when performing and evaluating these tests, e.g. tests require vigilance in execution.
15. Computers are likely to make errors when performing and evaluating these tests, e.g. test execution is not deterministic.
18. The lifetime of the tests is high.
19. The number of builds is high.
20. Tests are likely to reveal defects, i.e. high risk areas.
21. Tests cover the most important features, i.e. high importance areas.
22. Test results are deterministic.
23. Test results require human judgement.
24. Automated comparison will be fragile, leading to many false positives.
25. Tests are instable, e.g. due to timing. We must perform the test repeatedly and if it passes above a threshold we consider that the test passes.
26. Tests are instable, e.g. due to timing. The results cannot be trusted at all.
38. There are economic benefits of test automation.
39. Tests are easy and straightforward to automate.
40. Test results are easy to analyze automatically.
41. Test automation will require a lot of maintenance effort.
43. We make several releases of our products.

7.E Decision Tree

The numbers next to the factors in the original figure show the mean value of the score for the factor from the interviews; 4 was the highest possible mean value and 3 was the lowest mean value a factor could have to be included in the checklist.

Decision Tree: Which Test to Automate?

Decision Point F1:
- Factor 1.1: Test results are deterministic.
- Factor 1.2: Test results do not require human judgement.
- Factor 1.3: Automated comparison will not be fragile, leading to many false positives.
- Factor 1.4: Tests are not instable, e.g. due to timing. Instable meaning: we must perform the test repeatedly and if it passes above a threshold we consider that the test passes.
- Factor 1.5: Tests are not instable, e.g. due to timing. Instable meaning: the results cannot be trusted at all.
- Factor 1.6: Computers are not likely to make errors when performing and evaluating these tests, e.g. test execution is not deterministic.
- Factor 1.7: Test results are easy to analyze automatically.

Decision Point F2:
- Factor 2.1: There are economic benefits of automating these tests.
- Factor 2.2: Tests are likely to reveal defects, i.e. high risk areas.
- Factor 2.3: The product being tested is mission critical.
- Factor 2.4: Tests cover the most important features, i.e. high importance areas.
- Factor 2.5: Frequent regression testing is beneficial or essential for this product.

Decision Point F3:
- Factor 3.1: Tests are easy and straightforward to automate.
- Factor 3.2: Test automation will not require a lot of maintenance effort.

Decision Point F4:
- Factor 4.1: Humans are likely to make errors when performing and evaluating these tests, e.g. tests require vigilance in execution.

Decision Point F5:
- Factor 5.1: Developers have low knowledge in the product being tested, i.e. the product has not been developed on for a long period of time.

Decision Point F6:
- Factor 6.1: Test type is favorable for automation, i.e. tests are performance or load tests.

Decision Point F7:
- Factor 7.1: We make several releases of the product.
- Factor 7.2: The product being tested is highly customizable, i.e. has many configurations.
- Factor 7.3: The lifetime of the tests is high.
- Factor 7.4: Tests are performed on a product with a long life cycle.
- Factor 7.5: The number of builds for this product is high.

Decision Point F8:
- Factor 8.1: Test type is favorable for automation, i.e. tests are smoke or build verification tests.

In the original figure, each decision point is answered Agree/Disagree, and the sheet includes a TC information form: TC ID, TC Title, Steps, Date, Performed by, Result, Comments. (A sketch of how these decision points could be evaluated programmatically is given after appendix 7.F.)

7.F Benefits from Automation Survey

The survey that was sent out to responders for evaluating the organisational benefits of the automated tests implemented in this thesis (see section 3.2.2).

Survey: Evaluate benefits from test automation

Name: ____________________________
Team: ____________________________
Role: ____________________________
Date: ____________________________

First take a look at the benefits that have been found from test automation¹.

1. Improved product quality: Quality in terms of fewer defects present in the software product.
2. Increased test coverage: High coverage of code (e.g. statement, branch, path) is achieved through automation.
3. Reduced testing time: Time required for testing, i.e. the ability to run more tests within a timeframe.
4. Increased test reliability: Automated software testing is more reliable when repeating tests, as variance in outcomes can be due to the manual tester running the tests in a different way; however, it cannot make use of the knowledge of the tester.
5. Increase in confidence: Increase of confidence in the quality of the system (e.g. as perceived by developers).
6. Reusability of tests: When tests are designed with maintenance in mind they can be repeated frequently; a high degree of repetition of test cases leads to benefits, not a single execution of an automated test case.
7. Less human effort: Automation reduces human effort that can be used for other activities (in particular ones that lead to defect prevention).
8. Reduction in cost: With a high degree of automation, costs are saved.
9. Shorter release cycles: Test automation is a prerequisite for continuous integration and will allow for shorter release cycles.

¹ As defined in D. Rafi, K. Moses, K. Petersen, and M. Mäntylä, "Benefits and limitations of automated software testing: Systematic literature review and practitioner survey", Proceedings of the 7th International Workshop on Automation of Software Test, Jun. 2012, pp. 36–42.
First consider the test: Web Product A. Please see the attached TC and code implementation.

For each statement below, the question asked is: "The implementation of the automated test does to some degree allow for:". Response options: Disagree Strongly / Disagree Slightly / Agree Slightly / Agree Strongly / Do not know.

1. Improved product quality
2. Increased test coverage
3. Reduced testing time
4. Increased test reliability
5. Increase in confidence
6. Reusability of tests
7. Less human effort
8. Reduction in cost
9. Shorter release cycles

Now consider the test: Web Product B. Please see the attached TC and code implementation.

1. Improved product quality
2. Increased test coverage
3. Reduced testing time
4. Increased test reliability
5. Increase in confidence
6. Reusability of tests
7. Less human effort
8. Reduction in cost
9. Shorter release cycles

Now consider the test: Interface Rich Desktop Product. Please see the attached code implementation; no TC exists for this test.

1. Improved product quality
2. Increased test coverage
3. Reduced testing time
4. Increased test reliability
5. Increase in confidence
6. Reusability of tests
7. Less human effort
8. Reduction in cost
9. Shorter release cycles
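As a closing illustration, referenced from appendix 7.E: the following is a hedged sketch of how the decision tree could be evaluated programmatically. The factor texts follow the appendix, but the encoding, the traversal order and the aggregation rule (stop and recommend against automation as soon as a factor is disagreed with) are illustrative assumptions; the actual branch semantics are only shown in the original figure.

```python
# Hedged sketch of evaluating the decision points in appendix 7.E.
# Only F1 and part of F2 are written out; the remaining decision points
# follow the same pattern. The "all factors must be agreed with" rule is
# an assumption made for illustration, not the tree's documented semantics.
from typing import Callable, Dict, List

DECISION_POINTS: Dict[str, List[str]] = {
    "F1": [
        "Test results are deterministic.",
        "Test results do not require human judgement.",
        "Automated comparison will not be fragile, leading to many false positives.",
        "Tests are not instable, e.g. due to timing (threshold-based passing).",
        "Tests are not instable, e.g. due to timing (results cannot be trusted).",
        "Computers are not likely to make errors when performing these tests.",
        "Test results are easy to analyze automatically.",
    ],
    "F2": [
        "There are economic benefits of automating these tests.",
        "Tests are likely to reveal defects, i.e. high risk areas.",
        # ... remaining F2 factors and decision points F3-F8 omitted for brevity
    ],
}

def recommend_automation(agree: Callable[[str], bool]) -> bool:
    """Walk the decision points in order; recommend automation only if the
    tester agrees with every factor reached (assumed aggregation rule)."""
    for point, factors in DECISION_POINTS.items():
        for factor in factors:
            if not agree(factor):
                print(f"{point}: disagreed with '{factor}' -> do not automate")
                return False
    return True

# Example usage: a test case for which every factor is agreed with.
print(recommend_automation(lambda factor: True))  # -> True
```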