A FIREWALL MODEL FOR TESTING USER-CONFIGURABLE SOFTWARE SYSTEMS By BRIAN P. ROBINSON Submitted in partial fulfillment of the requirements For the degree of Doctor of Philosophy Dissertation Advisor: Dr. Lee J. White Department of Electrical Engineering and Computer Science CASE WESTERN RESERVE UNIVERISTY May, 2008 CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES We hereby approve the thesis/dissertation of Brian P. Robinson___________________________________ candidate for the Ph.D. degree *. (signed) Lee J. White______________________________ (chair of the committee) Andy Podgurski____________________________ Vincenzo Liberatore________________________ Ken Loparo_______________________________ (date) March 6th, 2008 *We also certify that written approval has been obtained for any proprietary material contained therein. Acknowledgements This work is dedicated to my wife Heidi, for all the love and support provided. She is an amazing person and I could never have done this work without her. I would like to first acknowledge and thank my advisor Dr. Lee White for all his help over the years. Taking his class in Software Engineering really got me excited about this area of research and also showed me how behind industry really is. This became the motivation for both my research in this area and my research program at ABB. Dr. White also showed me that youth and speed are great in many sports, but precision matters most in racquetball. That lesson was repeated constantly throughout our time working together. We have analyzed some very interesting software testing problems over these last six years, and I thank him for all of the time, dedication, and patience he gave me. Next, I would like to thank Dr. Vincenzo Liberatore. I got to know Dr. Liberatore in his research networking class, which really opened my eyes to other areas of research in Computer Science. To this day I am amazed that my class project ended up published as a short paper to an international workshop. I have really enjoyed the research, proposals, and discussions with him over the years, and I thank him for all of the opportunities and help. Also, I would like to thank Dr. Andy Podgurski. His knowledge of software engineering and testing are very helpful to me personally and to my research program at ABB. Many of the ideas and work he has done have refined how I look at software testing, and I am grateful to him for all of the help and support. Dr. Ken Loparo has also been a great committee member. His knowledge of the systems and products that I studied for this work, as well as their market and use, has been a great benefit to my research and the committee. I am very grateful for his support and help. Another group who deserves thanks is the EECS student affairs team. Without their support getting forms signed and questions answered, this dissertation might never have come about. Outside of Case, I would like to thank ABB. Without the experiences I gained while working there, not to mention the funding, this research would not have happened. I am very fortunate to have that knowledge and data available to me, and I hope that this research will be put to good use in ABB soon. My family also deserves many thanks. Their encouragement and support were very helpful and really enabled me to get this work done. They all had to put up with my trips to the library, requests for quiet, frequent absences, and various other things that came up over these years. A special thanks to my son Aidan and my daughter Arwyn. 
Aidan had many nights where he was not able to get my full attention, and I thank him for being patient. Arwyn had to learn not to push the mouse and hit a key when I was holding her, no matter how fun it looks. It is amazing how fast hundreds of pages can be deleted and I think this was the real reason that Undo was invented. Last, and most important, I would like to thank my wife Heidi. Without her encouragement and support I would never have pursued this degree. She sacrificed time, both her own and ours together, to make sure I could take courses, study for tests, do this research, and be successful. She is my role model, and her work ethic and study habits are what I strive for. Her love and support enabled this research to be possible, and I am forever in her debt. Table of Contents List of Tables ..................................................................................................................... 3 List of Figures.................................................................................................................... 4 1. Introduction................................................................................................................... 6 2. Proposed Solution ....................................................................................................... 14 2.1 Solution Overview .................................................................................................. 14 2.2 Using the Solution................................................................................................... 17 2.3 Example Applications of the Solution .................................................................... 18 3. Needed Firewalls ......................................................................................................... 24 3.1 Traditional Firewall ................................................................................................ 24 3.2 Extended Firewall ................................................................................................... 30 3.3 COTS Firewall ........................................................................................................ 35 3.4 Deadlock Firewall................................................................................................... 41 3.5 Other Future Firewalls ............................................................................................ 50 4. Configurations and Settings Firewall........................................................................ 52 4.1 Settings Changes..................................................................................................... 56 4.2 Configuration Changes ........................................................................................... 62 4.3 Constructing a Firewall for Settings Changes ........................................................ 66 4.4 Constructing a Firewall for Configuration Changes............................................... 77 4.4.1 Constructing a Firewall for New Configurable elements .................................... 81 4.4.2 Constructing a Firewall for Previously Used Configurable elements ................. 87 4.4.3 Constructing a Firewall for Removed Configurable elements ............................ 91 4.5 Time Complexity of the Configuration and Settings Firewall................................ 93 4.6 Future Improvements on the Configuration and Settings Firewall......................... 99 5. 
A Process to Support the Configuration and Settings Firewall............................ 101 5.1 Current Industry Testing Process.......................................................................... 101 5.2 Modified Industry Testing Process ....................................................................... 103 5.3 Time Study of the Proposed Release Testing Process .......................................... 107 5.4 Future Additions to the Proposed Release Testing Process.................................. 108 6. Empirical Studies of User Configurable Software Firewalls................................ 109 6.1 Empirical Studies Overview ................................................................................. 109 6.2 Limitations of Empirical Studies .......................................................................... 114 6.3 First Case Study .................................................................................................... 114 6.3.1 First Customer Study – Embedded Controller................................................... 115 6.3.2 Second Customer Study – Embedded Controller .............................................. 119 6.3.3 Additional Customer Studies – Embedded Controller....................................... 123 6.4 Second Case Study................................................................................................ 126 6.4.1 First Customer Study – GUI System ................................................................. 127 6.4.2 Second Customer Study – GUI System ............................................................. 130 6.4.3 Additional Customer Studies – GUI System ..................................................... 132 6.5 Third Case Study................................................................................................... 136 6.5.1 First Testing Study – GUI System ..................................................................... 137 6.5.2 Second Testing Study – GUI System ................................................................ 139 6.5.3 Summary of Third Case Study........................................................................... 141 6.6 Fourth Case Study................................................................................................. 141 6.6.1 Taxonomy Overview ......................................................................................... 142 1 6.6.2 Embedded Controller Defect Classification ...................................................... 145 6.6.3 GUI System Defect Classification ..................................................................... 148 6.7 Fifth Case Study.................................................................................................... 151 7. Conclusions and Future Work................................................................................. 159 8. References.................................................................................................................. 163 2 List of Tables Table 1. Results of Procedural Firewall Testing at ABB.................................................. 29 Table 2. Results of Object-Oriented Firewall Testing at ABB......................................... 29 Table 3. Results of EFW Testing at ABB......................................................................... 32 Table 4. Effort Required for EFW Testing at ABB .......................................................... 33 Table 5. 
Results of EFW Testing at Telecom Company .................................................. 34 Table 6. Effort Required for EFW Testing at Telecom Company.................................... 34 Table 7. COTS Firewall, First Study Results at ABB ...................................................... 38 Table 8. COTS Firewall, Second Study Results at ABB.................................................. 39 Table 9. COTS Firewall, Third Study Results at ABB..................................................... 40 Table 10. COTS Firewall, Fourth Study Results at ABB ................................................. 40 Table 11. Summary of Case Study 1 .............................................................................. 126 Table 12. Summary of Case Study 2 .............................................................................. 135 Table 13. Results from the Third Case Study ................................................................. 141 Table 14. Beizer’s Taxonomy’s Major Categories ......................................................... 142 Table 15. Beizer’s Taxonomy’s Functional Bugs........................................................... 143 Table 16. Beizer’s Taxonomy’s Functionality as Implemented Bugs............................ 143 Table 17. Beizer’s Taxonomy’s Structural & Data Bugs ............................................... 144 Table 18. Beizer’s Taxonomy’s Implementation & Integration Bugs............................ 144 Table 19. Beizer’s Taxonomy’s System and Test Bugs ................................................. 145 Table 20. Summary of Source Metrics ........................................................................... 153 Table 21. T-test Results for Call Depth .......................................................................... 154 Table 22. T-test Results for Fan In ................................................................................. 155 Table 23. T-test Results for Fan Out............................................................................... 156 Table 24. T-test Results for LOC / Method .................................................................... 157 Table 25. T-test Results for Cyclomatic Complexity ..................................................... 157 3 List of Figures Figure 1. Example Procedural Firewall Graph [13].......................................................... 26 Figure 2. Example Object-Oriented Firewall Graph......................................................... 27 Figure 3. Example Extended Firewall Graph ................................................................... 31 Figure 4. Example COTS Firewall Graph [14]................................................................. 36 Figure 5. Example Deadlock Graph, Two-Way ............................................................... 42 Figure 6. Example Deadlock Graph, Three Way.............................................................. 43 Figure 7. Example Deadlock Graph with Message Queues ............................................. 44 Figure 8. Example Deadlock Firewall Graph for a Modified Task .................................. 45 Figure 9. Example Deadlock Firewall Graph, First Study Original ................................. 46 Figure 10. Example Deadlock Firewall Graph, First Study Changed .............................. 46 Figure 11. Example Deadlock Firewall Graph, Second Study ......................................... 47 Figure 12. 
Example Deadlock Firewall Graph, Third Study ............................................ 48 Figure 13. Example Deadlock Firewall Graph, Fourth Study .......................................... 49 Figure 14. Example of a Settings Change......................................................................... 58 Figure 15. Example Configuration Addition .................................................................... 65 Figure 16. Process Diagram for a Settings Change .......................................................... 67 Figure 17. Example List of Settings ................................................................................. 71 Figure 18. Example Settings Change GUI........................................................................ 72 Figure 19. Example Difference of Two Configurations, Settings .................................... 73 Figure 20. Example Settings Code Definition .................................................................. 74 Figure 21. An Example EFW from a Settings Change..................................................... 76 Figure 22. General Process Diagram for Configuration Changes .................................... 78 Figure 23. Process Diagram for a New Configurable Element ........................................ 82 Figure 24. Example Configuration ................................................................................... 83 Figure 25. Example Difference of Two Configurations, Adding ..................................... 84 Figure 26. Example Source for a Configurable Element.................................................. 85 Figure 27. Process Diagram for a Previously Used Configurable Element...................... 88 Figure 28. The V-Model [48].......................................................................................... 102 Figure 29. Release Testing, Old and New Methods ....................................................... 105 Figure 30. Case Study 1, Configuration Change with Latent Defect ............................. 118 Figure 31. Case Study 1, Settings Change with Latent Defect....................................... 119 Figure 32. Case Study 1, Added Configuration Change................................................. 122 Figure 33. Classification of Embedded Controller Defects ............................................ 146 Figure 34. Classification of GUI System Defects........................................................... 148 4 A Firewall Model for Testing User-Configurable Software Systems Abstract by Brian P. Robinson User-configurable software systems present many challenges to software testers. These systems are created to address a large number of possible uses, each of which is based on specific configurations. Configurations are made with combinations of configurable elements and settings, leading to a huge number of possible combinations. Since it is infeasible to test all combinations at release, many latent defects remain in the software once deployed. An incremental testing approach is presented, where each customer configuration change requires impact analysis and retesting. This incremental approach involves cooperation and communications between the customer and the software vendor. The process for this approach is presented along with detailed examples of how it can be used on various user-configurable systems in the field. 
The overall efficiency and effectiveness of this method is shown by a set of empirical studies conducted with real customer configuration changes running on two separate commercially released ABB software systems. These two systems together contained ~3000 configurable elements and ~1.4 million Executable Lines of Code. In these five case studies, 460 failures reported by 100 different customers were analyzed. These empirical studies show that this incremental testing method is effective at detecting latent defects which are exposed by customer configuration changes in user-configurable systems.

1. Introduction

The testing of user-configurable software systems presents significant challenges to practitioners in the field. These systems allow a huge number of possible user configurations in the system, each of which can affect its execution. These configurations are composed of user-specified combinations of configurable elements, including individual settings values which exist inside the elements themselves. Due to this combinatorics problem, it is infeasible to completely test these systems before release [30, 31], resulting in many latent defects remaining in the software when it is deployed to the field.

Recently, there have been a few approaches to try to address this problem. The first approach combines statistical design of experiments, combinatorial design theory, and software engineering in an attempt to cover important, fault-revealing areas of the software [26, 32, and 34]. One study of open source software by NIST shows that these techniques can be effective when tests can cover a large number of pairs [40]. Another recent study shows a technique which prioritizes configurations, allowing earlier detection of defects but leading to a decrease in overall defect detection [33]. These studies were conducted on an open source system and a small set of test cases from the Software-artifact Infrastructure Repository [41], respectively. Neither of these studies was conducted on large or industrial systems.

Another approach relies on parallelism and continuous testing to reveal faults in the system. This system, named Skoll [52], was developed at Maryland by Porter et al. Skoll runs multiple configurations in parallel on separate systems, allowing for a larger number of combinations to be tested. In addition, the system employs search techniques to explore the configuration space and uses feedback to modify the testing as it is being performed. Also, this system continues testing configurations after the release of the product. This is a very promising approach to try on user-configurable systems. However, it may suffer from scalability problems when used on hardware-limited systems, such as those running on a custom embedded hardware platform, or when multiple release baselines are being maintained.

In practice, industry testers first verify the system with common configurations that are created with expert knowledge and sample simulated field data. These configurations are created to test areas perceived to be high risk, an idea taught by James Bach [4]. Bach also made the ALLPAIRS program [3], which allows industrial testers to generate a small set of pair-wise tests which satisfy a coverage standard. Once the system is verified using these methods, each new customer’s configuration is used in a very extensive testing activity.
This testing is conducted when the software is first delivered, installed, and commissioned [5], and involves running the software thoroughly on the specific installed system, including its final settings and configurations, through both normal run modes as well as any error cases that can be injected into the system. While these forms of testing may work for the initial commissioning and installation, users of these software systems often make changes to their configurations throughout the lifetime of the installation. These changes to the configuration of the software can cause failures related to defects that are latent and hidden within the initial released version. These latent software defects were never detected in the release testing of the system, where the simulated example customer configurations were run, and also remained hidden for other customers, most frequently due to other customers running different configurations or settings. As a result, customers who have been running failure-free for years, from their point of view, now have a major risk and potential quality problem when changes to the configuration and settings are made. This issue is exacerbated by the fact that there was no code within the software that changed, the usual action that customers associate with the risk of new failures. In many cases, only a few configuration or settings items were added or modified, leading to this new defect affecting the software’s stability for that customer.

Before going further into the proposed solution for this problem, a better understanding of user-configurable systems will be presented. User-configurable systems are software programs (or groups of programs) that are created as a general-purpose solution to address a broad market need by presenting the ability to address many specific needs that individual customers may have. Each customer within this market has a smaller set of specific problems and needs, and this type of software addresses them by taking the general-purpose software and specializing it. This specialization is accomplished by using configurations that direct the execution of the program to solve the exact problem or need each customer has. In order to provide the ability to solve such varying and diverse issues within this broad market, the system is usually made up of a large number of configurable, library-like components, called configurable elements. These elements are only executed when the customers configure the system to include them in their running configuration. In addition, each of these elements can contain a number of settings whose values further refine the actions that the element performs. Configuring systems such as these usually involves connecting or grouping the elements to process different events and actions, usually in a programming environment. These groupings can be set up either graphically or programmatically, depending on the implementation of the software and the needs of the specific market.

A real-time control system is an example of a user-configurable software system. These systems are used to control the operation of factories, power plants, chemical plants, and pharmaceutical manufacturing. Users of these kinds of systems purchase a base set of software that contains the many different functions and rules which, either independently or in cooperation with the vendor, can be used to configure the system to the customer’s specific process needs.
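To make this element-and-settings structure concrete, the following sketch models a configuration as a set of configurable element instances with settings. The class and element names (AnalogInput, PIDControl, and so on) are invented for illustration; real products define their elements in their own engineering tools rather than in Python, but the relationships are the same: the vendor ships a large library of element types, while each customer instantiates, parameterizes, and connects only a subset of them.

# Illustrative sketch of the configurable-element/settings model; all names
# are hypothetical and the layout is simplified.
from dataclasses import dataclass, field

@dataclass
class ElementType:
    """A library-like component shipped with the product."""
    name: str
    default_settings: dict

@dataclass
class ElementInstance:
    """One use of an element type inside a customer configuration."""
    type_name: str
    instance_id: str
    settings: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)    # ids of upstream instances

@dataclass
class Configuration:
    """A customer-specific specialization of the general-purpose product."""
    elements: dict = field(default_factory=dict)  # instance_id -> ElementInstance

    def add(self, inst: ElementInstance):
        self.elements[inst.instance_id] = inst

# The vendor ships many element types; a customer uses only a subset.
library = {
    "AnalogInput": ElementType("AnalogInput", {"range": "0-10V", "filter_ms": 50}),
    "PIDControl":  ElementType("PIDControl",  {"gain": 1.0, "reset_s": 10.0}),
}

cfg = Configuration()
cfg.add(ElementInstance("AnalogInput", "TT101", settings={"filter_ms": 100}))
cfg.add(ElementInstance("PIDControl", "TC101", inputs=["TT101"]))

# Only code behind configured instances executes at this site; every unused
# element type or setting value is a place where a latent defect can hide.
print(sorted(inst.type_name for inst in cfg.elements.values()))

The last point is the one that matters for testing: whatever the customer has not yet configured is, from that customer's point of view, unexecuted code.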
All of the systems installed by customers contain the same base configurable elements, but each customer uses a subset of them and groups them in different ways, leading to very different execution patterns for each customer. For example, there are many different control algorithms, such as Proportional-IntegralDerivative (PID) control algorithms [37], each of which exists as a configurable element. Within each control algorithm’s configurable element, there are many settings that can change the operation of the mathematical function used. This function directly feeds the output value used by other elements in the configuration. Another example of a user-configurable software system is an Enterprise Resource Planning (ERP) system [38], such as the ones developed at SAP™. These systems are meant to model and manage a company’s business process flow. These ERP systems contain base libraries and functions that are needed to implement and run a business process. Users of ERP systems can configure the software, either independently or jointly with the vendor, for their individual business process needs. These types of systems are becoming more common and more widely deployed. An example of a configurable element in an ERP system is a business rule. Each company has different 9 business rules for each process they are implementing, most of which are based on one of a set of patterns. Companies can configure the system with the specific business rule element for the pattern that they wish to use. An example of configuring one of these types of systems is the specific accounting model a company wants the system to use. There are many accounting models available, such as First-In-First-Out or Last-In-FirstOut, and each of them are configurable elements in the system. Each of these configurable elements has settings, and these settings influence the execution of that element. In the case of the accounting model, these settings select how the company wants to organize the accounting model, either by region, cost center, code, country, or business unit. A new approach to testing user-configurable software systems is presented, specifically aimed at finding latent defects that customers would detect. In this approach, each initial configuration is tested before its initial use. Instead of trying to cover as many other configurations as possible, additional testing is postponed until the users of the software make changes to their configurations. By using a method completely based on user changes, only defects of relevance to a customer will be revealed. Data collected from failure reports at ABB show that configuration-based failures found by internal testing are only fixed 30% of the time, compared to non configuration-based failures which have an overall fix rate of 75%. Configuration-based defects are often postponed until customers in the field report them. The proposed approach can be considered a new or modified form of regression testing and, as such, there is a need to determine the testing required for each type of customer change to verify that the system still performs correctly and the change did not 10 expose a latent defect. While the purpose and data needed for this new form of regression testing is different, the main steps that must be followed are the same. 
These steps include identifying the specific change itself, determining the impact of that change, and finally a selection of tests that cover that impact to verify that the system still meets its requirements after the change and no latent defects are exposed. Regression testing involves selective retesting of a system to verify that modifications have not caused unintended effects and that the system still complies with its specified requirements [1]. There exist many regression test selection (RTS) techniques that minimize the time and resource costs of retesting changed software. Methods for test selection and reduction are highly desirable, as a complete retest of a system is cost prohibitive [20]. Many RTS methods make use of control flow information to determine the impact of a change, such as [8, 11, 17, 18, and 23]. Many of these methods were later expanded to support Object Oriented systems [9, 24, 25, and 28] which complicate basic control flow methods. Thomas Ball improved these control flowbased methods with algorithms that achieve a much finer grain selection of the areas which require retesting [16]. Besides control flow, many other dependencies have been used for RTS methods. The first, data flow, expands the impact along longer data flow dependencies which would otherwise be missed. These techniques, such as [13, 22, 27, 29, and 39], take longer to determine the impact of the change, but allow for the detection of defects related to these data flow paths. In addition, dependencies dealing with global variables [12], COTS components [14], and GUI systems [10] have been studied. Finally, other concepts have been used to augment regression test selection, such as program slices [21] and guided semantics [19]. 11 These existing RTS methods are all intended to detect regression defects coming from code changes within the software under test. In the case of changes in user configurations, these methods do not directly apply, since there is no actual change to the software itself. In addition these current RTS methods all assume no latent defects remain in the system, since their focus is regression defects that are the result of a code change within the software product. This is not the case with user configuration changes. Finally, these systems lack an overall complete test suite that can be used to select regression tests from, since fully testing a system with this many combinations of configuration elements is not feasible. These problems will be solved by creating a new RTS method which addresses configuration changes and latent defects in these types of software systems. In addition to providing a solution to this specific problem, the new RTS method provides an industrial organization the opportunity to significantly change their release testing activities for these user configurable systems. Since each software configuration change will lead to retesting at the customer site based on the results of this new RTS method, it becomes less important to do exhaustive testing before release. Each customer’s current configuration could be tested at each software release, verifying that no traditional regression defects are introduced in the code for running customers. In addition, some other unused parts of the system could be tested for general use; for example, testing could be planned for common changes to these customer configurations, verifying that likely changes the customer will make do not contain regression defects, both traditional and latent. 
The exhaustive testing of these unused areas can be postponed until a customer configures the system to use these features. This RTS method supports, in effect, a test-as-the-software-is-used approach. This will lead to faster times to market, 12 as many large configurable systems have many features that remain unused for years. This can also lead to more satisfied customers, as each customer would know the system works for their specific configuration and usage, and have that software released out to them faster. 13 2. Proposed Solution This chapter presents a high level description and overview of the proposed solution, shown in Section 2.1. A description of how to use the method is presented in Section 2.2. Finally, Section 2.3 presents high level examples of how to use the method in practice. 2.1 Solution Overview User-configurable software systems present very unique and challenging problems for all of software engineering and testing in particular. These systems can be configured in a large number of ways by using the system’s configurable elements, each of which may contain many settings. These configurable elements and settings lead to a system with a huge number of possible executions, resulting in a high probability that latent defects exist. Even after release testing is complete and the software has been executing in the field, many of these defects remain. Latent defects, such as these, can be exposed at a later time by customers changing their running configuration. Since these defects never caused failures to be seen, many customers and software providers treat these defects as regression faults. Since these defects are not the result of code changes, traditional regression testing is not able to detect them. Since this problem is different than previously researched problems in regression testing, current RTS methods do not directly apply. The events which expose these defects are based on a change, however, and solving this problem involves using the same principles. The solution involves an extension to a current regression test selection method, specifically the Traditional Firewall originally developed by White and Leung [13]. The firewall concept is based on 14 building design where hardened walls are created that prevents a fire from spreading from one area to another. This should not be confused with network firewalls, which were named in the same way. This new RTS method is called the Configuration and Settings Firewall. Before the Traditional Firewall can be extended to work with this type of change, a better understanding of RTS methods is presented. RTS methods can be broken down into common steps that are used. The first step determines the specific areas of the system that changed. In many RTS methods, this step is accomplished by performing a code or binary differencing between the newly changed version of code and the previously tested version. This step is often automated by a tool which compares the two software versions and generates a list of changes. Next, each difference is identified at a specifically defined granularity level, depending on the RTS method used. Frequently, the desired granularity involves identifying the specific function or object that contains the change and then marking that entire function or object as changed. Once the changes are identified, the impact of the change on the surrounding system is determined. 
This step requires analysis and system knowledge, as dependencies and relationships that exist within the system must be determined. These relationships include simple concepts such as control flow, which are related to paths in the code, and more complicated concepts such as data flow, which describe relationships involving variable passing and program state. Once the impact from these relationships is identified, it is used in the final step in RTS methods. This step involves the selection of tests that are needed to verify that this change does not adversely affect the system. This is accomplished by selecting previously created tests which cover the affected areas, or creating new tests when previous tests do not exist or are insufficient to test the impact of the change.

A solution to the problem of configurable systems and latent defects involves creating a new type of regression test selection method for configuration and settings changes by extending current, code-based RTS methods. This new RTS method extends the firewall approach for use on user-configurable systems, which is accomplished by making changes to the actions taken in each step of RTS methods. The first RTS step determines the changes within the system. In the new firewall, changes are derived from the configurations themselves by comparing the new configuration to the previously running and tested configuration. Differences are identified and categorized as either settings changes or configuration changes. The differences between these types of change are discussed in Chapter 4. Once all configuration differences have been identified and categorized, the impact of these changes is determined. This is done by starting at the change itself and determining all relationships which exist from the change to the rest of the system, similar to how code-based firewalls are created. Any time there is a dependency between the current function and other functions in the system, the current function is marked as changed. Once complete, the analysis continues on the related function, and this process repeats until no more dependencies are present. Dependencies that are checked for include control flow, data flow, and other relationship types for which there are firewalls available. The process stops only when there are no more dependencies to include, which means there is no further impact to the current part of the system from the changed part of the system. Finally, once the impact has been determined, tests are selected and created to completely validate the impacted part of the system. These tests can be either new tests or reused tests, depending on whether the change affects parts of the system that have been tested before. This step is very similar to the various code-based firewall RTS methods. More detail will be presented on each of these steps and how they are accomplished in Section 2.3 and Chapter 4.

This new Configuration and Settings Firewall makes use of numerous code-change-based testing firewalls developed in the past for different types of dependencies. These firewall models include the Traditional Firewall, which handles control-flow dependencies, the Extended Firewall, which handles data-flow dependencies, the Deadlock Firewall, and the COTS Firewall, for use where source code is not available. In addition to these firewalls, current research is ongoing to further extend the firewall concept for dependencies related to memory leaks and performance issues.
Each of these future firewall models is useful for both code changes and configuration and settings changes. Before the Configuration and Settings Firewall can make use of any of these code-based firewalls, each must be shown to be effective for code-change-based regression testing of industrial systems. This effectiveness is shown by a number of empirical studies conducted on user-configurable software systems at ABB. These studies are presented in Chapter 3. 2.2 Using the Solution Customers must understand that the effectiveness of the configuration and settings firewall is dependent on it being used on each change to the configuration of the system. For each set of changes made, the system must be analyzed and retested with the new firewall to verify no latent defects are exposed by the change. This is especially true if the proposed changes to release testing shown in Chapter 5 are used by the software vendor. 17 Also, the software vendor must have a good relationship with frequent communication with the customers of the software, as each change to the configuration will need to have some testing done to verify that no defects exist in the new configuration. The testing itself can be done by either the original vendor of the software or, assuming they agree, the customer who made the change to the configuration. Each software configuration change will lead to some retesting, so it becomes less important to do exhaustive testing before release. Customer configurations can be retested at each release with traditional code-change-based RTS methods. Currently, unused portions of the system can be tested for very general use to show it performs its requirements and has no defects in sample configurations if time permits. The detailed testing of these unused functions can wait until a customer decides to add them to their configuration. At that time, a configuration and settings firewall is created to verify that no latent defects were exposed by the change. By deferring this testing until customers configure it, the product can have faster times to market and more satisfied customers, as customers primarily care that the system works for their configuration. 2.3 Example Applications of the Solution To better illustrate the concept and use of the Configuration and Settings Firewall, a few high level examples are presented. These examples show when and how the firewall can be used and include configurable control and ERP systems. Further details on the configuration and settings firewall, including how it is built, are presented in Chapter 4 and will not be discussed in this Section. The first system is an embedded process controller. These controllers are configured to do many forms of process control on various input values and types. 18 Configuring this system involves creating a graph which contains both function code blocks, which are configurable elements represented by nodes, and relationships between them, which represent dynamic linkings between the components and are represented graphically by arcs. Relationships can be unidirectional or multidirectional depending on the requirements and usage needed between these function blocks. Each function code contains code which is executed only when an instance of it exists in the configuration that is currently running in the controller. There are a large total number of function blocks that can be used to configure a system, many of them dealing with specific data types and algorithms used to control a process. 
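A rough sketch of the first firewall step from Section 2.1, applied to a configuration graph like the one just described, is shown below: two versions of a configuration are compared, and the differences are separated into configuration changes (elements added, removed, or relinked) and settings changes (values modified on existing elements). The data layout and all names are invented for illustration; Chapter 4 defines the actual procedure, including the treatment of settings, which are described next.

# Sketch: diffing two configurations into configuration vs. settings changes.
# Each configuration maps an instance id to (element type, settings, inputs);
# ids, types, and values are hypothetical.
old_cfg = {
    "TT101": ("AnalogInput", {"filter_ms": 50},  []),
    "TC101": ("PIDControl",  {"gain": 1.0},      ["TT101"]),
}
new_cfg = {
    "TT101": ("AnalogInput", {"filter_ms": 100}, []),         # settings change
    "TC101": ("PIDControl",  {"gain": 1.0},      ["TT101"]),
    "TT115": ("AnalogInput", {"filter_ms": 50},  []),          # new element
}

def diff_configurations(old, new):
    added   = [i for i in new if i not in old]
    removed = [i for i in old if i not in new]
    settings_changed, relinked = [], []
    for i in new:
        if i in old:
            old_type, old_set, old_in = old[i]
            new_type, new_set, new_in = new[i]
            if old_set != new_set:
                settings_changed.append(i)
            if old_in != new_in or old_type != new_type:
                relinked.append(i)
    return {"added": added, "removed": removed,
            "settings_changed": settings_changed, "relinked": relinked}

# Each reported difference is then mapped to the code implementing that
# element type and used to seed the code-based firewalls of Chapter 3.
print(diff_configurations(old_cfg, new_cfg))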
Within each function code block reside many settings that affect the way the specific function block executes. These settings take the form of values assigned in the configuration of the block itself. In general, settings act as parameters, refining the control block to the specific action required. Settings include values which define timing, modes of operation, and even simple labels.

An example based on the first system starts with a controller module that has been running in the field for five years with no documented issues reported from the customer. This customer has been steadily adding new configurable elements to the configuration for the past five years in order to track additional values in the system. These values are needed by the plant operators and were discovered as important in the process of running the plant after it was commissioned. Each time a new value is added to the configuration, it is loaded into the process controller and put into execution. In the most recent change made, there were a number of new configurable elements added. These additions were new analog data points that the plant operators wanted to view and monitor as the plant was running. The plant engineer made the required changes to the configuration and loaded it into the controller. After the new configuration was running, it went into an error state which caused the plant to shut down. This customer now has a major software quality issue, including a loss of production, even though the system has been running failure-free for this customer and no code changes have been made to the software.

Applying the Configuration and Settings Firewall to this example would have prevented the plant from shutting down by determining the impact of the change and testing that area for latent software defects. Creating the firewall for this example starts by identifying the configuration and settings changes made by comparing the new and old configurations. The details for each step will be shown in Chapter 4. Once the changes have been identified, a mapping of the configurable elements to the code that implements them is created. Using this mapping and the differences found, a Traditional Firewall is created that treats each configuration and settings change as a code change. Each of the changes is analyzed to determine whether additional relationships are present. If so, other firewalls are created, each of which is described in Chapter 3. Using the results of the various firewalls, tests are either selected or created to verify that these changes did not expose any latent defects. After testing is complete, the configuration is loaded into the customer’s process controller and the plant starts running with the changed configuration.

The defect in this example is due to adding a new configurable element to the last linking slot of a different, previously configured element. A grouping of up to fifteen analog input points is created by connecting each point to another configurable element which allows up to fifteen connections. In this case, a full set of fifteen analog values had never been present in the configuration before, since the previous configuration had only ever used fourteen. This software defect was latent within the code that handles the fifteenth element and was never uncovered since the code for that specific case had never been run.

Another example of a user-configurable system deals with human system interface (HSI) applications.
The specific HSI application in this example runs in the Windows OS on standard desktop PCs. This system is responsible for configuring the physical and logical organization of the process control system, including physical hardware, network infrastructure, and control logic. Configuring this system involves adding different configurable elements from various libraries into a configuration, and then linking these software elements to the actual physical elements they are representing graphically in the HSI. In many cases there are multiple configurable elements, such as PIDs and shaping functions, which are all linked to one specific physical element, such as a temperature or pressure measurement device. In this example, the HSI is running the originally released software version and has reported no defects in the system. The HSI’s configuration is changed every few months to match physical changes throughout the plant. Recently, the configuration was changed to add new display and alarm elements for the plant operators. These display elements are linked to previously existing configurable elements, representing physical devices in the plant, specifically a temperature sensor and a pressure sensor. The alarm events are linked to the new display events and its settings contain values describing when to enter the alarm state and what action to take when this alarm state is reached. The configuration is loaded into the system and when the alarm event occurs, the specifically defined action does not occur due to a latent software defect. Thankfully, the 21 plant operator noticed the value should have triggered an alarm and was able to shutdown the plant, or a larger issue would have occurred. By using the Configuration and Settings Firewall in this situation, the area affected by the change would have been identified and tested before the system started executing the change, and the defect would have been detected earlier. For this example, a Configuration and Settings Firewall is constructed. In this case, the new alarm and display elements were linked to previously existing elements and added to the configuration. From this change information, a Traditional Firewall is created, along with any other needed firewalls, which identified the impact of the change. Then this impact is tested, revealing any latent defects exposed in the changed configuration. The defect in this HSI example is related to the actual response to the event that was configured in the system, and this response is a selection of possible actions that can be taken to remedy the event. Inside this HSI system, the action selected by this changed configuration had never been run before, and this action did not respond properly due to the latent defect in the code. This defect existed in an area of the code that was contained in the impact determined by the Configuration and Settings Firewall which would have led to it being detected earlier. A final example of configurable systems is an ERP system. The ERP used in this example is a SAP system and is configured to run centralized accounting functions. This system was running for over four years with no major quality problems. The customer decided to change from the International Financial Reporting Standards to the Generally Accepted Accounting Principles that the US markets use. This required configuration changes, which lead to a small error affecting financial records. 
This failure was found 22 internally by audits, but it now represents a major defect, since the Sarbanes-Oxley Act requires all financial records to be certified correct by the CEO and executives of the company. By constructing a Configuration and Settings Firewall, defects of these types can be detected before the configuration change goes into live use. These examples make it clear that the problem of latent software defects contained in user-configurable systems is critical. Applying the Configuration and Settings Firewall is an effective way to address this problem and can detect these defects before they cause major customer problems. Chapter 3 briefly presents the underlying code-based firewalls that are used in the Configuration and Settings Firewall, highlighting significant new empirical studies that show their effectiveness in code-based regression testing. Chapter 4 presents the details of the Configuration and Settings Firewall, including the differences between configurable elements and settings, a detailed listing of the action for each of the steps, when and how the firewalls in Chapter 3 are used, and a detailed set of examples on how to construct this new firewall. Chapter 5 presents a modified release testing process to augment use of this firewall after release. Finally, Chapter 6 presents the setup and results of a set of empirical studies that show the effectiveness and efficiency of this method in practice. 23 3. Needed Firewalls The Configuration and Settings Firewall utilizes code-change-based firewalls to determine impact. In order for the new firewall to be accurate, an empirical investigation of previously developed firewall models have been conducted, as well as the creation of a new firewall to address impact propagation that is not currently addressed. The Traditional, Extended, and COTS Firewall ideas were developed by others outside the scope of this research. The Deadlock Firewall and all of the empirical investigations of these firewalls were developed as part of this research. In addition, firewalls are proposed for additional types of impact propagation that are not supported currently. This chapter is organized with Section 3.1 containing a brief overview of the Traditional Firewall [13], as well as the details of the industrial empirical studies conducted on it. Section 3.2 presents the Extended Firewall and the results of new empirical studies on its use in industrial practice. Next, Section 3.3 presents the COTS Firewall for third party components and the results of studies on its effectiveness. Section 3.4 discusses the newly created Deadlock Firewall and presents the results of empirical studies of its use in industry. Finally, Section 3.5 talks about additional future firewalls that would be beneficial for both code-change-based regression and the new Configuration and Settings Firewall. 3.1 Traditional Firewall The Traditional Firewall RTS method (TFW) can be applied to both Procedural software [13] and Object-Oriented software [9]. These methods both involve determining the difference between the code of a previously tested software version and a changed 24 version that needs to be tested. This difference is usually created with some kind of differencing tool, such as Araxis Merge [42]. Each individual change in the source code is mapped to the function or object in which it resides and this function or object is marked as changed. 
This mapping sets the granularity of the change to the level required for the RTS method, in this case the function or object level. Once all of the differences have been determined, an analysis of the impact the changes have on the software is performed. The impact analysis used in the Traditional Firewall method, both procedural and OO versions, involves starting at the function or object identified as changed and then selecting each function or object that is one level away, in a control flow graph, from the change as needing to be tested. Each of these functions or objects, including the change itself, is considered to be inside the testing firewall. Only the functions or objects that have a calling relationship with the changed entity will need to be tested. Data flow relationships are not considered in this firewall model. Each function or object identified as needing testing is mapped to test cases based on the type of change. These types include checked, requiring only a few tests to check the functionality of the component, changed, requiring a complete retest of the component, or affected, which requires all interfaces to the changed component to be thoroughly tested. Based on this classification, tests are selected to cover the functions and objects within the firewall, and nothing outside the firewall requires any retesting.

For procedurally designed systems, the firewall is represented graphically as a calling tree where functions are the nodes and function calls are the arrows. Each code change is marked on the node of the graph representing the changed function. The boundaries of the firewall are represented as bold arcs on the function calls leading into the functions requiring test. In practice, when complete calling graphs do not exist, these graphs are generated ad hoc, by starting at the change and following the control flow dependencies which exist in the code. An example of a procedural firewall graph is shown in Figure 1.

Figure 1. Example Procedural Firewall Graph [13]

For object oriented systems, the firewall is represented graphically as a class relationship diagram. Each class is represented as a node and each relationship is represented as an arc. Differently styled arrows represent each kind of OO relationship supported in the model, namely inheritance, composition, association, and usage. The arrows that mark the edge of the testing required are bold, as in the procedural version. These firewalls, when used in industry, are created in a similar manner as the procedural method, by starting at the changed object and building the graph along each control flow dependency until there are no additional relationships left to add. Figure 2 shows an example of an Object-Oriented firewall graph.

Figure 2. Example Object-Oriented Firewall Graph

These Traditional Firewalls, both procedural and object oriented, were never empirically studied on real systems in industry. Before they can be used in the Configuration and Settings Firewall, their effectiveness on real industrial systems was verified as part of this research. This empirical evaluation was accomplished by selecting iterative versions of many different software systems at ABB and creating Traditional Firewalls on them [7, 39].
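Before turning to the empirical results, the core selection rule of the TFW can be stated in a few lines: every function that directly calls, or is directly called by, a changed function is pulled inside the firewall and scheduled for retest. The sketch below assumes the call graph is available as simple adjacency lists; the graph and function names are invented, and the checked/changed/affected classification is omitted for brevity.

# Sketch of Traditional Firewall selection over a call graph.
# calls[f] lists the functions that f calls; names are hypothetical.
calls = {
    "read_input":   ["scale_value"],
    "scale_value":  ["clamp"],
    "control_loop": ["read_input", "compute_pid", "write_output"],
    "compute_pid":  ["clamp"],
    "write_output": [],
    "clamp":        [],
    "diagnostics":  ["write_output"],
}

def traditional_firewall(calls, changed):
    """Return the functions inside the firewall: each changed function plus
    everything one call level away (direct callers and callees)."""
    callers = {f: set() for f in calls}
    for f, callees in calls.items():
        for g in callees:
            callers.setdefault(g, set()).add(f)
    inside = set(changed)
    for f in changed:
        inside |= set(calls.get(f, []))    # direct callees: affected interfaces
        inside |= callers.get(f, set())    # direct callers: must be retested
    return inside

changed = {"compute_pid"}
print(sorted(traditional_firewall(calls, changed)))
# -> ['clamp', 'compute_pid', 'control_loop']; everything else stays outside
#    the firewall and is not retested for this change.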
The TFW was run on fourteen different major releases of software and the object oriented firewall was run on fifteen different major releases, four at ABB and eleven at another company. These selected versions represented new releases of the software at the time of the study and used the firewall method to select tests required to verify the system had no regression defects. In order to determine the effectiveness and efficiency of the firewall method, the original ABB method of selecting tests was run in parallel. This original method involves guesswork based on feedback from previous releases and both system and developer-based expert knowledge. This previous ABB method was, in effect, a very rudimentary RTS method. The analysis conducted here is somewhat different from those used in other studies on regression, as the effectiveness is determined by comparing the ABB expert guess method against the TFW method. Previous studies looked at comparing the retest-all case to the specific RTS method under evaluation. Complete retesting of these large systems takes such a long time that using it as a comparison for TFW would result in a reduction so large that it would be meaningless. For example, a complete retest for a specific software product in ABB takes three and a half man-years of effort. The ABB expert guess method identified three man-weeks of regression testing. The TFW for a small change may select three man-days' worth of effort, leading to a 99.87% reduction of test time when compared to the retest-all case. A more accurate comparison of the TFW to the current ABB RTS method yields an 85.8% reduction in test time. By comparing the TFW to the current ABB method, we can present a benefit that is more meaningful to management and developers.

The Traditional Firewall for procedural software was empirically validated with case studies performed at ABB for the releases listed in Table 1. The time required to test the TFW, shown in the firewall test time column, includes both analysis time, listed in the second column, and time required to execute the tests identified by the TFW. Using the TFW for these releases at ABB led to an average reduction of 42% in test cases executed and an average reduction of 36% in test time, when compared to the original test selection method used in ABB [7]. In addition to savings in test cases and calendar time, an additional 14 defects were detected that would not have been detected in the original testing, as well as all 28 of the defects that would have been detected using the original test selection method. The increase in defects detected is due to the developers and testers selecting incorrect areas to retest based on both their past data and their expert knowledge. In addition to these savings and defects detected, no customer regression defects were reported against the versions in this study after release, showing that the firewall is effective at selecting correct test cases.
Results of Procedural Firewall Testing at ABB Analysis Number Firewall Files of Builds Test Time Time Project (Hours) Modified Tested (Days) 10 10 2 4 1 6 13 1 3 2 13 2 9 8 3 9 17 3 10 4 23 22 5 15 5 18 7 11 25 6 3.5 5 1 10 7 1 7 5 3 8 1 4 1 4 9 0.5 2 2 5 10 0.5 2 4 5 11 60 31 17 15 12 10 113 8 25 13 15 93 27 20 14 338 96 152 Totals: 170.5 Orig Number Number of % % Test of Tests Tests Time Savings: Original Firewall Savings 10 60% 75 42 44% 5 40% 25 10 60% 20 60% 60 40 33% 20 50% 150 84 44% 30 50% 140 66 53% 35 29% 500 305 39% 5 -100% 30 45 -50% 8 63% 110 60 45% 5 20% 50 20 60% 10 50% 60 30 50% 10 50% 60 30 50% 20 25% 200 90 55% 35 29% 250 155 38% 25 20% 200 130 35% 238 36% 1910 1107 42% The use of the object oriented TFW at ABB led to a reduction in test cases executed by 63% and a reduction in test time of 71%, shown in Table 2. The data was collected and created in the same way that it was for the procedural TFW. In addition, no new regression defects were detected by the customers after release in the two objectoriented systems analyzed. The results of using TFW on object-oriented software are shown in Table 2. Additional results are shown in Tables 3, 4, 5, and 6. Table 2. Results of Object-Oriented Firewall Testing at ABB Analysis Number Firewall Orig Number Number Time Files of Builds Test Time Test % of Tests of Tests % Project: (Hours) Modified Tested (Days) Time Savings: Original Firewall Savings 0.5 3 1 3 10 70% 130 51 61% 1 1.5 4 1 2 7 71% 60 20 67% 2 2 7 2 5 17 71% 190 71 63% Totals: 29 Since this TFW method only takes control flow relationships into account, it is not considered “safe”, which is defined as guaranteeing that all possible defect finding tests remain in the regression test suite [6]. Even with that limitation, these empirical results show that the method is effective while still being very efficient for use on real software systems in industry [7]. 3.2 Extended Firewall A main shortcoming with the Traditional Firewall deals with the impact analysis stopping one level away from the change. This limits the effectiveness of the method to only control flow dependencies. While no defects were missed in the releases studied for the TFW at ABB, a few regression defects were found by customers using other products or versions that were not analyzed with the firewall method. Some of these defects would not have been detected in the TFW, as these defects were due to changes made in long data flow relationships within the software. The Extended Firewall (EFW) was developed to increase the effectiveness of the TFW by extending the impact more then one level away from the change when certain data flow paths are present. This method extends the firewall model from just control flow dependencies to include these longer data flow dependencies which exist in large user-configurable software systems. Regression testing of data flow paths is not new and has been used in past research, such as [13, 22, 27, 29, and 39]. The key additions to the Traditional Firewall are the ideas of external dependencies and handling of return values. External dependencies occur when inputs to a function or object are based on another function or component output, creating a data flow dependency between the components. Return values from functions are also now 30 analyzed to determine if they form a unique data flow path in the code. The TFW assumes that all inputs to changed functions have no external dependencies and therefore selecting objects or functions one level away is enough. 
The EFW starts with a standard TFW, as shown in Section 3.1 As the TFW is created, each input to a changed function or object is inspected to determine if it is has any external dependencies. These dependencies include bi-directional paths where both callers and return paths are included. If dependencies exist, then this dependent function or component is included in the firewall and its callers are checked to see if they have external dependencies or not. This continues until the chain breaks by reaching a function or object that has no external dependencies, which becomes the edge of the firewall. This node represents the start of the data dependency and the whole path, from this starting node all the way back to the change, is identified as a data flow path that needs to be retested. An example of an Extended Firewall is shown in Figure 3. Affected Components External E: P A1 A2 Ak Must Be Tested Cm Messages Must Be Checked Modified Component Checked Components C1 External E: K C2 Figure 3. Example Extended Firewall Graph 31 Figure 3 shows an example EFW graph created for an Object Oriented system. This graph represents a subset of the total relationships that exist in the system and only includes objects within the firewall that require retesting. A data flow path exists, starting at the node labeled P and ending with the node labeled K. Notice that there are numerous nodes labeled A or C that do not include longer data flow paths. For these nodes, the one level away method of the Traditional Firewall is sufficient to detect regression defects. The TFW is contained completely within the EFW, so all tests that the TFW selects are contained in the set of tests that EFW selects. In addition, the EFW only adds tests for these additional data flow paths that are identified in the analysis. In order to determine the effectiveness of the Extended Firewall, an empirical study was conducted on a large user configurable software system. This empirical study was carried out on ABB software as well as a separate study on software from a large telecommunication company in another research project. The first empirical study of the EFW was conducted on a large user configurable software system at ABB as part of this research project. This study was completed on a system with over one million lines of code and more then 1,000 classes. Due to the size of the system, only two incremental versions were analyzed. The results are shown in Tables 3 and 4. Table 3. Results of EFW Testing at ABB Ver. 1 2 Modified Classes 29 : 12 new 18 : 3 new TFW Methods 181 145 EFW Methods 239 163 32 TFW Classes 86 82 EFW Classes 101 94 Faults Detected 6 : 1 new 10 : 2 new Table 4. Effort Required for EFW Testing at ABB Ver. TFW Tests EF W Tests Add. EFW Tests TFW Analysi s (Hrs) 1 2 168 112 219 141 30% 26% 20 17 EFW Analysi s (Hrs) 25 20 TFW Test Time (Hrs) 35 21 EFW Test Time (Hrs) 48 27 Add. EFW Time 37% 29% This data shows the number of classes modified as well as the number of methods that were identified as affected in both the TFW model and the EFW model. The Extended Firewall added approximately 28% more test cases and 34% more test time when compared to the Traditional Firewall. This additional time is due to each input to the changed object or function having to be checked to see if it contained external dependencies. In addition, if it does contain these dependencies, more tests must be rerun. 
The EFW will always detect all defects that the TFW does, as the EFW only adds additional retesting areas to the results of the TFW. This is shown with the TFW and EFW columns in Table 3. Each version’s TFW numbers are less than or equal to the numbers in the EFW. For all of the extra time spent, there are a total of three defects that are only detected by the EFW. The second empirical study involved a large telecommunication system and was conducted as part of a separate research study by another researcher. It is listed here only to show the effectiveness of the EFW method. This study contained 11 incremental software releases each with around 66 classes and 11K lines of C++ code. Each version had both a TFW and an EFW constructed for it. The results of the empirical study are shown in Tables 5 and 6. 33 Table 5. Results of EFW Testing at Telecom Company Builds Ver. 1 1.1 1.2 Total Ver. 2 2.1 2.2 2.3 2.4 2.5 Total Ver. 3 3.1 3.2 3.3 3.4 Total Modified Classes TFW Method s EFW Method s TFW Classes EFW Classes Faults Detected (Failures) 2 1 (new) 3 5 2 7 13 3 16 5 2 7 7 3 10 2 1 2 1 1 1 1 1 (new) 5 3 1 2 3 1 10 3 1 3 5 2 14 3 1 2 3 1 10 3 1 3 4 2 13 1 new 1 2 2 : 1 new 3 2 : 1 new 1 1 1 1 4 1 4 1 1 7 2 4 3 2 11 1 4 1 1 7 2 4 2 2 10 2 2 3 2 : 1 new 23 : 4 new Table 6. Effort Required for EFW Testing at Telecom Company Ver. TFW Tests EF W Tests Add. EFW Tests TFW Analysis (Hrs) 1 2 3 88 90 80 115 105 110 31% 17% 38% 6 7 5 EFW Analys is (Hrs) 13 11 10 TFW Test Time (Hrs) 15.7 16.0 14.3 EFW Test Time (Hrs) 20.2 18.5 19.3 Add. EFW Time 52% 28% 52% This study showed that the EFW added approximately 28% more test cases and 40% more test time. This time includes both analysis time and test time. While EFW testing required more time then the standard TFW, four new EFW defects were found in the two versions that the TFW would have missed. This data is similar to the data collected from ABB in the first study. These empirical studies show the Extended Firewall to be effective for industrial software systems when extended data flow paths are present and affected by the change. This method is less efficient overall than the TFW, as it takes more time to identify and map the external dependencies in the software, but the benefit of this additional time is an increase in the effectiveness of the model when compared to the TFW. Because of the 34 larger amount of effort to complete this firewall, it is recommended that the EFW be used only when these data flow paths are present in the system, which can be determined as the TFW is being created. 3.3 COTS Firewall The regression test selection methods listed in Sections 3.1 and 3.2 base their analysis on source code. Many software systems, including user configurable systems, use third party commercial-off-the-shelf (COTS) components. These components are sold to the user with no code, only containing executable images such as a library files or DLLs and some user documentation. The users of these COTS components must integrate these black boxes into their system by creating glue code that interfaces between the COTS component and the system which uses it. These COTS components often change, sometimes as frequently as every eight months [43]. When these components change, customers that use them often need to conduct regression testing to verify that the changes in the COTS component do not adversely affect the customer’s own system, which contain and use these components. 
This is made difficult by the vendors, who do not include any reliable change information with the new version and lack the details needed to help with a code based RTS method. Without source code available, a retest all or an expert knowledge guess method are needed to select test cases for regressing these systems. A solution to this problem is to extend the firewall model to support COTS software, which was developed as a joint collaboration with North Carolina State University (NCSU) and ABB [14]. This extension works directly with the binary image files instead of the source code, as source is not available for COTS analysis. The changes in the binary files must be identified, in 35 this case by a direct differencing of the images. Changes to the internals of the component must then be mapped to the top level API functions that use the changes, which are marked as affected. These affected API functions can then be compared to the customer glue code so customer impact can be identified. Once this is complete, a user can retest the parts of their system that use these affected top level API functions. All other API functions that do not use the changed parts of the COTS component do not require regression testing. Figure 4 shows an example COTS Firewall graph. Figure 4. Example COTS Firewall Graph [14] This figure shows how changes internal to the COTS Firewall are mapped to the top level API functions. Changed are propagated from the changed function to the externally exposed API functions one caller at a time. Changed function N is called by E4, which is then marked as changed. Once the top level API functions are identified as either changed or unchanged, each caller from the customer’s application that call changed COTS functions are marked as changed, shown as G2 and G4. Additionally, TFWs and EFWs are created around each, leading to G3 and the unlabeled nodes directly connected to G2 and G4 to require testing. 36 This firewall is constructed following the same steps as the previous firewalls, namely differencing, impact, and test selection. The difference is determined by comparing the previously used and tested binary image of the COTS component to the newly changed binary image. This difference, since it contains compiled binary code, can have many sources of change, only a few may be due to actual source changes. These source changes are the only differences that are important for this analysis, so a student at NCSU created a method to remove this other unneeded information. The removed information includes address table changes, specific calling addresses that move within the library, and other compiler related flags and options. More information on the details of how this method works can be found in [14]. Once the changed functions are identified, the impact must be determined. The impact analysis begins the same way that the Traditional Firewall does, by starting at the change. Instead of stopping one level away, as the TFW does, or determining data dependencies, as the Extended Firewall does, this method goes up the calling tree to determine the highest level API functions that call this change. Those API functions are considered affected and needing retesting to verify that there are no changes in the COTS component that break the customer application. This calling tree is created by using the various address tables that exist within the different types of components. 
Each of the affected API functions must then be retested by executing the parts of the customer application that use these functions. No other API functions need to be retested. In order to verify the effectiveness of this COTS Firewall, there were a total of four different empirical studies conducted at ABB. The first study was conducted on a 757 thousand lines of code (KLOC) ABB application written in C/C++, using a 67 37 KLOC internal ABB software component in library (.lib) files written in C. This internal ABB component was considered the COTS component in this study as it was created and built in a different location and then the .lib file was used in the main application. No source code for the .lib file was available to the developers of the main application. The result of the first case study indicates that this COTS Firewall can reduce the required regression test cases by 40% on average [14]. Some releases required no retesting, as no changes in the component affected the APIs that the product was using. The detailed results are shown in Table 7. Table 7. COTS Firewall, First Study Results at ABB Metrics Changed component functions Added component functions Deleted component functions Affected exported component functions Affected functions in the application Total test cases needed % of reduced test cases 1 vs. 2 164 3 4 331 60 592 0% 2 vs. 3 668 2 2 331 60 592 0% Comparisons 3 vs. 4 4 vs. 5 1 664 0 0 0 0 2 331 0 60 0 592 100 % 0% 5 vs. 6 2 1 0 39 0 0 100 % The results in Table 7 show that, for this component and its changes, either all the API functions that are accessed by the customer need to be retested or none of them. In two of the revisions, there was no regression impact identified to the main ABB application at all, while in the other three versions, all of the APIs the customer code used were affected and needed to be retested. The second study was conducted on a 400 KLOC ABB application written in C/C++. This product uses a 300 KLOC internal ABB software component in library (.lib) files written in C. The full, retest-all strategy takes over four man months of effort to run. Five incremental releases of the component were analyzed and compared to study the 38 effectiveness of the COTS Firewall method at reducing regression test cases. The results of the study are shown in Table 8. Table 8. COTS Firewall, Second Study Results at ABB Metrics Total changed functions identified True positive ratio Affected exported component functions % of reduced affected exported component functions Affected user functions in the application Percentage of reduced affected user functions Total test cases needed Percentage of reduced test cases 1 vs. 2 388 99.46% 84 31.71% 38 17.39% 151 30% Comparisons 2 vs. 3 3 vs. 4 1238 4 98.39% 100% 122 1 0% 99.18% 59 1 0% 98.31% 215 11 0% 95% 4 vs. 5 13 100% 8 93.44% 6 89.33% 20 91% The first release was able to reduce the testing by 30% which was a significant reduction over the original testing, a full retest all, that was done on this release without the COTS Firewall. The second release had no reduction due to all of the API functions being affected by internal changes. For both of these first two releases there were a large number of internal changes to core functions. The final two releases had a significant reduction in regression test cases needed saving significant time over the retest all case. The third empirical study was conducted with the same ABB application that was used in the first case study. 
This application uses many different components, and this study looked at a different component within this application, specifically a three KLOC internal ABB software component. This component is a DLL file written in C. Four incremental releases of this component were analyzed with the COTS Firewall method. The results for this study are shown in Table 9. The results from this study show another example where either complete testing or no testing is required. The first and third comparisons showed no reduction in testing needed due to all the top level API functions being affected by the internal change. The 39 second comparison showed that no testing was needed, as none of the used API functions were affected. Table 9. COTS Firewall, Third Study Results at ABB Metrics Affected exported component functions True positive ratio % of affected exported component functions Affected glue code functions % of affected glue code functions Total test cases needed % of test cases reduction Actual regression failures found Regression failures detected by reduced test suite Comparisons 1 vs 2 2 vs 3 3 vs 4 45 9 44 100% 100% 100% 91.8% 18.4% 84.6% 2 0 2 100% 0% 100% 31 0 31 0% 100% 0% 1 0 0 1 0 0 The final empirical study was conducted on a 405 KLOC ABB application written in C/C++. This application incorporates 115 different internal ABB software components, of which 104 are .dll format and 11 are .ocx format. These components were written in C/C++. Four of these components were selected for study. Each is implemented in the Component Object Model (COM) [44], three of which are packaged in a DLL file and one is packaged in an OCX file. The results of this study are shown in Table 10. Table 10. COTS Firewall, Fourth Study Results at ABB Metrics Same linker? Affected exported component functions True positive ratio % of affected exported component functions Comparisons 1 vs 2 2 vs 3 3 vs 4 Yes Yes Yes 3 10 3 100% 100% 100% 42.9% 66.7% 75% % of test cases reduction Actual regression failures found Regression failures detected by reduced test suite 93.4% 1 1 97.6% 1 1 90.4% 1 1 These results show an average reduction in test cases of over 90% for these releases. In addition to the savings, one regression defect was found in each release by using this COTS Firewall. 40 The final results of the four empirical studies show that the COTS Firewall is effective in reducing the number of tests needed to retest the customer software due to changes in third party COTS components. There are some factors that limit the effectiveness of this method. The first limitation is shared with most other RTS methods and deals with the design of the component itself. If the component is highly coupled, even a simple change can have a very large impact on the rest of the component. This was the case in the components studied, where the test reduction was either 100% or 0%. Another factor which limits the effectiveness of this routine is legal in nature. Breaking down a component into its constituent parts and determining relationships between them could be considered reverse engineering, which goes against the End User License Agreement (EULA) that the COTS components are sold under. It is very important when using this method on 3rd party COTS software to work with the vendor when using this method [45]. 3.4 Deadlock Firewall There are other types of relationships, such as the data flow relationships in the Extended Firewall, which are not handled in the Traditional Firewall method. 
One of these has been an issue for user-configurable software systems at ABB, specifically relationships that lead to deadlock. Deadlock occurs when two or more processes request the same set of resources in a different order. Since these resources are held by processes which request additional resources, none of the contending processes can make progress in their activity. In the real-time software studied at ABB, deadlock can occur with many types of relationships. These relationships include the traditional case of tasks and semaphores, which are the processes and resources, respectively, as well as slightly 41 different cases dealing with message queues and task interaction, which is a specific case of a blocking system call in the software. Any place in the code where the system or application waits on a resource while holding another resource at the same time is a candidate for deadlock, depending on the order the resources were taken in. The concept of deadlock and its detection is not new, so only a quick overview will be presented. The first case to consider involves a system with two tasks and two semaphores, shown in Figure 5. It is arbitrarily assumed that task T1 requests and takes semaphore S1 before task T2. At a point later in time, T1 will request semaphore S2, and it does this after T2 has requested S2. After T2 requests S2, it requests semaphore S1 at a time after it has been taken by T1. Neither tasks T1 nor T2 can continue to execute because T1 holds S1 and T2 holds S2 and both are waiting for each others resource without releasing it. T1 needs S2, T2 needs S1, and so no progress can be made by either task. This is known as two way deadlock. This precise ordering of the events in tasks T1 and T2 are what cause the deadlock to occur. If the order in which the semaphores were taken was consistent throughout the software system, for example, always requesting S1 before S2, no deadlock could occur. This important ordering is why deadlock rarely happens at release, since the development team has a greater understanding of the usage of resources in the system. Once the software is released and maintenance begins, that system resource knowledge can be lost and regression defects injected. Figure 5. Example Deadlock Graph, Two-Way 42 Another example involves three-way deadlock which can occur in real-time software. The three-way deadlock graph is shown in Figure 6. Three tasks T1, T2 and T3 are involved as well as three semaphores S1, S2 and S3. T1 takes S1 first, and later requests S2. T3 takes S2 first, and later requests S3. T2 takes S3 first, and later requests S1. This is an example of a cyclical deadlock where each task holds one unique resource and is waiting for a different resource already held by another task. It is possible to have a k-way deadlock, where k = 3, 4, 5…n. In practice, this would be very inefficient to design and operate, and k would be limited to the number of tasks and semaphores, whichever is larger. Figure 6. Example Deadlock Graph, Three Way The real-time systems studied at ABB have shown that there are many other sources of deadlock that do not deal directly with tasks and semaphores. Deadlocks can arise from the use of message queues or other forms of task interaction via messages, such as signals, as long as any blocking calls are present. More generally, deadlock can occur whenever a number of processes share any type of global resources, assuming that the task waits for access to that resource. 
With message queues, deadlock can occur when the queues fill up. This is a condition very difficult to preclude or predict. A common example of this in software today is TCP sockets with blocking waits. These sockets have a sliding window buffer which, when full, will not accept more data, putting the process in a blocked state. Figure 43 7 shows a different scenario that will exhibit deadlock. If any message queue fills up, there is a corresponding task that cannot process an incoming message since it is stuck waiting to send to a filled message queue, which backs up the cycle and causes deadlock to occur. Specifically, if Q1 fills up, task T1 is unable to process incoming messages since it is blocked waiting for an outgoing message be sent. This exhibits the necessary condition for deadlock where a task must be both a producer and consumer of messages. Figure 7. Example Deadlock Graph with Message Queues Within the software studied, initial designs rarely contained conditions for deadlock, but frequently these deadlock conditions were the result of hasty revisions of the code without careful testing or analysis. This led to regression defects in test or in the field due to the changes in the software. This issue becomes even more important when studied in user-configurable software systems, such as those developed by ABB. These systems may not exhibit deadlock in thousands of runs of the software since this deadlock may be dependent on a specific configurable element to be running, or even a specific setting value being used. Unlike the firewalls discussed so far, which are testing firewalls, the firewalls for detecting deadlocks will be based on structural analysis, as opposed to actual testing. The structural analysis will consist of labeled graphs, an example of which is shown in Figure 8. This figure shows a 2-way deadlock situation with three tasks and three semaphores. 44 Figure 8. Example Deadlock Firewall Graph for a Modified Task Creating this firewall follows the same methodology as the previous firewalls. The first step is to create a Traditional Firewall. The analysis for deadlock starts at the changed nodes in the Traditional Firewall and determines if the change affects any use of shared global resources. If the change does not, then no Deadlock Firewall is needed. If the change does include a shared resource then it is considered affected and all users of it are checked for other resources that they can hold or take at the same time as this affected resource. This continues until all related dependencies have been mapped. If this uncovers a cycle then the cycle is checked to see if it contains the ordering issues that lead to deadlock. There were four major empirical studies conducted of the Deadlock Firewall as part of this research. These studies involved analyzing code from ABB’s current realtime systems product line, including real-time process controllers, communication modules, smart sensors, and data servers. The first step was to identify previously released software versions that contained deadlock. The code for the version with a known deadlock defect and the previous version were acquired and then the Deadlock Firewall method was applied to the two software versions. This Deadlock Firewall was then checked to verify that the deadlock defect was detected in the firewall analysis. 45 Since each of these software revisions did not have deadlock graphs constructed for the previous versions, they were created as part of the firewall process. 
The first study involved a communications gateway, which was an object-oriented software product. The release chosen for analysis contained changes to six source files. Within those six source files, two objects were affected. Each object had six methods changed. A firewall graph was constructed that showed the system without the new change. This is shown in Figure 9. Figure 9. Example Deadlock Firewall Graph, First Study Original This graph only shows the semaphores relative to the firewall constructed in Figure 9. It was determined by a Traditional Firewall that a code change added a semaphore to task T1, which now takes semaphores S1 and S2 in that order. The new semaphore, S2, was accessed by two callers before the change. After this original graph was created, it was updated to show the new semaphore dependency, which is shown in Figure 10. Figure 10. Example Deadlock Firewall Graph, First Study Changed 46 After the update was completed, the resulting deadlock graph was analyzed for any of the deadlock patterns that were identified before. Figure 10 shows that task T1 and task T5 form a cycle with semaphores S1 and S2. T1 takes S1 and then S2, while T2 takes S2 then S1. This shows a potential deadlock between tasks T1 and T2, which is the deadlock discovered by the customer in the released product version. The second study involved a real-time communications gateway, which is a procedural designed module. This release contained 31 files changed, including code changes to 72 functions. A firewall graph was constructed that shows the system before the change. It was determined by a Traditional Firewall that a code change added two blocking message queue calls on message queue A to task T1. These new message queues were added to the graph, which is shown in Figure 11. Figure 11. Example Deadlock Firewall Graph, Second Study The firewall analysis shows that task T1 now both sends and receives on message queue A. Since both the message queue send and receive operations are blocking, this situation leads to a message queue deadlock, as described in Section 5.4. This problem was detected in the field by customers in this software version and reported to ABB. In addition to this one defect, four other potential deadlock conditions involving message queues were identified in the system. These had not been seen yet in the field, but might have been in the future. 47 The third study involved another communications module, which was also a procedurally designed module. This release contained 22 changed files, which contained 41 modified functions. A firewall graph was constructed that shows the system before the change. It was determined by a Traditional Firewall analysis that tasks T1 and T2 now use a blocking message queue send. The new message queues were added to the graph; the results of the firewall are shown in Figure 12. T4 Message Queue C Message Queue A T1 Message Queue C T5 Message Queue D Message Queue B T3 Message Queue A T2 Message Queue B Figure 12. Example Deadlock Firewall Graph, Third Study The firewall analysis shows that task T1 sends a message to task T3 via message queue A. Then task T3 sends a message to task T4 on message queue B. After that, task T4 sends a message to task T1 on message queue C. This cyclical deadlock detected was the same deadlock detected by customers in the field and reported to ABB in this software version. The Deadlock Firewall also detected two additional deadlock cases in this module similar to the one found in the field. 
These were corrected before the software was released, preventing them from ever being found by the customer. The final case study involved using the Deadlock Firewall analysis on a new piece of changed software. This software had passed all of its unit, integration, regression, and systems tests and was certified as ready for release by the test department. This new software had 93 files modified, including 112 functions. The Deadlock Firewall analysis was conducted and yielded the firewall graph shown in Figure 13. 48 Figure 13. Example Deadlock Firewall Graph, Fourth Study This graph shows that task T1 sends a message to task T2 on message queue A. Then task T2 sends signal 1 to task T3 and then waits for signal 2 from T3. Task T3 sends a message to T1 on message queue B, and also signals back to task T2. This is a cyclical deadlock case using both signals and message queues. As soon as any one message queue is full, all three tasks will have deadlock. This deadlock was successfully detected by the firewall method, and a major defect was detected prior to release of the software. In addition, three other deadlock conditions were detected, two of the three dealt with message queue additions, as discussed in study 2, and the other dealt with a more traditional semaphore deadlock. These graphs are not shown here, as they are similar to the previously described studies. These empirical studies show that the Deadlock Firewall can be effective at detecting regression deadlock defects that make it out to customers in the field. In addition to detecting deadlock previously found in the field, this method was able to identify other deadlock dependencies that existed due to a change that were not yet found by the customers. The Deadlock Firewall model was a key addition to the firewall suite, as deadlock was a relationship that was not handled in existing firewall models. 49 3.5 Other Future Firewalls There are additional dependency types which can propagate impact to other areas of the system that are not covered by firewalls today. This section describes needed firewalls to address these dependencies and relationships. These firewalls will be useful in both traditional code based regression testing as well as in the Configuration and Settings Firewall. Until they are created, these represent limitations on the effectiveness of both the code based firewall suite and the new Configuration and Settings Firewall. There are many forms of impact that can occur from software changes [15], three of which are discussed in this section. User-configurable systems often face regression defects due to memory leaks. Since many software systems continue running for long periods of time, often only shutting down once or twice a year, even a very small infrequent memory leak can lead to a major customer defect. In the software studied at ABB, memory leaks were identified as a major cause of long term customer unhappiness. This unhappiness is partly due to these defects being difficult to detect, debug, and fix, leading to long fix times and recurring downtime for the customer. One key assumption is that all of the memory leaks must be based on code that is accessible to the firewall team. No third party or operating system memory leaks will be detected with this method. This Memory Leak Firewall represents work that needs to be completed in the future and is outside the scope of this work. An additional key dependency in user-configurable systems, especially in embedded systems, involves global variables. 
Current industrial practice treats global variables as a data dependency and uses the EFW to determine impact. This has not been 50 proven effective and is only done since no other firewall exists for global variables. Future research needs to be performed in order to understand and model the dependencies present when global variables are used. A final key dependency in user-configurable systems is performance. When code changes are made to a system the overall performance can be impacted. In this case, performance means the response time, cycle time, maximum load, throughput, and other quantifiable measures of the system limits. In software studied at ABB, regression defects of this kind are becoming more frequent. Besides traditional performance defects from code changes, latent software defects can be exposed from configuration changes which impact the performance of the system. The impact on other software areas from a performance change represents a dependency which is not covered in firewalls today. In addition, performance testing is often costly, so regression test selection will be very beneficial. Firewalls for these dependencies need to be created, but they are outside the scope of this research. 51 4. Configurations and Settings Firewall In order to address the problem of users changing system configurations and settings, which expose failures related to latent defects as well as traditional regression defects, an extension to current regression test selection methods is presented. Instead of looking at changes within the software code itself as the only source of impact within the system, this approach analyzes changes to the user’s configuration, including both configurable elements and settings, that determine the specific way the software behaves in that user’s environment This new analysis is conducted whenever the configuration or settings in the application change, matching the way that current RTS methods are applied to software for every code change. Latent software defects can exist in many different parts of a system. These include long data flow paths, where a configurable element’s specific action or output result is dependent on a value computed in a different configurable element, or in a code path internal to a configurable element that was previously dormant but, due to a change in the configuration, is now executed. It is also possible that latent defects were previously hidden from view due to either environmental, performance, or other configurable elements but are now exposed due to a change in the configuration. Additional types of changes that go beyond the configuration of the system can also expose latent defects. These include process changes, changes to the way a user interacts or interfaces with the system, changes to the hardware the system is running on, such as upgrading PCs, or changes to other software that runs on the same machine or network, including the operating system or other third party systems. These change types are not handled in this firewall method and will require additional research to address. 52 This new firewall method identifies the different types of changes that exist in the configuration, including both settings changes and configurable element changes, and selects specific code-based firewalls to model these changes from those listed in Chapter 3. The specific firewalls selected for use depend entirely on the types of changes that were made to the configuration. 
Since the code itself does not change, a set of differences in the configuration and settings must be determined instead. After these differences are identified, each change, including both changes to settings and changes to configurable elements, is mapped to the parts of the source code that represent it within the system. This step will be discussed in more detail in Section 4.2 and 4.3. Once this mapping is completed, the source code representing the difference is marked as changed and a selection of one or more of the code based firewall models is made. Each needed firewall is then created, using both the data present in the configuration as well as the source code and design documents for the system. After the selected firewalls have been created, a test selection or test creation activity is performed to cover the impact to the system identified in the model. It is important to be able to identify and differentiate between a setting and a configurable element when looking at the changes in a configuration. This classification of setting or configurable element must be done for each identified change between two versions of a configuration. Settings, for the purpose of this research, are defined as values that exist inside a configurable element which are visible and changeable by the user. Some settings changes can be made when the system is offline and execution has been stopped while other changes can be made while a system is online and currently executing. In effect, these settings and the specific values they hold resemble and act as 53 parameters in procedural code, or as attributes in object-oriented code, to the configurable elements they reside in. Similar to parameters and attributes, these settings can define the specific behavior of the configurable element such as a specific internal code path that is executed when a function or method is called or the return value that an internal algorithm computes when other objects call this element. Settings will be further discussed in Section 4.1, where settings changes are presented. Configurable elements, on the other hand, are defined as individual parts of the system that can be added to or removed from the system’s configuration. These elements are represented in the software as a specific grouping of code, and can be thought of as, and compared to, a class. In fact, the execution of a configurable element in a system acts just as a class does. Adding a configurable element to the configuration creates an instance of that class with its own settings and memory space, just as creating an instance of a class in code. Similarly, if a class exists in the code but is never instantiated, its code will never be executed. This is the same for configurable systems, where the code for the configurable element exists in the system, but if it is never added to a configuration, that code will never be executed. Even when an element does exist in the configuration, the possibility that it will actually be executed depends on the specific settings, events, or user interactions that are present during the execution of the system by the user. This is also similar to a class, where instantiated objects are only called in response to the occurrence of specific events in the system. From a testing point of view, making a change to a configurable element in the system’s configuration may add new code to the system that has the potential to be executed and can be treated the same as adding a new class to the code of the system. 
The specific details of the change itself also determine the 54 overall impact to the system. For example, adding a configurable element that has never been used previously in the system adds new code that has the possibility of being executed. Conversely, the configurable element added could have been used previously in other parts of the system. Either of those cases can lead to new failures due to latent defects being revealed in the system, but the specific impact and risk of each change are different. Configurable elements will be further defined and broken down when discussing configuration changes in Section 4.2. Before this new firewall can be used, a few key assumptions must be made. The first assumption is that all code usages and dependencies of an individual setting can be identified in the system. This usually requires the source code, the design documents, and the user’s specific configuration that is currently executing in the field. This first assumption is needed since a hidden use of a setting will propagate the impact to another area of the software and the impact analysis will be incomplete, potentially missing affected areas that may contain latent defects. A second needed assumption is similar to the first and requires that the code implementing a specific configurable element can be identified. Also, all of its interactions and dependencies within the system must be determined, usually requiring access to both the code and the running configuration. This second assumption is needed to address issues that arise when an unknown dependency exists between two areas of the system, causing the impact analysis to not identify an affected area. It is important to note that the first two assumptions both require source code, design documents, and the user configuration to be available for the analysis to be effective. The third assumption states that the focus of this new firewall is on detecting latent software defects and traditional regression defects that exist inside the source code 55 of the system that are revealed due to a change in the configuration by either a configurable element or setting value. All other latent and regression defects that are not affected or revealed by one of these changes are outside the scope of this research and method. Any errors in the logic of the configuration that the user creates or changes are also outside the scope of this research. Specific testing and regression impact due to change on the validity and correctness of the configuration itself are different research areas and not the focus of this work. Finally, the system should not be designed in such a way that any single change impacts the whole system. This means that a fully connected system, where every object or function is dependent or related to every other object or function, will not benefit from this or any other traditional RTS method, since the impact from any one change will propagate to all other areas of the system and require a complete retest. A detailed description of changes to settings and configurations is presented in Sections 4.1 and 4.2, respectively. Actual construction of the firewalls for these change types is shown in Sections 4.3 and 4.4. Section 4.5 will discuss the time complexity of using this firewall method. Finally, Section 4.6 presents some future enhancements that could be made to improve the efficiency and effectiveness of this method. 
Within the descriptions and constructions shown in the following sections, an example real time industrial control system will be referenced. This system is the same system initially described and used in the examples in Chapter 2. 4.1 Settings Changes A settings change is a change to a specific value that resides inside a configurable element that is both visible to and changeable by the user. Changes to these values can 56 sometimes occur without the need for recompilation by just changing a configuration file or by using a human system interface, such as a GUI interface. Since these changes can be made easily, users often overlook the possible risk that comes from changing settings values in a currently executing system. In addition, some settings changes can be made to the system while it is executing which can lead to serious failures due to latent defects being exposed in a running system. Because of this risk, all changes to the settings should be first done in a test environment using this firewall method to determine impact and to verify the absence or presence of latent defects. These values can describe the behavior that the configurable element will exhibit, similar to the way values supplied to an object can determine the behavior that object will exhibit. Specifically, the values supplied to a configurable element by its settings can affect the output of a specific method in the item which is called by other items up or down a calling path. A settings value could also affect the internal code path that the object takes in response to either a method call or event. Finally, a setting value could have no real effect on the system at all due to the internal usage of that specification inside the configurable element that contains it. An example of this deals with a change to a setting that either affects or is used by code paths that are not currently executing due to other settings or configurable elements in the system. Settings reside internal to configurable elements and changes to them can affect the internal operation of that element. The only way that a settings change can affect external configurable elements is through a data dependency as control flow dependencies are fully contained inside configurable elements similar to classes. Data flow dependencies can occur in two ways. The first dependency is very common in user- 57 configurable systems and occurs when the output of the configurable element is affected by the settings change and is either used as an input to another element connected to it or used as a parameter value to other function calls in the system. This forms the same kind of data dependency existing in traditional procedurally designed software systems, where all of the data is passed as parameter inputs to the next function. An example of a setting change affecting this kind of data dependency is shown in Figure 14. The other dependency type is less frequent and involves retained state information. This state information can exist as an internal state, such as previous state values, weighted averages, global variables, or data stored in a database. Figure 14. Example of a Settings Change Figure 14 shows a settings change occurring within the configuration of the system. This change to setting 1 is mapped to the code that represents it in the code which, in this case, is variable A. This variable is used to determine whether code path 1 or code path 2 is executed whenever method X is called. 
Before the change to the system, the configuration contained setting 1 with a value of ten. Setting 1 is represented in the code by variable A, so A’s value is also ten. The settings change modified the value of setting 1 to twenty, which means the internal variable A is now also twenty. Variable A is 58 used in the code to determine which code path, either path 1 or path 2, is executed whenever method X is called. This change to setting 1 leads to path 1 being executed in the new configuration, whereas code path 2 was executed in the previous configuration. In this example, code path 1 was never run before at the customer site and could contain a latent defect. This setting change impacts a data flow dependency, as it can change the output of the configurable element. One possible latent defect in code path 1 would cause the output of method X to be incorrect for specific input values passed into the configurable element. For settings changes, a Traditional Firewall is created first to model the change. This firewall is constructed by mapping the settings that have changed into the variables within the code of the configurable element that they reside in. Finding the code for configurable elements and settings is dependent on the implementation of the system and varies for each system. In general, it can be done by either expert knowledge, using design documents and code models if they exist, searching with a predominantly manual brute force effort, or by using some automated techniques recently created in the Information Retrieval and Program Analysis communities, such as [35, 36]. Once the code for the configurable element has been found, the specific variables are identified, and they are marked as code changes. Once all of the changes have been identified, a Traditional Firewall is constructed for each. The TFW is fully contained inside the code of the configurable element, with the exception of calls to helper functions, operating system calls, calls to third party components, or use of global variables. Settings changes can also affect data values that are computed elsewhere in the system if they are connected by a data flow dependency. Due to this, it is important to 59 determine if the changed setting impacts any existing data dependencies. If any of these dependencies exist in the code, the specific changed setting is checked to see if it impacts that dependency, either with a change to control flow or the value used in the data dependency. If the dependency is impacted, an Extended Firewall is created to determine the impact. Otherwise, the EFW is not needed. Determining the functions affected by either control flow or data flow is currently a manual process with some automated tool support. There are many promising research areas that are currently looking at and automating the impact analysis for both control flow and data flow, such as [36]. These techniques and tools will help automate the mostly manual analysis that this method needs, which should make it even more feasible for this method to be used in the future. In addition to traditional control flow and data flow dependencies, settings changes can impact other dependencies in the system. These dependencies include semaphore and other blocking calls, which could lead to deadlock, as well as memory allocations and deallocations, which could lead to a memory leak occurring, as well as performance changes and third party components. 
Each of these dependencies could be affected by the setting change and, as a result, they must be checked for. If any of these dependencies exist, a firewall for that dependency must be created. Checking for these types of impact in the code is accomplished in the same way as for code changes, specifically treating the changed settings as changed code. Once all of the various dependencies have been modeled with firewalls, the determined impact is used to select tests, if previous tests exist, or create new tests, if they do not currently exist. These tests must cover all of the impact identified by the TFW and any of the additional firewalls used, such as an EFW. 60 An example will now be presented using the control system example shown in Chapter 2. In this example a setting, whose value selects the specific input shaping function to execute on an input value, is changed. This function exists inside the same configurable element as the setting and shapes inputs connected to it by applying a translation function to them. Within the code of the configurable element, the specific shaping function is selected by a case statement that uses the value of the setting to determine the correct algorithm to apply. Applying this to Figure 14, the code uses variable A to determine the specific function to apply, which is based on the value of setting 1 selected by the user. In this example, no additional changes occur in the configuration and no code was changed. There are, however, existing code paths which have never been run by the customer in this way, any of which may contain latent software defects. Therefore, this settings change requires regression testing to determine if any latent defects are exposed by the change. Another example deals with an ERP system which was also discussed in Chapter 2. In these types of configurable systems, settings are usually represented as parameters, configuration files, and database values that are passed into or used by the various configurable elements of the system. These configurable elements exist in code libraries containing objects and functions that are needed to perform the user’s action. The settings themselves are used within the ERP libraries in a similar way as the settings in a control system are used. This includes usages where ranges of values are treated differently, specific events or values trigger defined responses by the system, and many options are available to select that specialize the general solution provided. The specific way the two systems are configured, graphically for the control system and programmatically for the 61 ERP system, does not affect either the use of this method or the validity of its results. It only affects the technical details as to how to compare two configurations and how to map the settings into the code. Settings changes usually involve a smaller impact then that of an added configurable element. The impact of the settings change is completely internal to the configurable element it resides in, with only system calls, data dependencies, performance dependencies, memory dependencies, and global variables having an impact outside the boundaries of the configurable element itself. In addition, any dependencies that do cross the boundary must be impacted inside the configurable element by the setting change. 
In the worst case, a setting change could affect an output that is used by the entire system, thus requiring a full retest of the entire system, but this is extremely rare as you would need either the setting change to affect code that is fully coupled to code in every other configurable element in the system, or to build a configuration that has a configurable element that is connected to every other configurable element in the configuration. Either of these cases is very unlikely and if they did exist, a full retest would be required as the impact would propagate to every part of the system. In practice large distributed control systems monitor many smaller independent control areas and would never be fully connected to any one configurable element in the system. 4.2 Configuration Changes Configuration changes, unlike settings changes, deal with changes to configurable elements which either add them to or remove them from a preexisting configuration. These configurable elements are internally represented by the system as classes or functions. The specific code implementing the logic for these configurable elements must 62 exist somewhere inside the system, usually in library-like constructs. Even though the code always exists inside the system, until it is used inside the configuration it will never be executed. Placing a specific configurable element inside a configuration creates an instance of that item inside the system in the same way that declaring an object in code creates an instance of that class. Each configurable element contains zero or more settings that can control or impact its execution. It is possible for a configurable element to contain no settings at all, as its behavior is not dependent on any other values or inputs. Changes to a configuration can be categorized into different change types where each type involves different steps and can have different impact on the system. The first type of configuration change involves adding a new configurable element that does not exist elsewhere in the configuration. In this case, the code for this element existed in the system but it was not possible for this code to run, as no instances of it had been used in the configuration before. This type of change requires the most testing, as it has the highest potential risk for new failures from latent defects due to the large amount of unexecuted code now present in the configuration. Using the control system example, adding an I/O module with connected data points to the system, each of which is represented in the system as a configurable element, would represent a configuration change. I/O modules are external physical devices which act as bridges between the controller device and the physical measurement devices, such as thermocouples. These I/O modules read in values, format or convert the data, and transmit it back to the controller. If the added configurable element which represents the I/O module was not used elsewhere in the system, then the change is classified as adding a new configurable element. This change would allow the execution 63 of a new set of code within the system. This code, while previously present in the system, can be considered new since it is now possible to execute it. Similarly in an ERP system, adding a new business process to an existing system would represent the addition of a new configurable element, as these processes are composed of configurable elements represented by functions stored in libraries. 
Each business process may contain very specific configurable elements, leading to many changes where previously unused configurable elements are added.

A second type of configuration change involves the addition of previously used configurable elements into the customer's configuration, often with different settings than other instances have used before. This is the most common change type encountered in the field, as users often extend a system by adding more of the configurable elements they have already used in their configuration. Since some of the code that represents the configurable element has run before in this configuration, there is a lower risk of failure from latent defects, as there is less new code to run. It is important to determine how different this new instance of the configurable element is from other instances. Therefore, these previously used configurable elements that are added must have their settings compared to the previous instances that exist in the configuration. If the settings are exactly the same, then the configurable element itself and all external callers just need to be checked. For a function to be checked, it must have some very basic tests executed on it. If there are only a few differences in settings between the newly added element and the previous instances, a Settings Firewall is created to check which internal code paths and data values of the element are affected. This includes checking to see if the settings are involved in any data dependencies, which require an EFW, or any other dependencies that would propagate the impact to areas outside of the configurable element itself. If the settings in the new instance are completely different than the settings in the previous instances, the whole element is considered completely new and must use the firewall for adding new configurable elements.

Figure 15. Example Configuration Addition

In the control system example, adding a previously used configurable element, in this case an I/O module, to the configuration represents this type of configuration change. This is shown in Figure 15, where two previously used configurable elements are added to allow input and output of new values to be calculated. This type of change happens frequently, as customers often add additional data values to existing configurations which they did not know would be important until the plant was running. The specific settings in the new instance must be compared to the settings used in previous instances. If the settings are very different, it will be necessary to fully test the added configurable elements, since new code paths may be present which contain latent defects.

The final type of configuration change is removing configurable elements. This is the least common type of configuration change, as customers rarely remove pieces of their previously running configuration, usually doing so only when they have to replace them with a different configurable element type. This action does not allow any new code to be executed and in fact removes code from within a previously executing block. As a result, the objects or functions directly related to the removed item, either within the code or due to the configuration, are all marked as changed and firewall models are constructed around them. These areas are all retested for both latent defects and regression defects, looking closely for changes in data paths and values. In this case it is very important to remember the assumptions behind this firewall.
The goal is to detect latent defects in the code, not defects in the configuration. The details of how the Configuration and Settings Firewall is created for each case will be described more completely in Section 4.4. The remainder of this chapter describes the algorithms for creating firewalls for each of the types of configuration and setting change. Section 4.3 will detail the construction of the Settings Firewall, used for settings changes. Section 4.4 will present the construction of the Configuration Change Firewalls, one for each type of possible change.

4.3 Constructing a Firewall for Settings Changes

Settings changes require a firewall to be created for each change within the user's configuration. Constructing firewalls for settings changes involves following a set of steps, the totality of which defines the firewall creation process. The full process is shown in Figure 16, where each process step is represented by a circle, each valid transition is represented by an arrow, and any specific conditions that must be true to take a transition are listed as labels on the arrow.

Initially, customers have a previously created and tested configuration running in their environment. The customer decides to make a change to the configuration involving one or more settings values. A copy of the current configuration is created, the changes are made, and the new configuration is saved. Once the changes are made, the next step depends on the access the user has to the internal system source code. If the source code is available, then the user can create the Settings Firewall directly. If not, then the user must send both the original and changed configuration to the software vendor for analysis and testing. These examples and steps assume the vendor is doing the analysis, as many systems do not make the source code available to the users of the system. In the case where the source is available to the user, the steps to create the firewall are the same.

Figure 16. Process Diagram for a Settings Change

Once the software vendor receives the configurations, a difference between the two configurations must be determined. The specific way that two configurations can be compared to one another depends completely on the details of the specific system being used. At a high level, this step is the same as computing a difference between two source code files. There are two primary ways that configurations are presented to the user. The first represents configurable elements and settings graphically, using pictures and lines to represent the elements and relationships, as well as GUI windows to display the setting values contained inside the elements. The other way that configurations can be represented is programmatically, where functions or objects are the configurable elements and parameters represent the settings. The purpose of this step is to determine the executable changes that exist within the settings of configurable elements inside the two separate configurations. This step has the same purpose as code differencing in the Traditional Firewall. Identifying the differences can be accomplished either by a text-based differencing tool, looking for changes in variables, parameters, or files, or by using a custom tool provided by the software vendor, looking for differences in values contained inside the configurable elements.
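As one illustration of this differencing step, the sketch below compares two configurations represented as simple Python dictionaries and reports the settings whose values changed. This dictionary representation is an assumption made for the example; it is not how any particular vendor tool stores a configuration.

```python
# A configuration is modeled here as {element_name: {setting_name: value}}.
def diff_settings(old_config: dict, new_config: dict) -> list:
    """Return (element, setting, old_value, new_value) tuples for changed settings."""
    changes = []
    for element, new_settings in new_config.items():
        old_settings = old_config.get(element, {})
        for setting, new_value in new_settings.items():
            old_value = old_settings.get(setting)
            if old_value != new_value:
                changes.append((element, setting, old_value, new_value))
    return changes

old = {"FC-222": {"setting_1": 1, "high_limit": 100.0}}
new = {"FC-222": {"setting_1": 2, "high_limit": 100.0}}
print(diff_settings(old, new))   # [('FC-222', 'setting_1', 1, 2)]
```

A graphically configured system would first have to export its configuration into some comparable form before a difference like this could be computed.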
Determining if a change affects execution is important, as some changes, such as element names and comments, do not have any effect on the system and will not expose latent defects. This determination can be difficult and will require analyzing the source code of the configurable element to see how that setting is used. All changes that do affect execution are added to a list. A detailed example is presented later in this section that describes these steps for a specific control system example.

After the specific settings changes are identified, the source code representing each must be identified. This step is also dependent on the specific system and how configurable elements and settings are implemented within the source code. In a programmatic system, the setting values are contained in parameters or configuration files that are passed into the system at a specific time or in response to a defined event. For these types of systems, finding the users of settings involves tracing the file or parameter from its input, usually a file or database, to its usage in the code. If the system is graphically configured, a similar trace is conducted from the GUI window into the source in the system which uses the settings values. In either case, settings are contained in configurable elements, and the variables themselves reside in the code implementing the configurable element. A detailed example of mapping the values inside a graphical configurable element to the source code that uses them is presented later in this section.

Once the code using the changed setting is identified, it is marked as a code change. Each area of source marked as changed requires a Traditional Firewall to be constructed. Creating the TFW for these changes is the same as creating one for a real code change, following the steps listed in Section 3.1. For a settings change, all of the impact identified by the TFW is contained inside the configurable element itself. In addition to the TFW, analysis must be done to determine if other dependencies exist which are impacted by the setting change. The main dependencies to look for include data dependencies using the changed setting, blocking calls that are affected by the change, and any third party components that might be affected. Other dependencies include changes in memory allocations or deallocations, as well as any changes that might impact the performance of the system as a whole. Each of these dependency types is searched for, and any that are identified require the corresponding firewall model to be created. It is important to understand that some relationships and dependencies between configurable elements themselves, and also with system functions, are created dynamically when the configuration is loaded. These dependencies are really only dynamic when looking at the code since, once loaded, they remain static throughout the entire execution of the system. As a result, the configuration itself must be used when identifying relationships in the needed firewall models.
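A minimal sketch of this impact step is shown below. It assumes a precomputed static call graph, a map from settings to the functions that read them, and a map of data-sharing relationships; all three are hypothetical inputs that a real system would have to extract from its source code and loaded configuration. Callers and callees one level away from the changed code form the Traditional Firewall, while functions that also share data with the changed code are flagged as candidates for an Extended Firewall.

```python
def settings_firewall(changed_settings, setting_readers, calls, data_deps):
    """
    changed_settings: iterable of setting names that changed
    setting_readers:  {setting: set of functions that use it}
    calls:            {function: set of functions it calls} (static call graph)
    data_deps:        {function: set of functions it shares data values with}
    Returns (changed_functions, tfw_functions, efw_candidates).
    """
    changed = set()
    for setting in changed_settings:
        changed |= setting_readers.get(setting, set())

    # Invert the call graph so callers can be found as well as callees.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Traditional Firewall: one level out along control flow, in both directions.
    tfw = set(changed)
    for f in changed:
        tfw |= calls.get(f, set())
        tfw |= callers.get(f, set())

    # Functions that also share data with the changed code may need an Extended Firewall.
    efw_candidates = set()
    for f in changed:
        efw_candidates |= data_deps.get(f, set())
    return changed, tfw, efw_candidates
```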
Various coverage and other test completeness measures are useful here, as they can determine the area of the system that the test executes. If previous tests are not available for a specific impact, these tests can be created to cover that new area completely. Once the tests have been completed, they are executed on the system to determine if any latent defects were exposed from the change.

In order to show the details involved in creating a firewall for settings changes, an example on a real system will be shown. This example system is the control system discussed in Chapters 2 and 4 and follows the steps shown in Figure 16. This specific system uses a graphical representation of configurable elements which are called function codes. These function codes are inter-connectable blocks of logic that, when joined together, form a solution to the specific controls problem the users of the system need. Each function code contains a number of different settings, some of which can affect the actions, values, and events produced. Figure 17 contains a screen capture of a GUI window showing all of the settings that exist inside a specific function code, along with their current values. Any changes to these settings will require the creation of a Configuration and Settings Firewall.

Figure 17. Example List of Settings

Making changes to these settings for a graphical function code involves using the GUI window shown in Figure 18. Once this window is open, clicking on any setting will open an additional GUI window specific to the setting clicked. An example of this is shown in Figure 19. This window shows the values allowable for a specific setting, and the value can be changed by clicking on any of the options presented in the menu. It is also possible to change the settings value directly by replacing the number or string in the Value column in Figure 17. Settings can only be changed if the user has the permissions required to change the system. Once all of the desired settings changes have been made, the new configuration file is saved and both new and previous configurations are sent off to the vendor of the software for analysis.

Figure 18. Example Settings Change GUI

When the vendor receives the two configurations, the set of changes made between the previous and the new configurations must be determined. Since the new configuration was saved as a separate file, it can be compared to the previous configuration file. The details of how to actually difference two configurations vary depending on the system itself. For an ERP system, the configuration of the system is often done programmatically by configuration files, database entries, or even writing glue and wrapper code that uses and customizes the specific library functions the system provides. Within that system, settings can either reside as parameters passed into the library functions or objects used, or exist in configuration files read into the system at setup. As a result, both the code files and the configuration files are compared to the originally running configuration using a text differencing tool, in the same way that comparisons are made for the current code-based firewalls. In the control system example, the system itself supports identifying all the changes that were made between two configurations by using an application that is delivered with the software. An example difference between two configurations is shown in Figure 19.

Figure 19. Example Difference of Two Configurations, Settings
Now that the differences between the new and old configuration have been identified, a mapping must be made between the changed settings and the internal code representing these settings. Usually, settings are represented internally as variables or attributes inside functions or objects. In an ERP or other programmatically configured system, this mapping is easy, as the values are passed into the system as parameters or in specific configuration files on the disk. However, control systems and other graphically configured systems require more analysis for this mapping, as the code that implements each graphical element must be determined. Some system knowledge, documentation, or code searching must be available to support this analysis. For the control system example, each function code is well defined in the source code, including the variables and attributes representing the specifications. In addition, there are well defined functions that extract the settings from the file and assign those values to the attributes in the function code that uses them. Since the code is so well defined, simple searching for the number of the function code and the setting number from the GUI window is effective. A code example showing the internal variables used to hold the specification values is shown in Figure 20.

Figure 20. Example Settings Code Definition

Once the code representing the settings has been identified, each one is marked as a code change. Each function using these changed variables in the code is marked as affected. Once this is complete for every change, a TFW is constructed around each function identified. After the TFW is complete, analysis is conducted to determine if any additional forms of impact which go beyond simple code flow dependencies are present in the changed functions. Since settings changes in the system deal with variables, each function that is affected must be checked to determine if a data dependency exists between any outputs of the function and the changed variable. If any such data dependencies exist, they are marked as affected as well. Once complete, an Extended Firewall is constructed for each dependency, using both the internal source code and the configuration file itself to determine the impact. The configuration file is needed in order to resolve the dynamic linking of configurable elements into static relationships which can be used for this analysis. These relationships are considered static as the configuration remains the same throughout the execution of the software.

Another possible dependency involves settings changes impacting blocking calls. It is possible that a setting value selects a specific action for the configurable element it resides in to perform. If that changed action involves code that takes any semaphores or performs any blocking calls, either now or before the change, a Deadlock Firewall must be created. Similarly, if the setting change affects the creation, use, or deallocation of memory, then a Memory Leak Firewall must be constructed. Each of these firewalls is created in the same way as for a code change, starting from the functions that were marked as changed and then analyzing each dependency affected by that change. In this control system example, a data dependency is found and an Extended Firewall graph is created. This graph is shown in Figure 21 and shows that one of the settings changes was involved in two data dependencies with other functions in the system.
Figure 21. An Example EFW from a Settings Change

In this figure, functions are represented as circles and calls are marked as arrows. Function A is involved in a data dependency with Function C using the changed specification in the data path. Function B is in a data dependency with Function D and also uses the changed specification in its data path. Since both data dependencies use the changed specification, they are both considered affected and require retesting to check for the absence of latent defects.

The final step uses the impact identified in the firewalls constructed so far to select or create additional tests needed to verify the system works correctly and does not contain any latent defects. If tests exist from previous customers or testing, they can be selected and reused; otherwise new tests need to be created. These tests must cover all of the affected areas and determine the presence or absence of latent defects in these affected areas. Once the tests are created, they must be run on the customer's changed configuration. If the tests detect any defects, they should be corrected and a new version of the system can be sent out to the customer. Otherwise, the results of the testing can be sent back to the customer, allowing them to load the new configuration into their system and run it in their environment.

4.4 Constructing a Firewall for Configuration Changes

Configuration changes require one or more firewalls to be created each time a change is made within the user's configuration. Constructing these firewalls follows steps similar to those described in the previous section for settings changes. The main differences between settings changes and configuration changes are the size and reach the impact has into the surrounding system and the general scope of the impact and testing needed. The main steps in creating a firewall for a configuration change are shown in Figure 22. Just as in the process diagram for settings changes, each of the process steps is represented by a circle, and the figure is followed by a more detailed description of each step.

The process starts at the same point that the settings process did, namely a previously created and tested configuration running in the customer's environment. The customer then decides that a change to the configuration is needed and adds or removes a configurable element from it. A copy of the current configuration is created, the changes are made, and the file is saved as the new configuration. As with settings, the next step depends on the system itself and the access the user has to the internal system code. If the code is available to the user, they can follow all of these steps themselves. If the code is not available to the user, then they can send the original and changed configuration to the vendor company for analysis. This description will assume the vendor is doing the analysis, as many systems do not make the source available to the users of the system and, as a result, the user will send the original and new configurations to the company.

Figure 22. General Process Diagram for Configuration Changes

Once these configurations are received, the difference between them must be determined. Only executable changes between the two configurations need to be identified, as other change types do not affect execution and do not require testing.
Determining the difference between the two configurations can be accomplished either by a text-based differencing tool, looking for added method or function calls, or by using a custom tool provided by the software vendor, looking for added or removed configurable elements. Determining if a change is executable is important and may require checking the source code that implements the specific settings that change. Any changes that do not affect the execution of the system are ignored, similar to the way RTS methods ignore comment changes within source files, since they are not executable changes. A few detailed examples of configurable elements, both addition and removal, are presented later in this section to describe these steps for a control system.

Once the changes to the configurable elements have been identified, they are categorized as one of three types of changes: adding a new configurable element which does not exist elsewhere in the configuration, adding a previously used configurable element to the system, or deleting a configurable element. Each configurable element added to the system must be mapped to the underlying source that executes when that item is used. This code is marked as changed and added to a list. In addition, all relationships that exist in the configuration from the added configurable element to other configurable elements are marked as changed, and the code for each of these configurable elements is marked as changed and added to a list. For all removed configuration items, the configuration must be checked to see what dependencies to other configurable elements were affected by the removal of a specific configurable element. The code for these configurable elements is marked as changed and added to a list.

Once the list of configurable element changes is complete, a Traditional Firewall is created, just as it was for a settings change. Each item that was added to the list of changed code requires that a Traditional Firewall be created. Besides the TFW, some additional analysis must be done to determine if there is other impact from this configuration change to the system as a whole and to determine what that impact is if it exists. The main types of impact that need to be looked for when dealing with a configuration change include data flows using the changed configurable element, new blocking calls that exist within the source, new memory allocations or deallocations, as well as any performance impact the added or removed configurable elements may have on the system as a whole. Each of these additional forms of impact is searched for within the code marked changed and, if any are identified, they are added to a list. Each element on that list will need a corresponding firewall model associated with it. Once all of the additional impact types have been identified, the specific firewall model that addresses each one is created.

Similar to other firewalls, each area of impact identified by the firewalls needs tests to be selected or created which cover it. If tests exist already for an area, they can be reused. These tests can come from many places, including previous testing activities for that user, testing completed for other users which would work for this current user, and tests that were completed for product release testing. Various coverage and other test completeness measures are useful here, as they can determine the area of the system that the test executes.
If previous tests are not available for a specific impact, these tests can be created to cover that new area completely. Once the tests have been completed, they are executed on the system to determine if any latent defects were exposed from the change.

Now that the general steps have been defined for a configuration change, a few example firewalls will be constructed for each of the possible types of configuration change. Section 4.4.1 will involve constructing a Configuration Firewall for a change involving new configurable elements, 4.4.2 will show creating a Configuration Firewall for previously used configurable elements, and 4.4.3 will create a firewall for removing configurable elements.

4.4.1 Constructing a Firewall for New Configurable elements

Constructing a firewall for the addition of new configurable elements into a configuration modifies and further refines the general process, as shown in Figure 23. The main difference between the process shown in Figure 23 and the generic process shown in Figure 22 is how the data needed to create the TFW is gathered. For a new configurable element firewall, both the identification of all connected configurable elements in the configuration and the location of its source code are needed. These two steps run in parallel, with both acting as input to the TFW. These two actions are shown inside the box in Figure 23.

An example system will be used to show the details of how this method works. The example system is the same one used in the Settings Firewall example in Section 4.3, but instead of showing settings changes, this example will involve adding new configurable elements into the system. An example configuration for this system is shown in Figure 24. In that figure, the different rectangular shapes are configurable elements, or function codes as they are called in the system, and the arrows represent the relationships between these elements, called connections. A function code is a graphical unit that represents code in the system that performs a specific function. These function codes are linked together with arrows that describe a specific relationship between them. An example relationship involves an external temperature input function code connected to a thermocouple conversion function code, which takes the value from the thermocouple and converts it into a temperature value. The thermocouple conversion function code is then connected to a display function code which allows the plant operators to view the value from the control room. Any new function codes added that do not exist elsewhere in the configuration will require this firewall to be created.

Figure 23. Process Diagram for a New Configurable Element

In this example, the change involves adding a new function code to the diagram in Figure 24. The specific function codes added do not exist elsewhere in this configuration. Adding these function codes to the configuration is accomplished by graphically adding the function code to the configuration shown in Figure 24. Once these function codes are added, they can be connected to any other configurable elements that are related to them in some way. These connections represent many types of relationships, such as a data relationship, a logical relationship, or even a physical relationship.
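The sketch below shows one way, under assumed data structures, that added function codes and their connections could be represented so that element types completely new to the configuration can be separated from previously used ones; the instance names, type names, and connection tuples are hypothetical.

```python
# Elements are (instance_name, type_name); connections are (source, target, kind).
def classify_added_elements(old_elements, new_elements):
    """Split added instances into 'new type' and 'previously used type' groups."""
    old_names = {name for name, _ in old_elements}
    old_types = {etype for _, etype in old_elements}
    added = [(n, t) for n, t in new_elements if n not in old_names]
    new_types = [(n, t) for n, t in added if t not in old_types]
    reused_types = [(n, t) for n, t in added if t in old_types]
    return new_types, reused_types

def connected_to(element_name, connections):
    """All elements linked to the added element, whatever the relationship kind."""
    return {src if dst == element_name else dst
            for src, dst, kind in connections
            if element_name in (src, dst)}
```

Instances whose type is completely new follow the firewall described in this section, while instances of previously used types are handled by the firewall in Section 4.4.2.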
In addition, these relationships can be in a single direction, denoted by an arrow at the end of the connector showing the direction of the relationship; bi-directional, shown with an arrow on both ends of the connector; or an association, where the two blocks are able to access data within each other, shown with no arrows on the connection. After the changes have been added, the configuration is saved in a new file.

Figure 24. Example Configuration

Once the new configuration is complete, the changes between the new and old configuration must be identified. This is done by comparing the original version and the new version of the configuration to each other. Just as in the Settings Firewall in Section 4.3, this control system example supports a tool that reports the differences between two configurations. Each function code type that is added to this configuration must be checked to see if it already exists in the system elsewhere. If it is completely new or needs to be treated as completely new, it is included in this firewall construction. If it previously existed, it will be included in the firewall shown in Section 4.4.2. The differences from this new function code configuration example are shown in Figure 25. This differencing tool shows the new function codes that were added to the configuration.

Figure 25. Example Difference of Two Configurations, Adding

Now that the newly added function codes have been identified, a mapping must be made between these new function codes and the internal code that implements their functionality. Usually configurable elements, such as the function codes in this example system, are represented internally as objects or library functions stored in a set of common components that are loaded into the system when needed. In an ERP system, configurable elements are usually functions or objects, and mapping them to the code is easy since the interface to these functions will be visibly called from the user code that implements their solution. For control systems, which are graphically configured, this mapping may not be as intuitive. For this specific control system example, each function code is well defined in the source code, containing a list of all the methods, variables, and interfaces that exist within it. These interfaces usually just get assigned by the configuration to connect the function code with whichever function code has been connected to it. In addition to the source code of the added configurable element, the relationships between that item and any other items must be analyzed. Each item connected to the changed item will have its source code marked as changed also. A code example showing the details of a function code is shown in Figure 26. Besides determining the source code that implements the functionality of the configurable element, all connected function codes must be identified. This is accomplished by checking the configuration file for these relationships.

Figure 26. Example Source for a Configurable Element

A Traditional Firewall must now be created, treating each new function code added to the configuration as changed code. In order to determine the relationship these new function codes have to the rest of the configuration, both the internal code and the configuration must be analyzed. As a result, creating the TFW becomes more complicated, as configurations set up dynamic calling relationships between the function codes themselves, as well as system calls, when the configuration file is first loaded.
Internal state variables, semaphores, and other relationships that exist statically must be included in the firewall, as well as all dynamic relationships that exist from the configuration connections. These connections link the calling function, shown graphically in Figure 24 as the input and output lines in the blocks, to the functions they need to access. Internal to the code, these connections are usually just addresses which get assigned when the configuration file is loaded into the system. Once these relationships are identified and understood, the TFW can be created.

Once the TFW is complete, analysis is conducted to determine if any additional forms of impact beyond code flow are present in the changed functions. Since adding new function codes to the system allows new code to be run, every function and variable inside the function code must be checked to determine if a data dependency exists between any outputs of the function code and the external existing function codes in the system. If any data dependencies exist, an Extended Firewall is constructed. Similar analysis must be performed for each additional type of dependency discussed in Chapter 3. If any of these forms of dependency are present, the corresponding firewall must be created. Just as with TFWs, these firewalls must take into account both the source code dependencies as well as the dynamic dependencies that are created from the configuration.

Now that the firewalls are created, tests must be selected or created to exhaustively test both the newly added function code and all related configurable elements and source code identified. These tests need to be run on the customer's new configuration and should focus on the specific changed and affected methods listed in the firewalls. Some tests for these areas may exist from other customers and can be reused to some extent.

4.4.2 Constructing a Firewall for Previously Used Configurable elements

When the user's configuration change contains the addition of a previously used configurable element, a slightly different firewall construction is used. Constructing this firewall follows the same general process shown in Figure 22, with some small changes, which are shown in Figure 27. First, the settings values of the new instance of the configurable element need to be compared to all the other instances in the configuration. The one with the fewest differences is recorded. If all settings are different, then the element is considered as new and the firewall in Section 4.4.1 is created for it. If none of the settings are different, then only a few tests are created to check its basic behavior. Finally, if only some of the settings are different, then a Settings Firewall is created for each and the results are aggregated together into a TFW. These changes are shown in the box in Figure 27.

The same example system used in Section 4.4.1 will be used when describing the construction of this firewall. For this firewall, the user adds a new instance of a function code that has other instances elsewhere in the configuration. Instead of completely new function codes being added, which enables large amounts of new code in the system for execution, such as those shown in Section 4.4.1, these changes add an instance of an element that has already executed inside the customer's system and configuration. Since this code was already enabled for execution in the system, the main source of latent defects is differences in the settings values between the instances themselves.
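A minimal sketch of that comparison is given below, using the same dictionary representation of settings assumed in the earlier examples; the thresholds for "all", "some", and "none" simply follow the rule described above rather than any vendor-defined policy.

```python
def choose_firewall(new_instance_settings: dict, existing_instances: list) -> str:
    """Decide how to treat an added instance of a previously used element."""
    best_diff = None
    for settings in existing_instances:
        keys = set(new_instance_settings) | set(settings)
        diff = {k for k in keys
                if new_instance_settings.get(k) != settings.get(k)}
        if best_diff is None or len(diff) < len(best_diff):
            best_diff = diff          # instance with the fewest differences so far

    if best_diff is None:
        return "treat as a new configurable element"      # no prior instance to compare
    if not best_diff:
        return "basic checking tests only"                # settings identical
    if len(best_diff) >= len(new_instance_settings):
        return "treat as a new configurable element"      # all settings differ (Section 4.4.1)
    return "Settings Firewall for: " + ", ".join(sorted(best_diff))  # Section 4.3
```

For example, a new instance whose settings match an existing instance except for one alarm limit would result in a single Settings Firewall for that one setting.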
Figure 27. Process Diagram for a Previously Used Configurable Element

Creating a firewall for this type of change follows the process steps shown in Figure 27. Once the user adds a new instance of a previously used configurable element, the configuration is saved as a new version and sent off to the vendor for analysis. Once received, the differences between the two configurations must be determined. Each new instance of a previously used configurable element is identified and added to the list of changes. After all of the new instances of configurable elements have been identified, each must have its settings values compared to every other instance in the system. The instance with the smallest number of differences is recorded. If none of the settings values are different, then a few simple checking tests are written and executed. No additional testing is required. If all of the settings values are different, then this instance should be treated as a new configurable element and the process shown in Section 4.4.1 is used. Finally, if only some of the settings values are different, then Settings Firewalls are created for each of the different settings, following the steps shown in Section 4.3.

In addition to determining the differences between instances of the configurable element, a mapping is made between the new configurable elements and the source code that represents them, as well as determining what other configurable elements and system calls are connected to them in the configuration. For this specific control system example, each function code is well defined in the source code, containing a list of all the methods, variables, and interfaces that exist within it. These interfaces usually just get assigned by the configuration to connect the function code with whichever function code has been connected to it. In addition, each relationship to any other function code must be identified, and the code for those function codes will be marked as changed also. This mapping is the same as for new configurable elements described in Section 4.4.1.

Once all of the source code and configuration dependencies have been identified, a TFW is created. As a result, creating the TFW becomes more complicated, as configurations set up dynamic calling relationships between the function codes themselves, as well as system calls, when the configuration file is first loaded. Internal state variables, semaphores, and other relationships that exist statically must be included in the firewall, as well as all dynamic relationships that exist from the configuration connections. These connections link the calling function, shown graphically in Figure 24 as the input and output lines in the blocks, to the functions they need to access. Internal to the code, these connections are just addresses, and they are assigned by having the calling function in the first function code get the address assigned to it from the other end of the connection. Once the relationships are understood and modeled, the Traditional Firewall can be completed, stopping one level away from each affected function. Now that the Traditional Firewall is complete, analysis is conducted to determine if any additional dependencies besides code flow are affected by the added instances.
Each created Settings Firewall must be checked for data dependencies which might exist between any outputs of the new instance of the function code and the external connected function codes in the configuration. If any data dependencies exist, an EFW is constructed. Similar analysis must be performed for each additional type of dependency discussed in Chapter 3. If any of these forms of dependency are present, the corresponding firewall is created.

Finally, tests must be selected or created to test the function codes in their new use. The testing for these reused function codes can be further reduced, when compared to the new function code case, if the settings that control the way the function code operates are the same in the new use as in the old use. If the settings are the same, then the function code itself only needs to be quickly checked, and more exhaustive testing will be performed on the interfaces to the existing function codes in the system. If the settings are different, a Settings Firewall can be made for each changed setting to further reduce the testing.

4.4.3 Constructing a Firewall for Removed Configurable elements

Constructing the firewall for removed configurable elements differs slightly from the other cases. It follows the general process steps in Figure 22, but the details of some of the steps have changed. Changes of this type either involve removing individual configurable elements and connecting the surrounding system back together around them, or removing a whole connected part of the logic completely. The first type, removing an individual element, is more common. An example of this involves changing a field device to a new field device which now has the ability to convert its own thermocouple value to a temperature directly. The user would remove the function code that currently converts the temperature from the configuration and connect the input value function code directly to the display function code. The second type involves the complete removal of an entire logical part of the system. An example of this would be the removal of an old processing line from the system. If the physical plant shuts down a line, the configuration will be updated by removing all of the logic for that line. For both of these cases, it is important to remember that the aim is to detect internal software defects, such as parameter problems or invalid calculations. It is not meant to address the problem of user configuration defects, such as whether the process remains stable with the new configuration. Each time a configurable element is removed from the system, a firewall of this type must be created.

Determining the differences in the configurations uses the same system tool as shown in the previous examples. The new configuration will have some function codes removed and some connections changed. The changed connections are the elements that are considered changed, and the function code at each end of them is marked as changed. After all the impact-to-source-code mapping has been completed, a Traditional Firewall is created. In order to determine the impact the removal of these function codes had on the rest of the system, both the internal code and the configuration must be analyzed. Again, creating the calling tree becomes more complicated for graphically configured systems, as the graphical relationships represent dynamic calling relationships between the function codes that get loaded at runtime from the configuration.
Internal state variables, semaphores, and other relationships that exist statically must be included in the firewall, as well as all dynamic relationships that exist from the configuration connections. Since the removal of a certain function code could change the data being used in other connected function codes, each function code that used to connect to the removed one will be considered changed. Once the relationships are understood and modeled, the Traditional Firewall can be completed.

Next, analysis is conducted to determine if any additional forms of impact beyond code flow are present in the changed functions. Since removing a function code can change the data flowing through the remaining code, each function inside the connected function codes must be checked to determine if a data dependency exists between any of its outputs and the external existing function codes in the system. If any data dependencies exist, an Extended Firewall is constructed. Similar analysis must be performed for each additional type of dependency discussed in Chapter 3. If any of these forms of dependency are present, the corresponding firewall is created.

Finally, tests must be selected or created to test the function codes in their new use. The testing for these reused function codes can be further reduced, when compared to the new function code case, if the settings that control the way the function code operates are the same in the new use as in the old use. If the settings are the same, then only the function code itself needs to be quickly checked, and more exhaustive testing will be performed on the interfaces to the existing function codes in the system. If the settings are different, a Settings Firewall can be made for each changed setting to further reduce the testing.

4.5 Time Complexity of the Configuration and Settings Firewall

In order for this firewall to be successful, it must be both effective and efficient when used in industrial practice. The effectiveness of this firewall is measured by its ability to detect defects exposed when real customer configuration changes are made, as shown in Chapter 6. The efficiency of this firewall is shown by analyzing the time required to use this method for each customer change in the field. This time is described by the equation Te = At + Ta, where At is the time required to complete the firewall analysis and Ta is the time needed to test the impact identified by the analysis. Traditional regression test selection (RTS) methods analyze efficiency by taking Te and subtracting it from the originally needed test time, To. This difference represents either the time savings or the time lost when using this method, depending on whether the resulting difference is positive or negative, respectively. This new firewall method is different from traditional RTS methods since, in general, no testing currently occurs when customers make changes to their configuration, as the customer assumes the software is fully tested and contains no defects. Because of this, using the new firewall only adds testing time for each customer configuration change. A small number of customers, who run either critical applications or configurations that revealed defects with previous changes, may do some form of black box testing based on their past data and understanding of the software when they make changes. In effect, these customers guess at the impact of the change in a somewhat directed way.
With these customers, their original testing becomes To and the time savings or time loss can be determined, but the number of customers in this category is very small. Since using the Configuration and Settings Firewall in practice will, in most cases, result in additional time for both the analysis and the testing of the impact of the change, using it must be made as efficient as possible. This will limit the overhead imposed on both the customers and the company developing the software, who must work together to test the software for each configuration change. The two main components of time that arise when using this firewall are At, the time it takes to perform the required analysis, and Ta, the time needed to test the impact the change has on the software. Ta depends on the size of the impact identified in the analysis, which is influenced by the code dependencies which exist in the implementation of the program under test and the setup and execution time of the tests themselves. The accuracy of the impact analysis is shown in the empirical studies in Chapter 3 for the individual code-based firewalls, and in Chapter 6 when applying this new firewall to customer changes. Issues arising from the implementation and the overall analysis time will be discussed in the remainder of this section.

The analysis time of this new firewall method can be described by the time complexity of the algorithm used. The algorithm itself takes each change, either a setting or a configurable element, and determines the impact on the system from that change by creating one or more firewalls. More formally stated, the algorithm takes each changed element E from the set of elements that make up the configuration C and creates a Traditional Firewall for it. Creating the TFW involves finding all control flow paths to and from E and marking them as changed. The propagation of the change expands out exactly one level for each control flow dependency, so this analysis can be done in linear time based solely on the number of control flow dependencies present. Once the TFW is created, the algorithm must determine if one or more additional firewall models must be created, depending on the specific change and its implementation. If an Extended Firewall is needed, element E is checked for additional data and control flow paths which extend out past the one level that the TFW requires. This checking involves determining if a function or data value is dependent on something outside the TFW, either up the call stack or from previously calculated state values. The details of how the EFW is created are discussed in Section 3.2. When an EFW is created, the propagation of the change is not constant, and the time required to check for the existence of these paths is dependent on the number of control flow paths and the number and length of data flow paths, Pc, Pd, and Pl respectively, that exist in the system through the change. Specific values of Pc, Pd, and Pl were collected in the empirical studies and are shown in Chapter 6.

This algorithm is further refined with logical bounds for each value used in the algorithm. Configurations, denoted in this analysis as C, can contain a maximum of N elements. Each configuration element, labeled E, in the configuration C exists as an instantiation of one of M possible configurable elements. The number of base configurable elements, called M, is usually small due to the configurable elements being highly encapsulated.
For example, the ABB control system has 247 configurable elements available for use in configurations. Most configurable systems allow many elements in C to be based on the same base configurable element, much as object oriented design allows many instantiations of the same class. The number of specifications, S, in each configurable element E depends on the specific configurable element that E instantiated. Usually S is kept small, just as the number of member variables in a class should be kept small, since each specification in element E should be well encapsulated. Finally, Pd and Pc need to be considered for the program under test, since the time needed to construct an EFW is influenced by these values.

For the worst case analysis of this algorithm, the software system has to be implemented in such a way that the implementations of all of the configurable elements and settings in the system are fully coupled together. As a result of this, Pd, Pc, and Pl would be large, since the high coupling must exist in the code as either control flow or data flow dependencies. This full coupling of all objects would require the creation of an EFW for each changed element E in C. In addition, the algorithm for the EFW would propagate the change through the entire code base for each of these changed elements. Also, each configuration must contain a large number of elements, N, and each element must have a large number of specifications, S. Finally, each one of these elements and settings would have to be changed by the customer. For this example, the algorithm would operate on all N elements in C, since they were all changed. Each individual changed element E in C, including both settings and configurable elements, would require an EFW to be created. This Extended Firewall would grow to the size of the entire software program, leading to an algorithm with a runtime of O((N*S)*(Pd*Pc*Pl)), which grows multiplicatively with the number of configurable elements and settings, N and S respectively, the number of control flow paths, Pc, and the number and length of data flow paths, Pd and Pl respectively, that exist in the system.

This worst case analysis is infeasible for real systems for a number of reasons. First, completely coupled software programs are incredibly hard to create, especially large ones that have to perform actual complex functions in the real world. Second, configurable systems tend to have groups of configurable elements that work exclusively with each other. In the control system example, analog inputs have configurable elements created for each of the possible device types you can connect them to, with specific code to convert the values and communicate with that specific device. This code cannot be fully coupled with all other methods and classes, as it is specific to that one type of device. The ERP system also contains these sets of similar elements, an example of which is process-specific functions that depend on the specifically selected process type being implemented. Since this fully coupled system is infeasible, the EFW created for each changed element E would never have to propagate across all existing Pd and Pc paths. Finally, a customer would never change every setting and every configurable element in the system at once.
Since many of these configurable elements represent real world objects, such as field devices in control systems and computer hardware in ERP systems, a complete change of the configuration would mean a complete change of the physical environment also. In this case, an entirely new commissioning effort is under way, and a complete new testing effort must be done anyway.

The best case analysis for this algorithm involves a software system where the code is completely encapsulated and minimal coupling exists. This would lead to small Pd, Pl, and Pc, since these paths represent the couplings that exist in the system. Since there is minimal coupling, only a TFW would need to be created for each changed element in the configuration. Two different customer change patterns will be looked at, containing both single and multiple changed configurable elements. The single element change would cause the algorithm to select the only changed element E from the set C. For that one element, the algorithm would create a TFW only; Pd and Pl would be zero and Pc would be one, since there is minimal coupling in the system. Creating this TFW takes constant time, as the propagation stops one level from the change by definition. Since there is only one element changed, this step happens exactly once, which leads to a runtime of O(1). Looking at this example with multiple customer changes, or N changes, the algorithm selects each changed E from C and creates a TFW only, which is created in constant time. This step is done N times, so the runtime for this example is O(N).

Finally, an average case analysis for this algorithm involves a system more representative of one found in industry. This system has an average fan-in of two and an average fan-out of five. The system has around ten thousand elements in the configuration, and the customer makes an average of eighteen total changes, of which six are configuration changes and twelve are settings changes. Since the algorithm creates firewalls for each changed element E in C, eighteen TFWs would be created. In addition to these TFWs, one EFW is created per fifteen TFWs, on average. These numbers are derived empirically from real customer configurations and are shown to be statistically accurate in Chapter 6. Creating this number of TFWs and EFWs represents a very reasonable amount of time and effort for the vendor in order to verify a customer change.

4.6 Future Improvements on the Configuration and Settings Firewall

There are many other current research projects in many software engineering communities that could be combined with this method to increase its efficiency and effectiveness. Efficiency increases could be obtained by including techniques and automation being developed in Requirements Engineering, Program Analysis, and Information Retrieval, such as [35] and [36] mentioned in Sections 4.1 and 4.2. These automated methods would replace much of the human expert knowledge and manual work that is required to determine the impact from these customer changes. It may be possible in the near future to fully automate this process, where customers can submit their current and new configurations to an automated system. This system would build the firewalls and return to the customer the impact of their changes, allowing them to test their own changes quickly. This would enable this firewall method to scale to any number of deployed systems while still protecting the implementation details from customers and competitors.
In addition to improving the efficiency of the analysis time with new methods and automation, this new firewall can be improved by adding some dynamic information, such as execution profiles of the currently running configuration. These execution profiles can be used to reduce the testing required due to a setting or configuration change. Comparing the new execution profile to already tested and running execution profiles from other customers can identify areas of the system that are running in the same way and were already tested. These areas do not require retesting, but in the current firewall they would be retested. This additional reduction in testing will reduce the time it takes to get a new change verified and into execution at the customer site. Many studies have looked at capturing, grouping, and differencing profiles, such as [46, 47]. Finally, some improvements in effectiveness can be made by developing new firewall models for impact types not currently handled. Impact types that do not currently have firewalls include defects related to memory leaks, starvation, and performance. These firewalls, once developed, can be added to this Configuration and Settings Firewall in the same way that other firewalls were added.

5. A Process to Support the Configuration and Settings Firewall

While the Configuration and Settings Firewall presented in Chapter 4 works well for configuration changes in the field, software companies still release new software versions with code modifications. This chapter presents a product release testing process for use with user configurable software systems which, when used together with the firewall after release, prevents latent and configuration-based defects from being discovered in the field while reducing redundant testing whenever possible. This process is based on the current industry release testing process with a few modifications, dealing with the specific configurations and settings that must be tested at release time. Many of these modifications aim to offset the additional testing needed when using the firewall for configuration and settings changes throughout the product's lifecycle.

This chapter is divided into the following sections. Section 5.1 presents an overview of current processes for testing in industry. Section 5.2 presents a modified process for release testing to support the Configuration and Settings Firewall. Section 5.3 presents future changes and research additions that can improve the efficiency and effectiveness of this process. Finally, Section 5.4 discusses the time complexity of using this proposed release process.

5.1 Current Industry Testing Process

Testing in industry usually follows the V-Model, shown in Figure 28, or a process that is very similar to it. The early phases of testing, which are labeled in the figure as coding, unit testing, and integration testing, are focused on verification activities. The coding phase, which would not seem to include any verification, includes static analysis and code reviews, two effective early methods of defect removal. These early phases require no changes when testing user configurable software systems, since they are focused on removing as many early defects as possible at a low level in the development process. Later testing phases, labeled on the figure as system testing and acceptance testing, focus more on validation activities. The system testing phase actually includes two activities, product level and systems level testing.
These later phases are ideal to modify for user configurable software systems, as these phases deal with showing the software meets its customer requirements. Figure 28. The V-Model [48] The requirements created for configurable systems are not received directly from the customers. Instead they are created by the product management teams, with the help of the marketing team, who have the responsibility to understand the market that the product is sold in. It is done this way because the very nature of configurable systems 102 allows many different customers to refine a general solution into the specific solution that will address their individual needs. In effect, these systems are sold as general purpose, off-the-shelf software systems with a goal of meeting the needs of an entire broad, diverse market. Due to the lack of direct requirements from any specific customer, the product requirements for a specific customer can be best described by the current running configuration they are using, as this configuration contains the functions and features that the customer cares the most about at this point in time. Since the overall system requirements come from the market as a whole, late phase testing can be shifted from exclusively validating these market requirements to validating the currently running customer configurations. 5.2 Modified Industry Testing Process Fully testing the configurations for each customer completely would be a monumental task, and provide limited benefit compared to the overall effort. Instead of fully testing each configuration, a traditional code-based regression analysis will be performed on the system. The impacted areas determined from the code-based analysis will then be compared to the customer configurations in use in the field. Only the configurations that contain configurable elements impacted by the code changes for the release will be retested, looking for any latent, regression, or new defects in the system. In addition to these configurations, a set of common changes to these configurations could be tested if time remains. These common changes should be based on both the overall system requirements and changes that other customers have made in the past. This additional testing should allow for detection of defects that have a high probability of being seen by customers in the field soon after release. Finally, this new release testing 103 process can lead to a reduction in the time that a release requires to finish testing, thus getting the software out to the customers faster. Testing other features not currently in use by customers and correcting defects found in those features will be postponed until customers start using them by adding the configurable elements to the configuration. Using the Pareto principle [48], also known as the 80/20 rule, as a rough guide, it is likely that around 80% of the customers are only using 20% of the features of the software. Other studies of software, including prioritization and prediction, use this principle in a similar way [49]. It is safe to postpone the testing of features not currently configured by customers as they can only be added later in one of two possible ways. The first way a previously unused feature could be added involves the customer changing their configuration to include it. This would require the use of the Configuration and Settings Firewall, which would detect the new feature from the change and trigger testing of that new area. 
The only other way a previously unused and untested feature can be added is when a new customer buys the system and needs that new functionality in their configuration. In this case, the extensive commissioning testing that occurs before the software goes online will detect these defects. Modifying the release testing process for these user configurable systems has the potential to both reduce time and increase defects detected. The actual measured improvement for a specific team or company will vary, depending on the current balance between finding defects and reducing time that currently exists in the company. If the release schedule has been the most important driver for releases, then the major gain will be in defects detected, as previous testing did not have enough time to address the large 104 number of possible configurations and settings. If detecting as many defects as possible was the driving factor instead, then a significant increase in release schedule can be made, as previous testing was potentially looking at features that are not currently used in the field. Most companies do not use either of those factors to the exclusion of the other, so some measurable gain in both will be possible. Once the software is released, the Configuration and Settings Firewall will be used on each customer configuration and settings change. The details on how to create the firewall for each of those changes was presented in Chapter 4. Customers must work with the vendor to determine the impact of their change, and the vendor will test the software with the new configuration and settings values and correct any defects that are found. This process also has a positive side effect. Customers know that a change will require them to send in a new version to the vendor and wait for validation, leading them to make fewer ad hoc changes to the configuration, similar to how developers stop making ad hoc code changes when companies use controlled Configuration Management processes. More deliberate changes will help the customer have better reliability with the software, as well as the vendor having less defects reported on the system. Figure 29. Release Testing, Old and New Methods 105 In Figure 29, both the traditional and proposed release testing processes are shown side-by-side. The x-axis represents time, the y-axis represents the range of possible configurations, each black x represents a defect, the small dots represent the configurations tested, and the vertical line represents the point in time where the software is released to the customers. Each dot represents a set of tests and the coverage those tests had on the execution of the software. Any x’s which are not detected before release become latent defects. The top image shows the coverage achieved for user configurable software when tested with a traditional release testing process. Only a few different configurations are tested, and those tested are very similar to each other. This leads to many tests being redundant with regard to configuration and execution, and many latent defects not being covered by tests. The bottom image shows coverage achieved when testing these types of systems with the proposed new release testing process. In this case, testing is spread across the product lifecycle leading to less redundancy, larger coverage of possible configurations and executions, and a larger detection of customer relevant latent defects. 
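As a rough sketch of the selection step that underlies the proposed process in Figure 29, the fragment below assumes the code-based regression analysis has already produced a set of impacted configurable element types; the customer data, structures, and names are hypothetical and are not ABB's release tooling.

    # Hypothetical sketch of the release-time selection described above: only
    # customer configurations that contain configurable elements impacted by the
    # release's code changes are queued for retesting.

    def select_configs_to_retest(impacted_elements, customer_configs):
        """customer_configs maps a customer name to the set of configurable
        element types used in that customer's currently running configuration."""
        to_retest = {}
        for customer, elements_in_use in customer_configs.items():
            overlap = elements_in_use & impacted_elements
            if overlap:                        # the code change touches something in use
                to_retest[customer] = overlap  # retest only the impacted elements
        return to_retest

    # Toy example: the release changed code behind the PID and Totalizer elements.
    impacted = {"PID", "Totalizer"}
    configs = {
        "PlantA": {"PID", "AI", "AO"},
        "PlantB": {"AI", "AO"},                # unaffected configuration
        "PlantC": {"Totalizer", "PID", "DI"},
    }
    print(select_configs_to_retest(impacted, configs))
    # e.g. {'PlantA': {'PID'}, 'PlantC': {'Totalizer', 'PID'}}

Configurations with no overlap are not retested before release, which is where the time savings in the proposed process comes from.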
The overall number of tests run is the same for both the traditional and proposed testing methods shown in Figure 29. The traditional release testing case has all of the tests and configurations run in a short period of time before release. For the proposed method, the total testing time is the same, but this time is spread out over the life of the product. It is important to note that, over the life of the product, the proposed release testing method may require the same or even more total time to test the system than the traditional method. This additional time amounts to the additional testing and fix time required for each customer configuration change. As these previously unused features are configured, testing must be conducted on them and the defects found must be corrected. This new proposed release testing process, combined with the Configuration and Settings Firewall, amounts to a test-as-you-use model, where the costs of testing and fixing defects are spread out over the life of the product. Even when the combined method does cost more over time than traditional testing, there is still a guarantee that only areas of the software that are in use are being tested and that the defects detected are the ones that pose the greatest risk to customers running the system in the field.

5.3 Time Study of the Proposed Release Testing Process
In order for this new release testing method to be considered effective, it must detect the defects that were injected by code changes and that customers found just after release. Notice this does not involve a configuration change but a code change, as the customers are running the same configuration but have updated the software version. This evaluation will be shown empirically in Chapter 6. For the new process and method to be considered efficient, it must save as much time as possible. From the theoretical side, determining this savings is simple, as the number of possible configurations and settings to test is prohibitively large. Since there are far fewer customers than combinations of configurations and settings, it will certainly take less time to test each of their configurations than to test all possible combinations before release. On the practical side, since it is infeasible to test all possible combinations, current industry testing selects a specific subset with which to validate the system before release. This subset is based on a combination of expert system knowledge, guesswork, and conservatism, often leading to a set that is larger than it needs to be in order to detect as many defects as possible. Since this subset varies for each product, the savings in time will also vary and must be determined for each system. Empirical studies on the time savings of this method will be presented in Chapter 6.

5.4 Future Additions to the Proposed Release Testing Process
Additional time savings could be found by applying other research areas in software engineering to this modified release testing process. One potential savings could come from comparing the executions of a number of similar customer configurations to determine whether their execution patterns are the same. In addition, logging and comparing previously tested execution patterns with newly changed configurations may lead to a further reduction in the change-based testing done after release. The goal is to fully test only the first of these similar configurations and then simply load and execute the others; a rough sketch of this comparison is given below.
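Assuming an execution profile can be reduced to the set of functions (or blocks) a running configuration exercises, the comparison might look like the following sketch; the profile format and names are hypothetical and stand in for whatever profiling data the real system would collect.

    # Hypothetical sketch: treat an execution profile as the set of functions a
    # running configuration exercises, and retest only the parts of a changed
    # configuration's profile that no previously tested profile already covers.

    def untested_areas(new_profile, tested_profiles):
        """Return the functions exercised by the new configuration that are not
        covered by any already-tested execution profile."""
        already_covered = set().union(*tested_profiles) if tested_profiles else set()
        return new_profile - already_covered

    # Toy example with two previously tested customer profiles.
    tested = [
        {"ai_read", "pid_update", "write_output"},
        {"ai_read", "alarm_check"},
    ]
    new = {"ai_read", "pid_update", "totalizer_step"}
    print(untested_areas(new, tested))   # -> {'totalizer_step'}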
This kind of execution profiling and comparison could be based on system execution profiling information and clustering techniques similar to those presented in [46, 47]. Another area of improvement would be better automation of the customer configurations. If a set of parallel systems could be set up and augmented with some automated load and test driver software, this testing would become much less burdensome. 108 6. Empirical Studies of User Configurable Software Firewalls This chapter examines the effectiveness and efficiency of the Configuration and Settings Firewall on user-configurable software in more detail by presenting a set of empirical studies that were conducted on various software products developed at ABB. These products are large real-time user-configurable software systems currently in use at thousands of locations around the world for industrial control. These systems are configured to run various types of process control applications such as power generation, chemical and beer production, and pharmaceutical manufacturing. The first part of this chapter, Section 6.1, presents an overview of the empirical studies that were conducted. Section 6.2 discusses the limitations of the studies that were performed. Sections 6.3 and 6.4 present the first case studies and their results, respectively. In Section 6.5, a breakdown of the type of configuration based defects found by customers in the field is presented. Finally, in Section 6.6, a case study was performed to show how efficiently this method can be used in practice. 6.1 Empirical Studies Overview In order to validate that the Configuration and Settings Firewall is effective and efficient, a number of empirical studies were performed. These studies involved a few different approaches, depending on the goal of each study. The first approach, used in the first and second case studies, involves applying the Configuration and Settings Firewall to a large number of past customer configuration changes and then comparing the identified impact to any known defects that customer found when those changes were made. This approach is useful to show the effectiveness of the change determination, 109 code mapping, and impact analysis steps of the firewall at determining the correct areas of the software to test. By not running the tests, test effectiveness is removed from consideration when determining the effectiveness of this new firewall method as well as increasing the number of changes that can be analyzed given a set period of time. A second approach, used in the third case study, takes a smaller subset of the customer changes used in the first two case studies, applies the Configuration and Settings Firewall just as before, but now includes execution of the tests. The goal of this approach is to show any additional defects that pose a future risk for the customer that have not yet been detected in the field. The third approach shown in the fourth case study involves looking at each latent defect found by customers and classifying it using the Beizer Defect Taxonomy [2]. Once classified, an analysis of the defect types found by customer configuration change is presented. This analysis provides insight into the number of defects found by customers in each defect type. These defect types are each assigned to a code change firewall based on the dependency involved with that defect. Once this assignment is done, the number of defects that can be detected by each firewall is calculated. 
The final approach presented in the fifth case study involves measuring and recording static metrics of the code analyzed by the firewall for the first two sets of changes. These measures, including Pc, Pd, Pl, and the frequency of EFWs, will help describe the time complexity of the firewall. The first and second case studies aim to show the effectiveness of the new firewall model in detecting latent defects when the configuration and settings change. The first study is conducted on an embedded process controller module which is 110 procedurally designed and implemented as a mix of C and C++. The second study is conducted on an HSI console product running on a standard PC under Windows. This HSI is implemented in C++ and C# following Object-Oriented design principles. Each of these case studies start with a released version of the specific software and a running customer configuration. Many customers are inherently secretive with their specific configuration, as the actual running of their process is often a trade secret. Due to this, ABB only has a few opportunities to get real customer configurations. One configuration commonly available is the initial configuration used when the plant was first commissioned. In addition, customers submit their currently running configuration when field failures are detected in the software. These failures, if caused by latent software defects, are detectable in the submitted customer configuration. To prevent any bias in the analysis, no information about the customer found defect is available at the time the firewall is created, but is utilized at a later time for firewall evaluation. There are two ways available to get the specific changes the customer made to the running configuration, both of which were used in these studies. The first method is possible only when the customer submits detailed steps of the actions they took that caused the failure to occur, as well as the configuration they were running when the failure occurred. For this case, the submitted configuration is opened and the changes they made are removed, leaving a configuration similar to what they were running before the failure. Then the changes are made and a second configuration is saved, representing the changed configuration. This method provides the precise configuration and settings changes which caused the failure and exposed the latent defect, allowing for a more accurate logging of the time required to analyze the changes. This accuracy is due to the 111 analysis only using the actual set of changes which exposed the defect, as opposed to a grouped set of all changes over time that ABB knows about. This is further explained in Section 6.3. The second method for determining change information from the customer involves taking a previously known configuration and using it as the base configuration. The submitted customer configuration, when compared to this base configuration, contains all of the changes to the configuration the customer made over some period of time. This may include many years worth of changes, depending on when the last field failure was reported for that customer. As a result, this method is slightly less representative of a single atomic change set, but does allow more studies to be performed when the detailed change information is not submitted by the customer. When using this second method, the difference between the new and base configurations is broken into a set of small grouped changes. 
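A minimal sketch of this splitting step is shown below, assuming the configuration comparison has already produced a flat list of changes; the change representation, the group size, and the random grouping are illustrative choices only and do not describe ABB's actual procedure.

    # Hypothetical sketch of how the accumulated differences between a base and a
    # submitted configuration might be split into small groups that approximate
    # incremental customer changes (see Section 6.2 on the arbitrariness of this
    # grouping when no time-sequence data is available).
    import random

    def group_changes(changes, group_size=5, seed=0):
        """changes: a flat list of (kind, detail) tuples produced by comparing the
        submitted configuration against the known base configuration."""
        shuffled = list(changes)
        random.Random(seed).shuffle(shuffled)   # the real ordering of changes is unknown
        return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]

    # Toy example: years of accumulated changes reduced to three small groups.
    accumulated = [("setting", ("PID1", "gain")), ("added", "AI7"),
                   ("added", "AI8"), ("setting", ("AI3", "range")),
                   ("removed", "DO2"), ("setting", ("PID2", "limit"))]
    for group in group_changes(accumulated, group_size=2):
        print(group)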
Each of the changes in the groups has a Configuration and Settings Firewall created for it, approximating incremental changes coming from the customer. In the first and second case study, once all of the impact areas have been identified by the Configuration and Settings Firewall, reported customer failures which are due to latent software defects are analyzed and checked against the impact identified by the change. If they exist within the areas identified as needing retesting, the defects are considered detected. If they exist outside the impacted areas, the defects will be studied to determine if they were related to the configuration change. If they are related, the defects are considered missed, and if not, they are considered outside of the scope of this firewall method and discounted. 112 The overall effectiveness is measured by the percent of customer reported latent software defects that were detected by the Configuration and Settings Firewall. Any additional defects found are considered new defects that have not yet been detected in the field, but do exist in the system as future risks. In addition, the time required to perform the analysis and create the needed firewalls is recorded. The third case study takes a small set of changes from the second case study and involves running the tests themselves in addition to just checking impact. A few changes from the GUI configuration product are used with a goal of detecting additional defects that are currently at risk to that customer at that point in time. If a defect is found, it will be checked against the known defect list for the product to determine if it was already detected or is still latent in the software. The goal of this study is to show that the firewall can actually detect the identified latent defects by testing, and also determine if additional defects in and around the change can also be found. The fourth case study takes all of the customer reported defects in the embedded controller module and classifies them using the Beizer Taxonomy [2]. Instead of the full four levels of detail used in the Beizer taxonomy only the first two levels are used. More information on the taxonomy and the customizations are given in Chapter 6.6. The fourth case study shows the types of defects that are revealed by customer configuration changes as well as the types of defects that the different firewalls can detect. Finally, the fifth case study presents a set of static metrics measured from the software that was used for these empirical studies. These measures were taken using both the source code as a whole and using just the code representing the configurable elements. The measures collected include the fan-in and fan-out of each configurable 113 element, either class or function, the maximum calling tree depth containing a class or method in a configurable element, the cyclomatic complexity of the functions inside the configurable element, and the number of external values used in the function. 6.2 Limitations of Empirical Studies It has not been determined yet if results on ABB systems are representative of all real-time industrial software applications. In addition, real-time systems themselves may behave in a way that is different than other types of applications. Also, the configurations used by ABB customers are treated as trade secrets and there is no way to know exactly how all of the changes were performed over time. 
Currently, the only way ABB knows about customer configuration changes is when a failure is observed and reported to technical support. Since time sequence data for each change is not available, the total changes made to the customer’s configuration are split arbitrarily into a set of smaller changes. This could lead to a larger amount of time for analysis and testing, due to overlapping of the firewalls. A final limitation of the study is that the test time component of the efficiency data is based upon a small number of test runs, as it was not possible to run tests for each of the studies. The static metrics collected in Section 6.7 were calculated on the entire system and support the claim of efficient test time creation and execution. 6.3 First Case Study The first case study was conducted using an embedded process controller which is implemented as a hybrid containing both OO designed C++ code and procedurally designed C code. This software includes 761 files, 4831 functions, 49 classes, 533,002 114 Executable Lines of Code (ELOC), and 247 configurable elements. This software runs on a custom ABB designed hardware board running a proprietary embedded operating system. Since this system runs a proprietary OS, there are no third party components in the system that would require a COTS Firewall to be created. This case study is broken up into many smaller studies involving different customers and configuration changes. The main goal of these smaller studies is to show that the Configuration and Settings Firewall is effective at detecting latent software defects exposed by configuration change at the customer site. This is accomplished by creating the required firewalls for all of the changes and then determining if they contain the failure reported by the customer. Configurations for this system are created graphically and compiled by a tool into files which are loaded into the specified controllers. Inside these files, the configurations are represented as a list of configurable elements in the order they are to be executed. Each configurable element in this list contains values for each of its settings, which are then assigned to the internal variables that represent them when the configuration is downloaded to the controller. 6.3.1 First Customer Study – Embedded Controller The first configuration studied is from a customer that has used the system for many years and is very familiar with how it works. There were two separate failure reports submitted by this customer reported against the same version of the software. Each failure was caused by separate latent defects in the software and contained the configuration that was used to observe the defect. The most recent configuration was compared to the originally installed configuration and the changes identified were grouped randomly into three sets. It was not known which set contained the two reported 115 defects and the specific details of the defects themselves were not known before the analysis was done. Prior to these changes, this specific customer had been running for many years without reported failures. The first set of changes included five configurable elements being added as well as changes to nine settings. The settings changes were examined first. Each setting was mapped to the internal data variable in the code representing it. This variable was marked as a code change, and a TFW model was created for it. As the TFW was being created, each control flow path was checked for any data flow dependencies. 
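The decision made at this step, whether a TFW alone is sufficient or an EFW is also required, can be sketched roughly as follows; the variable-level maps below stand in for the real dependency analysis and are purely illustrative.

    # Hypothetical sketch of the data-flow check performed while a TFW is built
    # around a changed settings variable: if a value computed from the changed
    # variable flows on to other functions, the impact can spread more than one
    # level away and an EFW is required; otherwise the TFW alone is enough.

    def needs_efw(changed_var, uses, dataflow):
        """uses: variable -> functions that read it;
        dataflow: function -> functions receiving data it produces."""
        for fn in uses.get(changed_var, set()):
            if dataflow.get(fn):       # the value propagates beyond the first level
                return True
        return False

    # Toy example: the changed setting feeds one function with no outgoing
    # data-flow edges, so a TFW alone is sufficient and no EFW is created.
    uses = {"filter_gain": {"smooth_input"}}
    dataflow = {"smooth_input": set()}
    print(needs_efw("filter_gain", uses, dataflow))   # -> False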
Since no dataflow dependencies were found, an EFW was not needed. In addition, the settings changes did not cause any change to blocking calls in the system, so no Deadlock Firewall was needed. For the added configurable elements, four of them were previously used elsewhere in the configuration and the final element was new to the configuration. The code representing each element was considered code changed and a TFW was created for each element. Just as with the settings changes, no dataflow or blocking calls were affected, so only the TFWs were created. The second set of changes from this customer included two added configurable elements as well as five settings changes. Each settings change involved determining which internal variables represented the setting, marking them as changed, and creating a TFW model. As the TFWs were being created, each control flow path was checked for any data flow dependencies. Since none were present, no EFWs were created. Each of the added configurable elements was used elsewhere in the configuration. The code representing it was marked as a code change, and a TFW model was created. When creating the model, it was determined that one of the changes included a new semaphore 116 call, so a Deadlock Firewall was also created. There were no affected data flow dependencies in these added configurable elements, so no EFWs were needed. The final set of changes included only nine settings changes. Each setting was mapped to the variables that represented them and each was marked as code changed. These variables were checked to see if they belonged in a dataflow path with any other parts of the system while the TFW was being created. It was found that two of the changed variables were included in a longer data flow path which could lead to impact spreading more then one level away. As a result, EFWs were created for these two settings changes. The remaining seven only had TFWs created for them. Once all of the changes were studied and the required firewalls created, the customer defects were analyzed and their code locations determined. Each of the defects was compared to the impact identified in the TFWs and if they were inside the impact, they are considered detected. The first set of changes contained one latent defect reported by the customer. This defect resulted from a change where a previously used configurable element was added at the end of the configuration and its output value was being passed to an element that existed earlier in the configuration. This type of change is valid, but the system executes code in the order it exists in the configuration, specifically by a unique ordering ID called a block number. Since the new configurable element is at a higher ID, it is executed after the existing configuration element that uses its output. The value being passed from the new element back to the existing element was not initialized properly and the existing element had no check inside it to verify that its connected data providers were executed before it. This led to a potential error when the system is first started up where the 117 uninitialized value can cause the system to perform incorrectly or crash. This defect is only observable when the source configurable element is added after the receiver element which is dependent on its value, since the initial value is never needed otherwise. The specific configuration change is shown in Figure 30. Figure 30. 
Case Study 1, Configuration Change with Latent Defect Figure 30 shows a new configurable element being added whose output is used by a previously existing element. The latent defect exists within the new element and only occurs when the new element is added in such a way that its execution happens after the execution of the existing element. In this case, the currently existing element uses the value from the new element before the new element has written to that value. This defect is considered detected by the TFWs since the output function from the new configurable element was marked as code changed and the existing configurable element, specifically the interface from the new element to the existing one, as needing to be checked. The second set of changes contained no defects reported by the customer. In the third set of changes, one of the settings changes led to a customer defect. This setting change affected the calculation of the output value for the configurable element it resided in. This output was, in turn, passed between many configurable elements until it was finally used to compute a final value that was then output to the system. This change is shown in Figure 31. 118 Figure 31. Case Study 1, Settings Change with Latent Defect Figure 31 shows an existing configurable element with a settings change. There was a latent defect in configurable element C which was exposed due to the settings change affecting the output value from configurable element Z. This defect was detected by the EFW created for this change, as it includes this latent defect within the data flow path marked to be retested. This first customer configuration study shows that latent defects existing in the code base can be detected by the Configuration and Settings Firewall. In total, two latent defects were detected by this firewall. These defects were originally detected by customers in the field and required development and test rework to correct. In addition, no defects were missed by the firewall as no additional latent defects were reported from the customer. 6.3.2 Second Customer Study – Embedded Controller The second configuration studied involved a different customer with a completely different configuration. This customer has also been running the software for a long time and was very familiar with the working of the system. There were three failure reports submitted by this customer on the same version, each of which was caused by a latent software defect. For this study there were four sets of changes created. Previous to the first set of changes, the software had been running continuously failure free for a number of years. 119 The first set of changes included three settings changes and the addition of one configurable element used previously in this configuration. Each setting was mapped to the internal data variable in the code representing it. This variable was marked as a code change, and a TFW model was created for it. As the TFW was being created, each control flow path was checked for any data flow dependencies. No data dependencies were found so no EFWs were created. For the added configurable element, a TFW was created. As this TFW was created, each control flow path was checked for data flow dependencies. No dataflow paths were affected, so no EFWs were created. In addition, no blocking calls were affected by either type of change. The second set of changes included the addition of four configurable elements which were used previously in the configuration. No settings were changed. 
The code dealing with the new configurable elements was marked as a code change and a TFW was created. While creating the TFW, no blocking calls or dataflow paths were affected, so no other firewall models were created. The third set of changes included changes to three settings values as well as the addition of three configurable elements which were new to the configuration. The settings changes were mapped to the internal variables, which were marked as code changes, and then TFWs were created. It was determined, while creating the TFWs, that one data flow dependency was affected by one of the changed settings. This required the creation of an EFW for that dependency. The code for the added configurable elements was marked as code changed and a TFW was created around them. No blocking calls or data flow dependencies were affected by the added configurable elements. 120 The final set of changes contained only five settings changes and one added new configurable element. These settings changes were mapped to code variables, marked as code changed, and then a TFW was created. No data flow dependencies or blocking calls were affected, so no additional firewalls were created. The added configurable element was new to the configuration and performed a smoothing operation on an input value. The code for the configurable element was identified and a TFW was created. No data flow dependencies or blocking calls were affected by the change, so no additional firewalls were needed. A latent defect was exposed by one of the settings changes in the first set. This setting controlled which operation a configurable element performed, and when changed, affected which code inside that element was executed. Specifically, the element was a mathematical shaping function used to smooth analog input values and the defect existed in a code path only executed for the mode of operation selected with the settings change. The defect involved the accuracy of the shaping function for a certain range of values and the failure report indicated that it caused process issues for the customer. This defect is described in Section 4.1 and shown in Figure 14. The firewall model created for this settings change contained the newly selected execution path within its boundaries, so this defect is considered detected. In the second set, one of the added configurable elements previously used in the configuration exposed a latent defect in the software. It involved adding this configurable element with its default values, which are automatically set by the configuration tool. If no changes are made to the settings and the initial defaults are loaded, the configuration will fail right away. The default values include a value which is specifically not allowed 121 in the configurable element. But the configurable element does not check this value correctly when it is changed with the controller offline. This defect was definitely detected by this method as the entire configurable element was selected for retesting by the Configuration and Settings Firewall. An example of this kind of change is shown in Figure 32. In this figure, a new configurable element is added without changing its default settings. It is just dragged into the configuration page and saved. The other instances of this configurable element in the configuration had their settings changed before the system was run. Figure 32. Case Study 1, Added Configuration Change The third set of changes contained one latent defect reported by the customer. 
This defect involved messages being missed in the system, due to increased processing time required by the newly added configurable elements. TFWs and one EFW were created for this change, but the defect was not contained inside the impact. Once the failure report was analyzed, the underlying cause of the failure was a performance defect. Performance defects such as these will require a Performance Firewall in order to detect these defects reliably. In the final set of changes, two separate defects existed. The first defect involved a settings change which led to a latent defect being found in the software. The latent defect itself prevented any changes to the settings of this element from taking effect until a restart is done where no restart is usually required. Therefore, when the settings change was made, it did not take effect initially. This defect was identified by the Configuration and Settings Firewall, since both the setting itself was marked as changed as well as the 122 startup routine. As a result, this defect is considered detected by the firewall. The other defect involved the addition of a configurable element which was new to the configuration. This new element takes an input value and applies a mathematical function to smooth its value out. This element was added in response to captured data values from the input, showing that the physical device providing the input was causing variance in the input value that did not actually exist in the process. This second configuration studied showed similar results to the first study. Both settings changes and configuration changes can lead to latent defects in the field. The Configuration and Settings Firewall was successful at identifying the correct area to test after the changes were made. 6.3.3 Additional Customer Studies – Embedded Controller In order to prevent additional repetition, all of the additional customers studied are summarized in Table 11 at the end of this section. In addition, each reported customer defect and the configuration or settings change which exposed it are described separately. The same process used for the studies in 6.3.1 and 6.3.2 is used here, but a description of the steps followed for these additional studies is omitted for the sake of brevity. The overall data collected for all of the studies conducted on this embedded controller are shown in Table 11. The third customer studied had a number of settings changes, including a settings change to an advanced PID configurable element inside their existing configuration. This change involved the customer changing the increment and decrement limit settings used by the PID element. The customer changed these values by a large amount and the process variable spiked rapidly, leading to the controller entering an error state. The 123 cause was an internal code defect, where the system would first disable increments and decrements for the PID algorithm, forcing the output to remain the same. This was accomplished by using a copy of the last output value as the output value for the PID element. The internal algorithm did not stop calculating the error between the set point and the current plant value, since it was updating its actual output, leading to larger and larger changes of the output to correct the perceived error. Once the changes were complete to the limit settings, the held output value was cleared, and the actual output was connected. 
This led to a very large PV value being output from the PID element and the controller, detecting it, entered the error state. This defect was detected by a TFW built around the setting that was changed and its users, as a change in the setting value executed the control flow path which held the output value steady. The fourth customer studied changed only settings values in their configuration. This customer had been improving the overall physical process with better materials and up front quality control. When the physical quality was good enough, the customer made a few changes to the advanced PID configurable element to allow for tighter control since the process had less variation. These changes affected the values of the proportion and integration settings used by the PID algorithm. Once the changes were made, the process would drift by 4%, even though the underlying process did not require it. Internal to the PID algorithm, the calculation used had a small rounding error in the calculation of the new output value, leading to the detected instability. This defect was also detected by a TFW built around the changed setting values and their uses inside the configurable element. 124 The fifth customer studied added a set of previously used configurable elements to the system. These elements used different values for a small number of settings, so TFWs were created for each of the settings values that were different than previous usages. When the customer loaded this configuration and started running it, the controller crashed and went into error mode. The defect was related to one of the new settings values used in the added configurable element. There is a latent code defect which is only revealed when the settings value is set to a number above 16384. The customer had set the setting value to 18726, which caused the error to be revealed. This defect was contained inside the TFW created for the setting and its users inside the configurable element. The sixth customer studied also added a set of previously used configurable elements to their configuration. These elements represented new physical IO devices that were added to the system, and IO values which are read in from them. The settings values used are mostly the same between the new instance of the elements and the previous usages. As a result, TFWs are created for the configurable elements and only a few settings. Three EFWs were created, as the new elements were connected in the configuration to previously configured elements by data values being passed to them. The customer loaded this configuration into the controller and started running it. Every once in a while, the data sent to the previously used elements goes bad and then recovers a few seconds later. This defect exists in the interface between the device bus and the configurable elements, but is only detectable by elements that are connected to it. This defect was detected by one of the EFWs, as the previously existing configurable elements were involved in a data relationship with the newly added elements. 125 Table 11. 
Summary of Case Study 1, Embedded Controller

                            Cust 1  Cust 2  Cust 3  Cust 4  Cust 5  Cust 6  Total
    # of Settings Changes       23      11      18       8       4       0     64
    # of Defects                 1       2       1       1       0       0      5
    # of Added Used CEs          6       5       0       0       9      21     41
    # of Defects                 1       1       0       0       1       1      4
    # of Added New CEs           1       3       0       0       0       0      4
    # of Defects                 0       1       0       0       0       0      1
    Analysis Time (Hours)        4     1.5       1     0.5     1.5       2   10.5
    # TFWs                      30      19      18       8      13      21    109
    # EFWs                       2       1       0       0       1       3      7
    # Deadlock FWs               1       0       0       0       0       0      1
    # 3rd Pty                    0       0       0       0       0       0      0

Table 11 shows the summarized results for the entire first case study. In all, 64 settings were changed and 45 configurable elements were added, 41 of which were instances of previously used elements. These changes led to the creation of 109 TFWs, 7 EFWs, and one Deadlock Firewall. Creation of all of these firewalls took only 10.5 hours, as there were few EFWs and Deadlock Firewalls created. These firewalls were able to detect 10 latent software defects originally detected in the field, missing only one performance defect which requires an additional future firewall to detect.

6.4 Second Case Study
The second case study was conducted on a graphical configuration program that is used by customers to configure the entire system, from the embedded controllers to Human System Interface displays and graphics. This system is implemented as a hybrid of OO designed C++ code and procedurally designed C code. This software includes 5121 files, 39655 functions, 3229 classes, 767,431 Executable Lines of Code (ELOC), 2398 configurable elements, and 17 third party components. This software runs on a standard PC running the Windows operating system. This case study is broken up into many smaller studies containing different customers and configuration changes. The main goal of these smaller studies is to show that the Configuration and Settings Firewall is effective at detecting latent defects found at customer sites that were exposed by configuration changes. This is accomplished by creating the required firewalls for all of the changes and then determining if they contain the defect reported by the customer. Configurations for these systems contain configurable elements which are used to create the files that are downloaded to the controllers, Human System Interfaces, and other software products in the system. These configurations are stored as projects containing a physical layout of the process. Each physical part of the process contains a link to files that contain a graphical list of configurable elements, settings values, and the relationships between them. Customers use this product to create the graphical configurations, compile them, and then load them into the various software products that use them.

6.4.1 First Customer Study – GUI System
The first customer study for this GUI-based system involves a customer who was adding new graphical display elements into their configuration. These changes involved the addition of previously used configurable elements into the configuration. These added elements were connected to previously used input values from a field device and allow these values to be displayed in the Human System Interface by plant operators. The customer made a set of changes to their configuration to allow a large number of internal input values to be displayed by the HSI. These were values that were found to be important after the initial configuration of the plant was complete. The project files before and after the change were analyzed, and a list of changes was created.
This list 127 was split randomly into two smaller groups of changes and each group had Configuration and Settings Firewalls created for it. The first group of changes contained three settings changes and the addition of two previously used configurable elements. For the settings changes, a set of Settings Firewalls were created. Settings were mapped to the internal variables in the code which represent it. These variables were marked as code changed, and TFWs were created for them. As the TFWs were being created, each changed variable was checked to see if it was involved in any data flow dependencies. For these settings changes, no data dependencies were found, and no EFWs were created. Each of the added configurable elements was previously used elsewhere in the configuration, so their settings were compared to the previously used instances of these elements. Only a few settings had changed for each, so Settings Firewalls were created around those variables and uses, resulting in a set of TFWs. When creating the TFWs, the variables were checked to see if they were used in any data dependencies. Two of the added configurable elements had different settings which had relationships to other configurable elements in the configuration. As a result, EFWs were created for each. None of the configurable elements or settings changes were involved with any third party components or blocking calls, so these firewalls were not needed. The second group of changes studied contained two settings changes and the addition of one new configurable element. Each of the changed settings was mapped to the internal variables which represented them, and were marked as code changes. TFWs were created, and each included checks for data dependencies. None were found, so no EFWs were created. For the newly added configurable element, it was marked as a code 128 change and all dependencies, both into and out of it, were marked as affected. These included both dependencies in the code and dependencies based on the configuration. TFWs were created first, and data dependencies were checked. One such dependency was found, and an EFW was created for it. Finally, no blocking calls or third party components were affected, so these firewalls were not created. After all of the changes were identified and the firewalls created, the failures reported from the customer were analyzed. The three added configurable elements, one new and two previously used, each exposed failures. These failures were related to one latent defect in a support function for the added elements. This defect involved connecting configurable elements across different graphical pages of the configuration, by way of a reference. These references act as helper functions for all configurable element types, but contained a defect for the two types which were added by the customer. When the customer compiled the project into configuration files, the compiler generated an error saying that the compilation failed. The failure was due to no matching reference being found for these three additions to the configuration. The EFWs created for the two previously used configurable elements and the new configurable element contained this defect inside its identified impact area. The data dependency itself involved the output of the newly added elements being connected and used by other elements on other pages through a set of connected cross page references. 
These references allow values and elements on one page of the configuration to be connected to elements that exist on a different page. The defect existed in the processing of these cross sheet references, and only occurred when the specific configurable elements were connected through it. 129 6.4.2 Second Customer Study – GUI System The second customer study for this GUI system involved a customer upgrading their Human Systems Interface software. In addition to upgrading the software, the customer changed their configuration to take advantage of new features in the HSI. Many of these new features require changes to the settings of existing configurable elements, allowing for better information to be displayed on the new HSI. The changes that were made to the configuration were determined by comparing two versions of the customer’s project. Once the configuration changes were identified, they were broken down into three groups. The settings changes were split up randomly, but the configuration changes were grouped together, as they represented replacing one set of elements with a different set. This change is known to be an atomic set, as the problem description describes it in high detail. Each group of changes had Configuration and Settings Firewalls created for it. The group that contained the defect was not known when the firewalls are created. The first of the three groups contained nine settings changes. Each of the changed settings was mapped to the underlying code variables which represent them and marked as a code change. Once complete, TFWs were created around these variables and their uses, checking for data dependencies as they are constructed. No data dependencies were found, so no EFWs were created. In addition, no blocking calls or third party components were impacted, and these firewalls were not needed. The second group of changes contained the addition of three new configurable elements and the removal of three others. A set of existing elements were removed and a different set were added which allowed the customer to take advantage of functionality provided in the new HSI system. Each removed element was replaced with a new 130 element that contained additional functionality specific to the new HSI. These changes, taken together, constitute an atomic change which was done in response to the HSI upgrade. Since this change was atomic, only one set of firewalls were created for this change, instead of one for each removal and one for each addition. For each of the newly added configurable elements, the internal code for each was considered changed, and TFWs were created. These TFWs contain control flow dependencies from both the code and from the configuration itself. While creating the TFWs, analysis for data dependencies was conducted. No dependencies were found, and no EFWs were created. In addition, no blocking calls or third party components were impacted, so these firewalls were not created. The final group of changes contains four settings changes. These settings changes affect the format of output data needed by the new HSI. These settings were mapped to the internal variables, and were marked as code changed. After this, TFWs were created, and each setting change was found to impact existing data dependencies. These dependencies are between the output of the configurable elements which have settings changes, and the configurable elements which send data out to the HSI. These dependencies resulted in EFWs being created for each changed setting. 
No deadlock or third party components were affected, so these firewalls were not created. Once the various firewalls were created and the impact of the change identified, the reported failures were studied. There were four failures detected, each resulting in incorrect data being displayed on the new HSI. The failures caused values to be truncated to 14 characters, instead of the 16 characters stated in the requirements. All of the other types of configurable elements that send data to the HSI correctly send 16 characters. 131 These failures are caused by a single latent software defect contained inside the configurable elements added in the second set of changes. This defect exists in the impact identified by the TFWs created for the settings change inside the added two configurable elements. The defect can be observed by checking the output value of the added elements, requiring specific tests on those outputs. If the TFW impact is too difficult to test, as the affected function is called by system elements outside the product itself, the EFWs created also contain the defect, as they included testing the data dependency between the newly added element and the HSI. Since the quality of the tests was not the point of this study, the TFW is considered to have found this defect. 6.4.3 Additional Customer Studies – GUI System To prevent additional repetition, each additional customer change studied is summarized in Table 12. Each of the reported customer failures and the configuration or settings changes which exposed it are described separately in high detail. The same steps were followed for these additional studies as were used in Sections 6.4.1 and 6.4.2, but these details are omitted for the sake of brevity. The overall data collected for all of the studies on this GUI System are shown in Table 12. The third customer change studied involved adding five configurable elements, all of which were previously used in the configuration. These added elements represent redundant controller modules which were added to increase the reliability and safety of the process. These modules do not contain any new logic, as they represent redundant modules in the system. When the customer next exported this project for use in their HSI system, the operation does not export all of the data in the project to the HSI. The defect 132 underlying this failure involved the algorithm used to export projects to the consoles. The algorithm created exports one controller at a time in the order they appear in the project. When the export processes the redundant module, it finds no logic, and continues on. This works for all cases except when certain configurable elements are configured as redundant modules. In this case, the number of controllers to export, which is used as the loop termination value, is based only on the number of primary controllers. As a result, the data exported is incomplete. Both TFWs and EFWs were created, as the added configurable elements had data dependencies to many other areas of the software. This defect, which caused failures for each element added, is contained only in the EFWs created for these added configurable elements, as the elements are accessed by the export routine which contained the defect. The fourth customer change studied involved a change where the customer added ten previously used configurable elements to their configuration, eight of one type and two of another. 
These elements were additional values needed by the plant operators, and were added to the configuration loaded into the HSI. Once the configuration was loaded a failure was observed. The new instances of these configurable elements only had a few settings values different from previous usages, leading to Settings Firewalls being created for each changed value. These settings differences did not affect any data dependencies, so no EFWs were created. The first type, of which eight were added, only required one TFW while the second type required five TFWs for each of the two added elements. These added configurable elements are involved in a dependency with a third party component. This component was a Microsoft database, which was used to store all of the configurable elements in the configuration. Due to this dependency, a COTS Firewall 133 was created. This firewall creation is in reverse compared to the code change version. Instead of finding a change in the module and propagating it out to the high level APIs that use it, the high level API affected was propagated inward, and all other API functions which use these changes are marked as affected. The failure occurred due to a latent software defect involving the database, which used a user-passed parameter as the index value instead of generating its own index, as described in the documentation. When negative values were passed into this database, it crashes with an unhandled exception, which was the case for these newly added configurable elements. This latent defect was contained in the COTS Firewall and was exposed by this change. The fifth customer change involved a customer who added eight new configurable elements to the system. These elements represented an analog input module and data values which were connected to it. These new configurable elements were considered code changes and TFWs were created for them. These TFWs include both static relationships, based on the code, and dynamic relationships, based on the configuration file in use. None of these newly added configurable elements were involved in a data dependency with other parts of the code or configuration, so no EFWs were created. Once these changes were made, the customer tried to export the configuration for use in another software product in the system. The export completed successfully, but when the configuration was loaded into the other product, eight failures were detected. The failures involved a number of values being incorrect, specifically the values from the newly added configurable elements. The underlying latent software defect was contained in the TFWs created, as the configurable elements internal export method was being called by 134 the export routine. This required the export routine to be retested and a failure corresponding to this defect was detected. The final customer change studied included 25 settings changes. These settings changes affect the update rates of the data being displayed. Each of the settings changes had TFWs created for them. While creating these TFWs, a number of blocking calls were identified, necessitating the use of a Deadlock Firewall. This firewall, once completed, identified the potential for deadlock. Since this firewall is an analysis firewall and not a testing firewall, the likelihood of the deadlock occurring is not determined, just that the potential exists. 
The defect itself was an occurrence of deadlock, where the changed update rates for the displayed values caused the GUI configuration software to lock up completely. The change in timing was just enough for the latent deadlock to be observed by the customer when the configuration was changed. Since the tech support and test labs used different speed hardware, they were not able to directly reproduce this problem, and engineers had to go out to the site to study the problem. This deadlock was the same deadlock that was detected by the Deadlock Firewall. By using the Configuration and Settings Firewall, this deadlock and the large expense it caused could have been avoided.

Table 12. Summary of Case Study 2 HSI System

                        Cust 1   Cust 2   Cust 3   Cust 4   Cust 5   Cust 6   Total
# of Settings Changes      5       13        0        0        0       25      43
# of Defects               0        0        0        0        0        1       1
# of Added Used CEs        2        0        5       10        0        0      17
# of Defects               2        0        1        1        0        0       4
# of Added New CEs         1        3        0        0        8        0      12
# of Defects               1        1        0        0        1        0       3
Analysis Time (Hours)    1.5      2.5      0.5        3      0.5        4      12
# TFWs                     8       16        5       18        8       25      80
# EFWs                     3        4        5        0        0        0      12
# Deadlock FWs             0        0        0        0        0        1       1
# 3rd Pty FWs              0        0        0        1        0        0       1

6.5 Third Case Study

The goal of the third case study is to run the required tests for a few of the configuration changes analyzed for the GUI Configuration product selected from those studied in the second case study. Running these tests allows actual testing time to be measured, as well as determining whether any additional existing defects around the change can be detected. When testing the impact around the change, few detailed tests currently exist to select from. The lack of existing detailed tests is due to the way that current release testing is performed today, where testing is spread very thinly just trying to test representative configurations. As a result, the needed tests are created with exploratory testing [4], using the impact identified by the created firewalls and system knowledge.

While the tests are run on the system, certain measures are recorded. These are shown in Table 13 at the end of this section, with a separate line for each of the two customer changes tested. These measures are broken into two categories. The first category is time required and the second category is failures detected. The measures selected for required time are presented first. A count of tests run is the first measure used. Analysis time, which is the time needed to create the firewalls, is taken from the second case study. The time required to run the tests is logged as test time and is calculated as the elapsed time from the start of the tests to the end of the tests, as measured by a wall clock. After that, the total test time is calculated as the sum of the analysis and test times. Next, the original time is calculated by summing the time required to investigate and discuss the problem when it was reported from the field, including technical support, development, and management time. By comparing the original time to the total time, a time savings is computed. This savings represents the decreased time required to reproduce and discuss the problems, when compared to field reported failures.

The second category of measures in Table 13 involves failures observed during the testing. First, the overall number of observed failures is counted. Each of the observed failures is split into two categories, known and new. New defects do not currently exist in the defect repository while known defects do.
This determination is made using expert knowledge to compare the observed failure to existing failures described in the defect repository. Finally, a determination is made as to whether the observed failures contain the failure reported by the customer.

6.5.1 First Testing Study – GUI System

The first change tested was one of the changes studied in Section 6.4.2. This change involved adding configurable elements which export values out of the GUI system and into the HSI. When the change was performed, a failure was detected where values were truncated to fourteen characters instead of the required sixteen. The TFWs and EFWs that were created for that study were reused here. In order to test these changes, the correct version of the product was installed. Once installed, the configuration which caused the failure was loaded. Then, tests were created to cover the impact from the TFWs and EFWs generated. These tests were created by using concepts from the Complete Interaction Sequences (CIS) method [10]. This method will not be discussed here, but the key concept used involved testing a required action by creating tests for all of the possible ways in which the GUI allows that action to occur. For example, in a GUI system, copying a configurable element can be done either through the Edit menu or by right clicking on the element and selecting Copy. For this study, both of these actions would have been tested. Using this method, along with exploratory testing, allowed more test cases to be created and run, leading to higher overall coverage, while still being completed in a short amount of time.

When testing was performed for this impact, a few failures were observed. The first failure occurs when a settings value is being updated. If the user tries to switch GUI screens in the middle of updating the settings, they are prompted to save the changes. If the user selects cancel, the changes are lost and a message appears which says the record could not be locked. This failure was not found in the failures listed in the defect repository for this product, and is considered a new failure. A second failure was found that occurs when the user enters 16 characters into a description field and tries to export the list of configurable elements. This export operation fails, as the exported list contains only 14 characters of the text. This failure matches the customer reported failure in Section 6.4.2 exactly, so it is counted as a known failure. A third failure was detected when the customer configuration was first imported into the tool. The tool reported an error when this operation was first attempted, displaying only "Non-recoverable Error". This is a non-descriptive error, as it gives no information about the issue or its resolution, and was listed as a failure. This failure is a known error, as it was detected internally by ABB when performing testing for a service pack release. The final observed failure occurred when a user exports the list of configurable elements. If the user selects an available option on the export dialog box, the resulting output contains no data. This occurs regardless of what configurable elements are contained in the list. This defect is not due to configuration and settings changes, but represents a more traditional latent defect in the functionality of the system. This failure was the same as one described in the defect repository that was originally reported by a separate customer one year after the release of this version.
Therefore, it is counted as a known failure. This first study tested the impact identified by the Configuration and Settings Firewall for the defect studied in Section 6.4.2. These tests were created and executed in 1.5 hours and detected four separate failures in the released software. One of these failures was not reported by testing or customers previously. Two of the failures were found by other customers and the final failure was observed by ABB in testing subsequent releases of the product. A summary of the data collected from this study is shown in Table 13.

6.5.2 Second Testing Study – GUI System

The second change tested in this study is one from Section 6.4.1. The specific change tested was the addition of three previously used configurable elements. These elements were connected across graphical pages by references. When the customer made these changes, a failure was observed. This failure involved the compiler failing the compilation of the configuration and providing no error message documenting the issue. The TFWs and EFWs created for the study in Section 6.4.1 were reused here. The correct version of the product was installed and the customer configuration that exposed the failure was loaded. Tests were created using the CIS method.

When testing the impact of this change, a number of failures were observed. The first test involved simply compiling the project. This basic operation exposed a defect, where the compiler failed due to the customer adding three configurable elements. This first failure matches the original customer reported failure for this configuration change. A second failure was observed when testing alternate ways to change settings in the added configurable elements. If a specific setting contains a value which comes from a configurable element on a different page of the configuration, then the entire program crashes when the configurable element is opened for change by the tool. When this happens, the software must be restarted and all unsaved changes are lost. This failure matches a failure found in the defect repository that was observed by a separate customer in the field. An additional failure was observed while testing the export functionality to verify that the newly added configurable elements are exported properly. The system seems to export correctly, as a file is generated and no errors are detected. Once the file was opened, it was observed that the system failed to export all of the configurable elements and settings to the file, while reporting no errors. This failure was matched to a separate customer reported failure that was described in the defect repository. After further analysis, this failure is due to the same defect as the one found by the third customer change studied in Section 6.4.3. This defect was not found by this customer, as export is not a function they currently use. Since this failure was observed by another customer, it is counted as a known failure. One final failure was observed when testing the impact of this change. Any textual changes to a setting value in the configurable element connected to the newly added element are not saved. The failure is observed by changing the setting value, saving the project, and then checking the value of that setting. This failure was found by another customer when they were updating the settings values of those elements and existed in the defect repository at the time of this study. The final results of this study are shown in Table 13.
Table 13. Results from the Third Case Study (GUI System)

            # of    Analysis  Test     Total    Original  % Time   # of      # of Known  # of New   Reported
            Tests   Time      Time     Time     Time      Saved    Failures  Failures    Failures   Failures
            Run     (Hours)   (Hours)  (Hours)  (Hours)                      Detected    Detected   Found?
Change 1:    25      2.5       2.5      5        42       88.10%    4         3           1          Yes
Change 2:    18      1.5       2        3.5      51       93.14%    4         4           0          Yes
Total:       43      4         4.5      8.5      93       90.62%    8         7           1          100%

6.5.3 Summary of Third Case Study

The third case study shows that performing the testing identified by the firewalls can detect the original customer-found failures based on the configuration changes, as well as additional failures in areas around the change. Seven of these failures were reported by customers at a later point in time than the original configuration change, representing defects that would have been found by ABB before customers observed them. In addition, one new defect was found. This defect was likely observed in the field by a customer but not considered important enough to report back to ABB.

6.6 Fourth Case Study

The fourth case study aims to better describe the latent defects which cause the failures reported by customers, specifically latent defects related to configuration and settings changes. In this study, 210 customer defects reported against the embedded controller were studied, as well as 250 customer defects reported on the GUI configuration system. These defects were reported by more than 150 different customers from sites located around the world.

6.6.1 Taxonomy Overview

Each defect was classified into a slightly modified Beizer Defect Taxonomy [2]. This taxonomy splits defects into eight main categories, each describing a specific grouping of defects around a specific characteristic. Each individual main category is further refined three additional times, each representing a different, more specific sub level. A defect is then assigned a four digit number, with each digit representing a category. The first digit is the main category, followed by progressively more detailed subcategories for digits two through four. For example, processing bugs would be 32xx, where the three designates a structural defect and the number two further refines this defect into the subcategory of processing. The last two numbers, shown as x here, would refine the defect to more levels of detail if they were used. It is not necessary to specify a defect to all four levels of the taxonomy. The eight main categories are shown in Table 14.

Table 14. Beizer's Taxonomy's Major Categories

1xxx  Functional Bugs: Requirements and Features
2xxx  Functionality As Implemented
3xxx  Structural Defect
4xxx  Data Defect
5xxx  Implementation Defect
6xxx  Integration Defect
7xxx  System and Software Architecture Defect
8xxx  Test Definition or Execution Bugs

For this study, only the first two levels of the taxonomy are used and the resulting classification is displayed without the trailing x's. An overview of the subcategories is shown in Tables 15-19 and the surrounding paragraphs. This overview is meant to clarify the taxonomy and its use here, not to describe how the defects were classified. The first major category is Functional Bugs. These defects deal with errors in the requirements themselves. This category includes defects dealing with incomplete, illogical, unverifiable, or incorrect requirements. The specific subcategories can be seen in Table 15.
Table 15. Beizer's Taxonomy's Functional Bugs

11xx  Requirements Incorrect
12xx  Logic
13xx  Completeness
14xx  Verifiability
15xx  Presentation
16xx  Requirements Changes

The second major category, Functionality as Implemented, deals with defects where the requirements are known to be correct, but the implementation of these requirements in the software product was incorrect, incomplete, or missing completely. This includes defects due to incorrect implementation, missing cases, incorrect handling of ranges of values, and missing exceptions or error messages. These subcategories can be seen in Table 16.

Table 16. Beizer's Taxonomy's Functionality as Implemented Bugs

21xx  Correctness
22xx  Completeness – Features
23xx  Completeness – Cases
24xx  Domains
25xx  User Messages and Diagnostics
26xx  Exception Conditions Mishandled

The next two categories deal with low level developer defects which exist in the source code. The Structural defects category deals with control flow predicates, loop iteration and termination, and control state defects. The Data defects category deals with defects such as initialization of variables, scope issues, incorrect types, and manipulation of data structures. The subcategories for both of these defect types are shown in Table 17.

Table 17. Beizer's Taxonomy's Structural & Data Bugs

31xx  Control Flow and Sequencing
32xx  Processing
41xx  Data Definition, Structure, Declaration
42xx  Data Access and Handling

Two other categories, implementation and integration bugs, deal with errors such as simple typos, code not meeting coding standards, or documentation problems. Specific errors of these types include missing or incorrect code comments, mistyped or copy-and-paste issues, and violations of department or company coding standards. Integration defects, on the other hand, are errors in the interfaces, both internal and external, that make up the software system. The subcategories for both Implementation and Integration are shown in Table 18.

Table 18. Beizer's Taxonomy's Implementation & Integration Bugs

51xx  Coding and Typographical
52xx  Standards Violations
53xx  Internal Documentation
54xx  User Documentation
55xx  GUI Defects
61xx  Internal Interfaces
62xx  External Interfaces

The final two categories of defects deal with System and Test defects. System defects comprise errors in the architecture, OS, compiler, and failure recovery of the system under test. Test defects represent errors found in the test descriptions, configurations, and test programs used to validate the system. These last two groups of subcategories are shown in Table 19.

Table 19. Beizer's Taxonomy's System and Test Bugs

71xx  OS
72xx  Software Architecture
73xx  Recovery and Accountability
74xx  Performance
75xx  Incorrect Diagnostic
76xx  Partitions and Overlays
77xx  Environment
78xx  3rd Party Software
81xx  Test Design
82xx  Test Execution
83xx  Test Documentation
84xx  Test Case Completeness

6.6.2 Embedded Controller Defect Classification

The taxonomy was used to classify all customer defects reported against the embedded controller used in the first case study. Specifically, defects which were either contained in configurable elements or exposed by changes to configurable elements were selected. In total, 204 defects were found. Of these, 45 defects were still under investigation at the time of the analysis and no fix had been made. The reports for these defects did contain enough information on the failure to determine if they were configuration related, so they were included in the study.
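Before turning to the classification results, the two-level classification used in the rest of this chapter can be made concrete with a small sketch. The snippet below shows how a four-digit Beizer code is reduced to the major category and first-level subcategory names listed in Tables 14 through 19. This is illustrative only; the actual classification in this study was performed manually from the defect reports, and the dictionary and function names here are hypothetical.

    # Hypothetical helper mapping Beizer codes (e.g. "32xx" or "3211") to the
    # two-level classification used in Figures 33 and 34. Labels follow the
    # slightly modified taxonomy in Tables 14-19; only a subset is shown.
    MAJOR = {
        "1": "Functional Bugs: Requirements and Features",
        "2": "Functionality As Implemented",
        "3": "Structural Defect",
        "4": "Data Defect",
        "5": "Implementation Defect",
        "6": "Integration Defect",
        "7": "System and Software Architecture Defect",
        "8": "Test Definition or Execution Bugs",
    }
    SUB = {
        "21": "Correctness", "22": "Completeness - Features",
        "23": "Completeness - Cases", "24": "Domains",
        "25": "User Messages and Diagnostics", "26": "Exception Conditions Mishandled",
        "31": "Control Flow and Sequencing", "32": "Processing",
        "41": "Data Definition, Structure, Declaration", "42": "Data Access and Handling",
        "55": "GUI Defects", "61": "Internal Interfaces", "62": "External Interfaces",
        "71": "OS", "72": "Software Architecture",
    }

    def classify(code: str) -> tuple[str, str]:
        """Return (major category, first-level subcategory) for a Beizer code."""
        return MAJOR[code[0]], SUB.get(code[:2], "unspecified")

    print(classify("32xx"))   # ('Structural Defect', 'Processing')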
Once all of the defects were studied, 82 of the 204 defects existed in configurable elements or were related to configuration changes. These 82 defects were classified using the taxonomy and the results are shown in Figure 33. Figure 33 shows that the majority of configuration related defects are low level code problems, which are classified as 31, 32, 41, and 42. In addition, a large number of user documentation problems were found. These defects mostly dealt with configuring the system into a state that was not supported by ABB, but the documentation did not explicitly prohibit those configurations. As a result, the documentation was updated to better describe the illegal configurations.

[Figure 33. Classification of Embedded Controller Defects – bar chart of defect counts (Count of PRC ID) by two-digit classification: 11, 13, 21, 22, 23, 24, 25, 26, 31, 32, 41, 42, 55, 61, 71, 72]

It is possible to map these defect types to the firewalls that have the best chance of finding them. Traditional Firewalls are ideal for detecting defects of type 31 and 32, since these defects deal directly with control flow and internal function processing. In addition, defects of type 21, 22, 23, and 24 are also found by TFWs, as they deal with incorrect or missing implementation of the configurable element, as well as ranges of acceptable values inside that element. These defects are exposed by either settings changes or by adding the element to the configuration. In addition, defects of type 61 are detectable by the TFW, as these defects represent interfaces which are internal to the configurable element itself. Finally, many of the defects of type 55 are found by TFWs. These defects usually just involve making a configuration settings change which leads to either a crash or an obvious violation of the intended change, such as no change in the behavior of the system. In total, 62% of the reported configuration defects can be detected by the TFW. As a validity check, all of the defects found in the first and second case studies by the TFW were classified. The results show that all of these defects were included in these types.

Extended Firewalls are ideal for detecting defects of type 41 and 42, since defects of these types involve data access, manipulation, and computation. Many of the defect reports found also show that the observed failures for these types of defects were not contained inside the configurable element, but resided in a different element or the system itself. In addition, defects of type 25 and 26 represent defects that EFWs are effective at finding. These types deal with user messages and exception handling. These defects are usually observed outside the element itself, and are mostly related to data flows. Finally, EFWs are effective at finding defects of type 62 and 63, representing external and configuration interfaces, respectively. In total, 29% of the reported defects in configurable elements can be detected by the EFW. Just as with the TFW types, all of the defects found by EFWs in the first two case studies were classified and their respective classifications were all included in these types.

Finally, Deadlock and COTS Firewalls are able to detect 3% of the customer defects. The remaining defects, those of type 71 and 72, require new code change-based firewall models to be created in order to detect them, both for traditional code changes and configuration and settings changes.
These types of defects are architectural and performance defects, which include memory leaks, starvation, timing and race conditions, and overall performance stability. Once code change-based firewalls are created for these defect types, the firewalls can be used in the Configuration and Settings Firewall in the same way that the currently existing firewalls are. They are not critical at this point, as they represent only 6% of the total defects, but they do represent the largest group of defects that are not detectable by firewalls as of now.

6.6.3 GUI System Defect Classification

The taxonomy was also used to classify recent customer defects reported against the GUI Configuration system used in the second case study. For this product, the 250 most recent customer defects were studied. Of these, 77 were classified using the taxonomy, as they were either contained in configurable elements or were related to changes in the configuration and settings. The result of using the taxonomy to classify these 77 defects is shown in Figure 34.

[Figure 34. Classification of GUI System Defects – bar chart of defect counts (Count of PRC ID) by two-digit classification: 21, 22, 23, 24, 25, 26, 31, 32, 41, 42, 51, 53, 55, 62, 71, 72, 74, 75]

Figure 34 shows a few similarities to the classified defects from the embedded controller. First, many of the customer reported configuration defects are low level code problems, classified as 31, 32, 41, and 42. Second, a large number of user documentation problems were found. In this case, the defects dealt with values, ranges, and combinations that should not be used but were not explicitly prohibited by either the software or the documentation. Management decided that the documentation would be updated to better describe the illegal configurations, as opposed to prohibiting these modes of operation in the software itself.

One difference found in the data between the embedded controller and the GUI configuration system deals with defects in the category of Functionality as Implemented, labeled as 21, 22, or 23. There were more of these defects found in the GUI system than in the embedded controller. Further analysis was done on these defects, and it was determined that the difference was related to the level of detail in the product requirements. For the embedded controller, the detail was high, as its functions were well understood and fully documented. The GUI system, on the other hand, had a low level of detail for many parts of the requirements. These low detail requirements led to defects where the implementation was incorrect, type 21, or where the implementation was correct for all existing cases, but certain other cases were missing. Low detail requirements also led to a large number of defects having a resolution type of Not a Problem, Works as Designed, or Will Not Fix. Defects of these types involve customers submitting defects which represent behavior they believe to be incorrect. Once the defect is received, product management, marketing, and development together decide that the product was never meant to do this, resulting in no code change to resolve the issue. These defects represent a misunderstanding of the requirements between management, development, and the customers. In total, 240 defects of these three types were reported from customers, out of 894 total customer reported defects. This represents 27% of all customer reported defects, compared to only 19% in the embedded controller study. A more formal study of these relationships is ongoing now and is outside the scope of this research.
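The defect-type-to-firewall mapping described in Section 6.6.2 can also be written down explicitly. The sketch below only restates that mapping as a lookup; it is not an algorithm taken from the firewall construction itself, and the function name is hypothetical.

    # Illustrative only: which firewall has the best chance of finding a defect,
    # based on the first two digits of its Beizer classification (Section 6.6.2).
    TFW_TYPES = {"21", "22", "23", "24", "31", "32", "55", "61"}
    EFW_TYPES = {"25", "26", "41", "42", "62", "63"}

    def best_firewall(code: str) -> str:
        """Map a two-level defect code such as '41' or '41xx' to a firewall type."""
        prefix = code[:2]
        if prefix in TFW_TYPES:
            return "Traditional Firewall (TFW)"
        if prefix in EFW_TYPES:
            return "Extended Firewall (EFW)"
        return "Deadlock/COTS Firewall, or a new firewall model is needed"

    print(best_firewall("41xx"))   # Extended Firewall (EFW)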
The mapping of defect type to firewall remains the same for the GUI system. TFWs are effective at detecting defects of type 21, 22, 23, 24, 31, 32, 55 and 61. The justification remains the same as for the embedded controller study in Section 6.6.2. In total, 61% of the reported customer defects for the GUI system can be detected by the TFW. EFWs are effective at detecting defects of type 25, 26, 41, 42, 62, and 63. Overall, defects of these types make up 25% of the total reported customer defects for this GUI system, and require an EFW to be created. Deadlock and COTS Firewalls are able to find some of the defects of type 72 and all of the defects of type 75. These represented 5% of the total defects. Finally, defects that require new code change-based firewall models to be created represented 9% of the defects.

The fourth case study shows the common types of defects that are exposed by configuration changes, as well as the relative frequency of occurrence of each type compared to the other types. In particular, low level coding issues that are only observed when certain configurations and settings are selected represented 47% of the total defects found by configuration change. This may seem higher than expected, but analysis of these defects shows that the code paths containing the defects were never executed before, due to the prohibitively large number of configurations and settings possible in the system. By mapping certain types of defects to firewalls that have the best chance of finding them, it is shown that TFWs are able to detect 61% of the customer exposed defects. EFWs, even with the larger effort required to create them, are still very beneficial, and can detect 27% of the defects. Deadlock and COTS Firewalls are able to detect 4%, on average, of the customer defects. These defects, while few in number, usually lead to long fix times and greater customer dissatisfaction. Finally, on average, 7% of the defects reported are not able to be detected reliably by any of the current firewall models. These defects will require new code change-based firewalls to be created, and they were mostly memory leaks and performance issues.

6.7 Fifth Case Study

The goal of the fifth and final case study is to show that configurable elements are easier to analyze and test than the system as a whole. Section 4.5 discussed the time complexity of analyzing and creating firewalls for user-configurable systems and identified some key factors which affect the analysis time. These factors include the number of control flow dependencies and the number and length of data flow dependencies, each of which represent paths through the code. In order to describe these important factors, a set of static metrics was selected and collected. These metrics include:

1. Call depth – A measure of the maximum calling depth of a function, which is related to the maximum length of a dependency.
2. Fan in – A measure of the number of functions that call a specific function, which is combined with fan out to represent the number of control flow dependencies.
3. Fan out – A measure of the number of functions that a specific function calls, which is combined with fan in.
4. Global Variables Used – A measure of the number of global variables referenced in a function, which is used to describe the number of data flow dependencies.
5. ELOC / Method – A measure of the size of a function, which is used with cyclomatic complexity to quantify the number of tests needed.
6. Cyclomatic Complexity [50] – A measure of the number of independent paths through the code, which is used with ELOC / Method.

These metrics were calculated from the source code for the embedded controller used in the first study. The directory layout of the source code for this product made it very easy to determine a logical split between configurable elements and the rest of the system. The GUI system used in the second study was not included in this study, as the code for its configurable elements is mixed together with the code for the rest of the system. While it is possible to find each configurable element in the code, splitting configurable elements from the rest of the system for every function and class would take a large effort. Each of these metrics was run on the system with Understand for C++ [51], a commercial tool to calculate source-based metrics with a customizable Perl engine. Each metric was collected twice, once on the system as a whole and once on only the source code representing the configurable elements in the system. Afterwards, the results were imported into Excel and saved. Inside the Excel workbook two sheets were made, one containing all of the functions, and one containing just those functions that exist inside configurable elements. Excel was then used to summarize each metric into small comparable values, including minimum, maximum, mean, standard deviation, and median. These values are simple to calculate and allow the claims made in this study to be validated. They are shown in Table 20, where each metric group has two sets of recorded values: All, representing the value for the whole system, and CEs, representing the value for the source code of the configurable elements. As the value increases for each of these metrics, the analysis and test time increase.

Table 20. Summary of Source Metrics

            Call Depth              Fan In                  Fan Out
            All         CEs         All         CEs         All         CEs
Min:        0           0           0           1           0           1
Max:        10          9           598         48          186         110
Average:    1.4398607   1.407679    7.395059    1.937677    8.490187    5.011331
St.Dev.:    2.1212166   1.839869    40.05258    3.959914    15.6977     10.01778
Median:     0           1           1           1           1           1

            Globals Used            LOC / Method            Cyclomatic
            All         CEs         All         CEs         All         CEs
Min:        1           1           1           1           1           1
Max:        206         206         578         405         169         70
Average:    12.951104   14.64268    32.74362    26.54504    5.237573    4.21813
St.Dev.:    15.65772    17.04165    49.80465    40.15872    8.819084    6.747848
Median:     7           7           15          10          2           2

When comparing the All column to the CEs column for each metric in Table 20, a decrease of some magnitude is visible for all categories except globals used. After further analysis, it was found that the embedded controller module's proprietary OS contains no memory manager. Due to this, all allocations of memory are made from a large, statically allocated array which is treated by Understand as a global variable. In addition, all other standard system functions offered by the operating system also access this global memory array, leading to global variable access for all operations on semaphores, inter-task messages and signals through message queues, and all buffer access. This causes too many false positive values for globals used, so it was removed from further analysis. All of the other values showed some amount of decline between the whole source code, in the All column, and the configurable elements, in the CEs column.
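As an aside, the per-group summary shown in Table 20 can be reproduced outside of Excel. The sketch below uses Python and pandas on a few hypothetical rows of data; the real input was the per-function metric export produced by Understand for C++, which is not reproduced here, and the column names are assumptions.

    import pandas as pd

    # Hypothetical per-function metric rows; the real data came from the
    # Understand for C++ export described above.
    metrics = pd.DataFrame({
        "function":   ["f1", "f2", "f3", "f4"],
        "is_ce":      [True, True, False, False],   # inside a configurable element?
        "call_depth": [1, 0, 3, 2],
        "fan_in":     [2, 1, 40, 7],
        "cyclomatic": [4, 2, 15, 6],
    })

    stats = ["min", "max", "mean", "std", "median"]
    cols = ["call_depth", "fan_in", "cyclomatic"]

    all_summary = metrics[cols].agg(stats)                        # "All" columns of Table 20
    ce_summary = metrics.loc[metrics["is_ce"], cols].agg(stats)   # "CEs" columns of Table 20

    print(all_summary)
    print(ce_summary)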
In order to use these data to form any conclusions, a set of statistical hypothesis tests was conducted on each metric type. A two-sample t-test for unequal variances was used. The hypothesis presented for each of the tests was that there was no difference in the means of the values. If this hypothesis cannot be rejected, then the conclusion is that the metric calculated for the whole system is not statistically different from the metric calculated for just the configurable elements. Otherwise, if the hypothesis is rejected, the means are statistically different and, given the sample means, the measure for the configurable elements is smaller than the measure for the program as a whole.

Table 21. T-test Results for Call Depth

t-Test: Two-Sample Assuming Unequal Variances

                              All Depth      CE Depth
Mean                          1.439478261    1.407679277
Variance                      4.495814652    3.385116137
Observations                  5750           1771
Hypothesized Mean Difference  0
df                            3342
t Stat                        0.612732905
P(T<=t) one-tail              0.270047326
t Critical one-tail           1.6453097
P(T<=t) two-tail              0.540094652
t Critical two-tail           1.960674021

The first metric set compared was call depth. This metric measures the largest number of functions that are called sequentially in a call chain containing this function. As call depth increases, so does the length of control flow paths and possibly data flow paths, if they exist in the system. This tends to lead to a longer EFW creation time. Performing the hypothesis test yielded the result shown in Table 21. These results show that the hypothesis of equal means cannot be rejected, as the P value is 0.27, which is greater than 0.05. Therefore, the call depth for the program as a whole and for the configurable elements is considered the same.

The second metric set compared was fan in. This metric represents the number of functions that call the specific function being measured. It is one of the usual measures of coupling in a system. A higher fan in means more functions can be impacted by a change in the function being measured. In addition, a higher fan in leads to a larger analysis time for the TFW and EFW, as each direct caller and callee must be analyzed when creating these firewalls. The results of performing this test show that the hypothesis that the means are equal can be rejected, as the P value is approximately 0, which is less than 0.05. Therefore, the fan in of the configurable elements is smaller than the fan in for the entire system. The details are shown in Table 22.

Table 22. T-test Results for Fan In

t-Test: Two-Sample Assuming Unequal Variances

                              All Fanin      CE Fanin
Mean                          7.395058878    1.937677054
Variance                      1604.209481    15.6809209
Observations                  4331           1765
Hypothesized Mean Difference  0
df                            4534
t Stat                        8.86137118
P(T<=t) one-tail              5.57795E-19
t Critical one-tail           1.645189773
P(T<=t) two-tail              1.11559E-18
t Critical two-tail           1.960487286

The third metric set compared was fan out. This metric represents the number of functions that are called from the specific function being measured. This is also one of the standard measures of coupling in a software system. A higher fan out means more functions could be affected by a change in the function being measured. In addition, a higher fan out also leads to a larger analysis time when creating both TFWs and EFWs, as the algorithm looks at each caller and callee. The results of this hypothesis test are shown in Table 23. The results show that the hypothesis can be rejected, as the P value is approximately 0, which is less than 0.05. Therefore, the fan out of the configurable elements is smaller than the fan out for the entire system.
Table 23. T-test Results for Fan Out

t-Test: Two-Sample Assuming Unequal Variances

                              All Fanout     CE Fanout
Mean                          8.490187024    5.011331445
Variance                      246.4176289    100.3558806
Observations                  4331           1765
Hypothesized Mean Difference  0
df                            5015
t Stat                        10.3145661
P(T<=t) one-tail              5.33694E-25
t Critical one-tail           1.645157526
P(T<=t) two-tail              1.06739E-24
t Critical two-tail           1.960437078

The fourth metric set compared was executable lines of code per method (ELOC / Method). This metric is computed for each method in the system by counting the lines of executable code that exist inside it. Comments, blank lines, and braces are not counted, unless they also have executable code on the same line. This measure is the standard measure for the size of a function. A higher ELOC / Method count usually indicates a function with less potential reuse and less cohesion, as the function is implemented to perform too many unrelated tasks. The results in Table 24 show that the hypothesis can be rejected, as the P value is approximately 0, which is less than 0.05. Therefore, the ELOC / Method of the configurable elements is smaller than that of the system as a whole.

Table 24. T-test Results for LOC / Method

                              All Count LOC / Method   CEs Count LOC / Method
Mean                          32.74361845              26.54504249
Variance                      2480.503347              1612.723168
Observations                  4466                     1765
Hypothesized Mean Difference  0
df                            3979
t Stat                        5.113989091
P(T<=t) one-tail              1.65139E-07
t Critical one-tail           1.64523667
P(T<=t) two-tail              3.30279E-07
t Critical two-tail           1.960560307

The fifth and final metric set compared was Cyclomatic Complexity [50]. This metric is computed for each method in the system by computing v(G) = e – n + p, where G is the program's flow graph, e is the number of edges in the graph, n is the number of nodes in the graph, and p is the number of connected components. As this number increases, the number of paths that must be covered by tests increases. This leads to a larger testing effort, as high code coverage is a common measure of thorough testing [2]. The results of the hypothesis test show that the hypothesis can be rejected, as the P value is approximately 0, which is less than 0.05. The details are shown in Table 25.

Table 25. T-test Results for Cyclomatic Complexity

t-Test: Two-Sample Assuming Unequal Variances

                              All Cyclomatic   CEs Cyclomatic
Mean                          5.237572772      4.218130312
Variance                      77.77624531      45.53345795
Observations                  4466             1765
Hypothesized Mean Difference  0
df                            4194
t Stat                        4.904046521
P(T<=t) one-tail              4.87183E-07
t Critical one-tail           1.645217029
P(T<=t) two-tail              9.74366E-07
t Critical two-tail           1.960529726

The fifth case study clearly shows that, for the system studied, configurable elements are smaller, more encapsulated, and less coupled than the system as a whole. This shows that, on average, the underlying code change-based firewalls needed for the Configuration and Settings Firewall are easier to create than they are on the system as a whole. In addition, this data was used for the average case time complexity in Section 4.5. This study presents a set of static measures of the source code used in the first case study. The data collected show that configurable elements have smaller values in each key metric. Since smaller values lead to smaller analysis and test time, configurable elements themselves are easier to analyze and test than the entire system. This was shown statistically and supports the discussion of time complexity in Section 4.5.
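For readers who wish to reproduce this style of analysis, the comparison reported in Tables 21 through 25 is a standard two-sample t-test assuming unequal variances (Welch's t-test). The sketch below shows an equivalent computation in Python with hypothetical sample values; the actual per-function metric values from the embedded controller are not reproduced here.

    import numpy as np
    from scipy import stats

    # Hypothetical fan-in samples; the study used the per-function values
    # summarized in Table 22, which are not included in this document.
    fan_in_all = np.array([1, 1, 2, 7, 40, 3, 1, 598, 2, 5])
    fan_in_ce  = np.array([1, 1, 2, 1, 3, 2, 1, 4])

    # Two-sample t-test assuming unequal variances (Welch's t-test), matching
    # the "t-Test: Two-Sample Assuming Unequal Variances" layout used above.
    t_stat, p_two_tail = stats.ttest_ind(fan_in_all, fan_in_ce, equal_var=False)

    # The one-tailed value reported in Tables 21-25 is half the two-tailed value
    # when the difference is in the expected direction.
    p_one_tail = p_two_tail / 2
    print(f"t = {t_stat:.3f}, one-tail p = {p_one_tail:.4f}, two-tail p = {p_two_tail:.4f}")

    # The equal-means hypothesis is rejected at the 0.05 level when p < 0.05.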
Finally, the Configuration and Settings Firewall only creates code change-based firewalls on configurable elements, which require less analysis time than those created on the system as a whole. Combining this with the empirical data collected on code change-based firewalls in Chapter 3, which shows the efficiency of these firewalls on user-configurable systems, the Configuration and Settings Firewall itself is efficient on these systems.

7. Conclusions and Future Work

User-configurable systems present many difficult challenges to software testers. Combinatorial problems prevent exhaustive testing before release, leaving many latent defects in the software after release. Customers are then at risk of exposing these defects at a later point in time, whenever they make changes to their running configuration. Current methods for regression testing systems are based on code changes and rely on differences in various software artifacts, such as source code, metadata, or executable images, to determine impact on the system. Due to this, current RTS methods are not directly applicable to the problem where configuration changes reveal latent software defects.

The Configuration and Settings Firewall was created as a solution to this problem. This method allows incremental testing of user-configurable systems by determining the impact of each customer change on the system as a whole, and determining what retesting is needed, if any. Impact analysis is performed by mapping configurable elements and settings onto the code in the system that implements them and treating that code as changed. Once the mapping is complete, the impact is propagated through the system and tests are selected or created, using existing code change-based RTS methods. In addition, these existing RTS methods have been validated for industrial use on user-configurable systems.

A set of five case studies was performed on the Configuration and Settings Firewall, showing its efficiency and effectiveness at detecting customer found defects in real, deployed industrial systems. These studies analyzed 460 reported failures on two very large user-configurable systems, each of which is used at thousands of locations around the world. The results of the study show that each of the reported customer defects would have been detected by this method, as well as some additional defects found later in the system by other customers and testers. In addition, the analysis time required to create this new firewall is not substantial compared to the cost of diagnosing and fixing the problems found at a customer site.

Future research on user-configurable systems is very important, as these systems are becoming more widely used for critical applications. In the short term, a better definition of user-configurable systems is an important contribution still needed. Any definition should cover the degree of configuration allowed, as almost all programs today allow some level of customization. In addition, an understanding of these systems needs to be published. This should include a detailed analysis of the source code, defects, and customer usage of these systems. An initial set of this data is included in this work, but more information needs to be collected and published.

The first main area of future work is initial release testing of these user-configurable systems. Previous work in that area, such as [33, 34], shows some techniques which may work on smaller systems with fewer configurable elements and settings.
These techniques will need to be studied on software with a larger number of configurable elements and settings, such as ERP systems and industrial control systems. Chapter 5 presented a proposal for a way to initially test the software before release. This proposal needs to be further refined into a method which can be empirically studied. This may involve adding in elements from previous work on testing user-configurable systems. A main focus of this proposed release testing process is testing relevant customer configurations. This will allow the method to complement use of the Configuration and Settings Firewall.

Another area of future research involves creating the firewall models for the additional forms of impact analysis identified in Chapter 3, such as performance and memory leaks. Changes that impact these types of dependencies are not currently supported by either code change-based RTS or configuration change-based RTS today. This represents only 6-9% of the defects studied so far, but it is a large risk to mainstream adoption of the method by industry.

Besides the research areas above, which aim to increase the effectiveness of the testing performed, a reduction in the effort required to release these systems should be researched. One possible way to reduce this effort involves applying research in execution profiling to this area. Currently, no real understanding exists of how user-configurable systems run in the field. It is common to compare the static configurations between two different uses of the system, but execution information provides much more data on how the events and user interactions caused the system to run. Besides providing a better understanding of the system, these methods may enable a further reduction in the testing required when using the Configuration and Settings Firewall for two similar customers making the same changes to their configuration. This reduction will significantly reduce the overhead that the software vendor incurs for each customer configuration change. Another reduction is possible with automation. Each system using this method should have access to a configuration differencing tool, either proprietary or third party. These tools, combined with recent advances in semantic and static impact propagation, allow many steps of the firewall creation to be automated. Also, some research into the feasibility of a web system to determine the difference and impact of a configuration change needs to be done. This system would enable customers to propose some changes and get fast feedback on how large an impact these changes may have on the software. While this system must protect proprietary information, it should be possible to provide a feature or requirement level impact, along with an overall measure of system impact.

8. References

[1] IEEE, "IEEE Standard Glossary of Software Engineering Terminology," IEEE Standard 610.12, 1990.
[2] Beizer, B. Software Testing Techniques, Second Edition. International Thomson Computer Press, Boston, 1990.
[3] Bach, J. Satisfice, Inc. ALLPAIRS test generation tool, Version 1.2.1. http://www.satisfice.com/tools.shtml, 2004.
[4] Kaner, C., Bach, J., and Pettichord, B. "Lessons Learned in Software Testing: A Context Driven Approach," Wiley Publishing, New Jersey, 2001.
[5] Sommerville, Ian, "Software construction by configuration: Challenges for software engineering research". ICSM 2005 Keynote presentation, Budapest, September 2005.
[6] J. Bible, G. Rothermel, and D.
Rosenblum, "A Comparative Study of Coarse- and Fine-Grained Safe Regression Test-Selection Techniques," ACM Transactions on Software Engineering and Methodology, vol. 10(2), pp.
[7] L. White and B. Robinson, "Industrial Real-Time Regression Testing and Analysis Using Firewall," in International Conference on Software Maintenance, Chicago, 2004, pp. 18-27.
[8] K. Abdullah, J. Kimble, and L. White, "Correcting for Unreliable Regression Integration Testing," in International Conference on Software Maintenance, Nice, France, 1995, pp. 232-241.
[9] L. White and K. Abdullah, "A Firewall Approach for the Regression Testing of Object-Oriented Software," in Software Quality Week, San Francisco, 1997.
[10] L. White, H. Almezen, and S. Sastry, "Firewall Regression Testing of GUI Sequences and Their Interactions," in International Conference on Software Maintenance, Amsterdam, The Netherlands, 2003, pp. 398-409.
[11] H. Leung and L. White, "A Study of Integration Testing and Software Regression at the Integration Level," in International Conference on Software Maintenance, San Diego, 1990, pp. 290-301.
[12] H. Leung and L. White, "Insights into Testing and Regression Testing Global Variables," Journal of Software Maintenance, vol. 2, pp. 209-222, December 1991.
[13] L. White and H. Leung, "A Firewall Concept for both Control-Flow and Data Flow in Regression Integration Testing," in International Conference on Software Maintenance, Orlando, 1992, pp. 262-271.
[14] J. Zheng, B. Robinson, L. Williams, and K. Smiley, "Applying Regression Test Selection for COTS-based Applications," in 28th IEEE International Conference on Software Engineering (ICSE'06), Shanghai, P. R. China, May 2006, pp. 512-521.
[15] R. Arnold and S. Bohner, Software Change Impact Analysis: Wiley-IEEE Computer Society Press, 1996.
[16] T. Ball, "On the Limit of Control Flow Analysis for Regression Test Selection," in ACM SIGSOFT International Symposium on Software Testing and Analysis, Clearwater Beach, FL, March 1998.
[17] S. Bates and S. Horwitz, "Incremental Program Testing Using Program Dependence Graphs," in 20th ACM Symposium on Principles of Programming Languages, January 1993, pp. 384-396.
[18] P. Benedusi, A. Cimitile, and U. D. Carlini, "Post-Maintenance Testing Based on Path Change Analysis," in Conference on Software Maintenance, October 1988, pp. 352-361.
[19] D. Binkley, "Reducing the cost of Regression Testing by Semantics Guided Test Case Selection," in International Conference on Software Maintenance, October 1995, pp. 251-260.
[20] T. L. Graves, M. J. Harrold, Y. M. Kim, A. Porter, and G. Rothermel, "An Empirical Study of Regression Test Selection Techniques," ACM Transactions on Software Engineering and Methodology, vol. 10(2), pp. 184-208, 2001.
[21] R. Gupta, M. J. Harrold, and M. L. Soffa, "An Approach to Regression Testing Using Slicing," in Conference on Software Maintenance, November 1992, pp. 299-308.
[22] M. J. Harrold and M. L. Soffa, "Interprocedural Data Flow Testing," in Third Testing, Analysis, and Verification Symposium, December 1989, pp. 158-167.
[23] M. J. Harrold and M. L. Soffa, "An Incremental Approach to Unit Testing During Maintenance," in Conference on Software Maintenance, October 1988, pp. 362-367.
[24] D. Kung, J. Gao, P. Hsia, F. Wen, Y. Toyoshima, and C. Chen, "Change Impact Identification in Object-Oriented Software Maintenance," in International Conference on Software Maintenance, Victoria, B.C., Canada, 1994, pp. 202-211.
[25] D. Kung, J. Gao, P. Hsia, F. Wen, Y.
Toyoshima, and C. Chen, "Class Firewall, Test Order and Regression Testing of Object-Oriented Programs," Journal of Object-Oriented Programming, vol. 8(2), pp. 51-65, 1995.
[26] Dunietz, I. S., Ehrlich, W. K., Szablak, B. D., Mallows, C. L., and Iannino, A. "Applying design of experiments to software testing." Proceedings of the International Conference on Software Engineering, 1997, pp. 205-215.
[27] T. J. Ostrand and E. J. Weyuker, "Using Dataflow Analysis for Regression Testing," in Sixth Annual Pacific Northwest Software Quality Conference, September 1988, pp. 233-247.
[28] G. Rothermel and M. J. Harrold, "Selecting Regression Tests for Object-Oriented Software," in International Conference on Software Maintenance, September 1994, pp. 14-25.
[29] A. B. Taha, S. M. Thebaut, and S. S. Liu, "An Approach to Software Fault Localization and Revalidation Based on Incremental Data Flow Analysis," in 13th Annual International Computer Software and Applications Conference, September 1989, pp. 527-534.
[30] Cohen, M. B., Dwyer, M. B., and Shi, J. "Interaction testing of highly-configurable systems in the presence of constraints," in Proceedings of the 2007 International Symposium on Software Testing and Analysis, July 2007, pp. 129-139.
[31] Cohen, D. M., Dalal, S. R., Fredman, M. L., and Patton, G. C. "The AETG System: An Approach to Testing Based on Combinatorial Design," IEEE Transactions on Software Engineering, July 1997.
[32] "Software Fault Interactions and Implications for Software Testing." IEEE Transactions on Software Engineering, June 2004, pp. 418-421.
[33] X. Qu, M.B. Cohen and K.M. Woolf, "Combinatorial interaction regression testing: a study of test case generation and prioritization," IEEE International Conference on Software Maintenance, Paris, October 2007, pp. 255-264.
[34] Cohen, M. B., Snyder, J., and Rothermel, G. "Testing across configurations: implications for combinatorial testing," SIGSOFT Software Engineering Notes, November 2006, pp. 1-9.
[35] Poshyvanyk, D., Marcus, A., "Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code", in Proceedings of the 15th IEEE International Conference on Program Comprehension, Banff, Canada, June 2007, pp. 37-48.
[36] E. Hill, L. Pollock, and K. Vijay-Shanker. "Exploring the Neighborhood with Dora to Expedite Software Maintenance." International Conference on Automated Software Engineering, November 2007.
[37] Dorf, R., Bishop, R. "Modern Control Systems," Eleventh Edition. Prentice Hall, 2008.
[38] O'Leary, Daniel. "Enterprise Resource Planning Systems: Systems, Life Cycle, Electronic Commerce, and Risk," Cambridge University Press, 2000.
[39] White, L., Jaber, K., and Robinson, B. "Utilization of Extended Firewall for Object-Oriented Regression Testing." Proceedings of the 21st IEEE International Conference on Software Maintenance, Budapest, September 2005, pp. 695-698.
[40] Kuhn, D., and Reilly, M. "An investigation of the applicability of design of experiments to software testing." Proc. 27th Annual NASA Goddard/IEEE Software Engineering Workshop, 2002, pp. 91-95.
[41] H. Do, S. G. Elbaum, and G. Rothermel. "Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact." Empirical Software Engineering: An International Journal, 10(4):405-435, 2005.
[42] Araxis Inc. "Araxis Merge: A two and three way file and folder comparison tool." http://www.araxis.com/merge/index.html. January 11th, 2008.
[43] Basili, V. R.
and Boehm, B., "COTS-Based Systems Top 10 List," IEEE Computer, 24(5), 2001, pp. 91-93.
[44] S. Williams and C. Kindel, "The Component Object Model: A Technical Overview," in MSDN Library, 1994.
[45] C. Kaner, J. Zheng, L. Williams, B. Robinson, and K. Smiley, "Binary Code Analysis of Purchased Software: What are the Legal Limits?" Submitted to the Communications of the ACM, 2007.
[46] Dickinson, W., Leon, D., and Podgurski, A. "Finding Failures by Cluster Analysis of Execution Profiles." Proceedings of the 2001 International Conference on Software Engineering, Toronto, May 2001.
[47] Dickinson, W., Leon, D., and Podgurski, A. "Pursuing failure: the distribution of program failures in a profile space." Proceedings of the 8th European Software Engineering Conference Held Jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Vienna, Austria, September 2001, pp. 246-255.
[48] Pressman, Roger S. "Software Engineering: A Practitioner's Approach," Sixth Edition. The McGraw-Hill Companies, 2005.
[49] Li, P. L., Herbsleb, J., Shaw, M., and Robinson, B. "Experiences and results from initiating field defect prediction and product test prioritization efforts at ABB Inc." In Proceedings of the 28th International Conference on Software Engineering, Shanghai, China, May 2006, pp. 413-422.
[50] McCabe, Thomas J. "A Complexity Measure." IEEE Transactions on Software Engineering, 2(4), 1976, pp. 308-320.
[51] Scientific Toolworks Inc. "Understand for C++: A software metrics tool for C/C++." http://www.scitools.com/products/understand/cpp/product.php. May 2007.
[52] Adam Porter, Atif Memon, Cemal Yilmaz, Douglas C. Schmidt, Bala Natarajan, "Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance." IEEE Transactions on Software Engineering, August 2007, 33(8), pp. 510-525.