Software Testing Process Management by Applying Six Sigma
Ljubomir Lazić, SIEMENS d.o.o, Radoja Dakića 7, 11070 Beograd, Serbia&Montenegro
Dušan Velašević, Electrical Engineering Faculty, Belgrade University, Bulevar kralja Aleksandra 73, 11070 Beograd, Serbia&Montenegro
Nikos Mastorakis, Military Institutions of University Education, Hellenic Naval Academy, Terma Hatzikyriakou, 18539, Piraeus, Greece

Abstract: Software Quality Engineering (SQE) is a comprehensive life-cycle approach concerned with every aspect of the software product development process (SDP). An SQE program includes a comprehensive set of quality objectives, measurable quality attributes (quality metrics) to assess progress towards those objectives, and quantitative certification targets for all component software development processes. The approach presented in this paper is to focus on the identification and elimination of defects as early in the life cycle as possible, thus reducing maintenance and test costs. Software testing parameters of concern (e.g. defect detection rates, defect containment, cost) are estimated and updated, and the software testing strategy is adjusted accordingly on-line. Test cases are selected or generated by the optimal testing strategy and thus constitute an optimal test profile. The software testing process (STP) parameters are estimated on-line and the corresponding optimal actions are determined from these estimates. In order to assure a stable, i.e. controllable and predictable, STP we apply the Six Sigma (6σ) for software methodology called DMAIC, for "Define, Measure, Analyze, Improve, and Control", because it organizes the intelligent control and improvement of an existing software test process. Our 6σ deployment to the STP is demonstrated with examples from project experience.

Key-Words: software testing, validation and verification, test evaluation, quality engineering, metrics, Six Sigma.

1 Introduction
Software Quality Engineering (SQE) is a comprehensive life-cycle approach concerned with every aspect of the software product development process (SDP). Schulmeyer [1] defines software quality assurance as ". . . the set of systematic activities providing the evidence of the ability of the software process to produce a software product that is fit for use." Assurance that the product performs as specified is the role of product assurance. This includes "in process", or embedded, product assurance, as well as methods that involve independent oversight. Taking into account customer product requirements, customer quality requirements, and corporate quality requirements, SQE is an integrated assurance approach to three attributes of the SDP: Quality, Process and Management, leading to these SQE components: Quality Assurance (QA), Configuration Management (CM), Verification & Validation (V&V), and Test & Evaluation (T&E). It consists of both process and product assurance. Its methods include assessment activities such as ISO 9000 and CBA IPI (CMM-Based Appraisal for Internal Process Improvement), analysis functions such as reliability prediction and causal analysis, and the direct application of defect detection methods such as formal inspection and testing. The purpose of embedded SQA processes is product quality assurance: these activities are part of the development life cycle and "build in" the desired product quality. Our approach is to focus on the identification and elimination of defects as early in the life cycle as possible, thus reducing maintenance and test costs [27]. Embedded SQA methods include formal inspection, reviews, and testing. An independent test function, or testing witnessed by an independent entity such as SQA, is one method of providing product assurance.

At the beginning of software testing our knowledge of the software under test is limited. As system/software testing proceeds, more testing data are collected and our understanding of the software under test improves. Software testing parameters of concern (e.g. defect detection rates, defect containment, cost) may be estimated and updated, and the software testing strategy adjusted on-line accordingly. Test cases are selected or generated by the optimal testing strategy and thus constitute an optimal test profile. In a broad sense, an optimal test profile defines how to apply the right (testing) techniques, or actions, at the right times in order to achieve the best results. In other words, the software testing process parameters are estimated on-line and the corresponding optimal actions are determined from these estimates. This leads to an adaptive software testing strategy. A non-adaptive software testing strategy specifies what test suite or what next test case should be generated, e.g. by random testing methods, whereas an adaptive software testing strategy specifies what next testing policy should be employed, and thus in turn what test suite or next test case should be generated [8] in accordance with the new testing policy, so as to maximize test efficacy and efficiency subject to time-schedule and budget constraints.

In order to assure a stable, i.e. controllable and predictable, software testing process we apply the Six Sigma (6σ) for software methodology called DMAIC, for "Define, Measure, Analyze, Improve, and Control", because it organizes the intelligent control and improvement of an existing software test process. The name Six Sigma derives from a statistical measure of a process's capability relative to customer specifications. Six Sigma is a mantra that many of the most successful organizations in the world swear by, and the trend is getting hotter by the day. Six Sigma insists on active management engagement and involvement, it insists on a financial business case for every improvement, it insists on focusing on only the most important business problems, and it provides a clearly defined methodology, tools, role definitions, and metrics to ensure success. So, what has this to do with software? The key idea examined in this article is the notion that the estimated costs, schedule, functionality or quality of software projects often turn out differently than expected, as industry experience shows. Software projects are very risky: a survey of 8,000 large US software projects (Standish Group Chaos Report 2001) indicates an average cost overrun of 90%, a schedule overrun of 120%, and 25% of large projects cancelled due to some combination of delays, budget overruns, or poor quality. Six Sigma tools and methods can reduce these risks dramatically. Experience with 6σ in many different businesses and industry segments has demonstrated that the payoff can be quite substantial, but also that it is critically dependent on how it is deployed.
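As a concrete illustration of the adaptive testing idea sketched above, the following minimal Python sketch re-estimates the defect-detection yield of each testing technique after every test activity and picks the next activity from the updated estimates. It is not the algorithm of [8] or of our framework; the technique names, the additive-smoothing estimate and the cost figures are illustrative assumptions only.

```python
import random

class TechniqueEstimate:
    """On-line estimate of one testing technique's defect yield."""
    def __init__(self, name, cost_per_run):
        self.name = name
        self.cost_per_run = cost_per_run   # assumed relative cost of one activity
        self.defects_found = 0
        self.runs = 0

    def update(self, defects_found):
        """Record the outcome of one test activity (on-line parameter update)."""
        self.defects_found += defects_found
        self.runs += 1

    def expected_yield(self):
        """Estimated defects per run, with simple additive smoothing."""
        return (self.defects_found + 1) / (self.runs + 2)

def choose_next(techniques):
    """Adaptive policy: pick the technique with the best estimated yield per cost."""
    return max(techniques, key=lambda t: t.expected_yield() / t.cost_per_run)

if __name__ == "__main__":
    techniques = [TechniqueEstimate("random testing", 1.0),
                  TechniqueEstimate("boundary-value testing", 1.5),
                  TechniqueEstimate("model-based scenarios", 3.0)]
    # Hidden "true" rates, used only to simulate outcomes in this sketch.
    true_rates = {"random testing": 0.4, "boundary-value testing": 1.1,
                  "model-based scenarios": 2.3}
    random.seed(1)
    for step in range(20):
        t = choose_next(techniques)
        found = sum(random.random() < true_rates[t.name] / 3 for _ in range(3))
        t.update(found)
    for t in techniques:
        print(f"{t.name:25s} runs={t.runs:2d} est. yield/run={t.expected_yield():.2f}")
```

The point of the sketch is only the control loop: estimate, choose the next testing action, observe, re-estimate; a non-adaptive strategy would fix the technique mix in advance.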
The importance of an effective deployment strategy is no less in software than in manufacturing or transactional environments. The main contribution of this paper is to map best practices in Software Engineering, Design of Experiments, Statistical Process Control, Risk Management, Modeling & Simulation, Robust Testing, V&V, etc. onto a Six Sigma deployment for the Software Testing Process (STP). Section 2 describes the role of the Six Sigma strategy in the software development/testing process. Section 3 presents the Define phase of the 6σ deployment to the STP. Section 4 demonstrates how measurement is integrated into the STP, and section 5 presents the Analyze phase. After the Analyze phase, actions for STP improvement through 6σ deployment are proposed, i.e. tools and methods that can dramatically reduce STP cost, schedule delay and risk and raise its efficacy. Finally, the Control phase of the 6σ deployment to the STP is described, in order to assure a stable, i.e. controllable and predictable, STP.

The typical SDP/STP is a human-intensive activity and, as such, is usually unproductive, error prone, and often inadequately done. Because software engineering processes and products involve human participants (e.g. designers, testers), information (e.g. design diagrams, test results), and tasks (e.g. "design an object-oriented system model", or "execute the regression test suite"), uncertainty occurs in most if not all of these elements. There are at least four domains of software engineering where uncertainty is evident: uncertainty in requirements analysis, uncertainty in the transition from system requirements to design and code, uncertainty in software re-engineering, and uncertainty in software reuse [9]. As a consequence, software projects are very risky. The key idea examined in this article is the notion that the estimated costs, schedule, functionality or quality of software projects often turn out differently than expected, as industry experience shows.

It is nearly impossible to completely prevent or eliminate defects in such large, complex systems. Instead, various QA alternatives and related techniques can be used in a concerted effort to effectively and efficiently assure their quality. Testing is among the most commonly performed QA activities for software: it detects execution problems so that underlying causes can be identified and fixed. Inspection, on the other hand, directly detects and corrects software problems without resorting to execution. Other QA alternatives, such as formal verification, defect prevention, and fault tolerance, deal with defects in their own ways. Close examination of how different QA alternatives deal with defects can help one better use them for specific applications.

How do we develop a High-Yield (HY) process? A possible answer is a Six Sigma strategy to [6]:
- assure high software quality;
- emphasize the results of "successful" inspections and tests, namely defects and failures;
- assess cost-effectiveness quantitatively;
- identify best practices;
- continuously improve.
This can be achieved by effective and efficient, i.e. high-yield, SQA and management based on these principles and practices:
- key engineering principles (a pragmatic iterative and incremental process, client orientation – correctionality, high-yield and coverage-driven design inspection, a high-yield testing process, a well-integrated team of highly qualified experts, monitoring and integrated management, etc.);
- setting management objectives (in quality, productivity and resources);
- an integrated measurement process (define and measure process and product metrics);
- predicting, assessing and monitoring the cost of quality;
- constant process improvement (CPI) based on optimization of product quality, development cost, time to market, and development resources (personnel and facilities).

2 The role of the Six Sigma Strategy in the Software Development/Testing Process
The formulation of sound software engineering policy and practices must be based on statistical analysis of data that is accurate and consistent. One possible solution is deployment of the Six Sigma strategy, which provides an effective and efficient approach to addressing all of the obstacles to improvement described above. Unlike any other approach to process improvement so far devised, Six Sigma insists on active management engagement and involvement, it insists on a financial business case for every improvement, it insists on focusing on only the most important business problems, and it provides a clearly defined methodology, tools, role definitions, and metrics to ensure success. Six Sigma provides the mechanisms and training needed to get us 'over the hump' and begin to solve the software mess. It provides specific guidance and a structured process with clearly defined phases (Define, Measure, Analyze, Improve, Control – "DMAIC") that ensures success by invoking a quantitative discipline throughout; opinion takes a back seat. DMAIC refers to a data-driven quality strategy for improving processes and is an integral part of a company's Six Sigma quality initiative. DMAIC is an acronym for five interconnected phases: Define, Measure, Analyze, Improve, and Control. Each step in the cyclical DMAIC process is required to ensure the best possible results [10]. Many companies shorten this acronym to MAIC; the distinction is generally one of labels only, as MAIC companies do the Define work under 'Measure'. The 6σ process steps are:
- Define the customer, their Critical to Quality (CTQ) issues, and the core business process involved. Define who the customers are, what their requirements are for products and services, and what their expectations are. Define the project boundaries, the stop and start of the process. Define the process to be improved by mapping the process flow.
- Measure the performance of the core business process involved. Develop a data collection plan for the process. Collect data from many sources to determine types of defects and metrics. Compare with customer survey results to determine shortfalls.
- Analyze the data collected and the process map to determine root causes of defects and opportunities for improvement. Identify gaps between current performance and goal performance. Prioritize opportunities to improve. Identify sources of variation.
- Improve the target process by designing creative solutions to fix and prevent problems. Create innovative solutions using technology and discipline. Develop and deploy an implementation plan.
- Control the improvements to keep the process on the new course and prevent reverting back to the "old way". Require the development, documentation and implementation of an ongoing monitoring plan. Institutionalize the improvements through the modification of systems and structures (staffing, training, incentives).
Each phase is designed to ensure (1) that companies apply the technique in a methodical and disciplined way; (2) that Six Sigma projects are correctly defined and executed; and (3) that the results of these projects are incorporated into running the day-to-day business.

3 Define
The Define phase is critical in ensuring the success of a Six Sigma project. The project's purpose and scope are defined and background on the process and customer is obtained. A key deliverable of the Define phase is the Project Charter, which among other items contains the Problem Statement, Goal Statement, Constraints, Assumptions, and Project Plan. In addition to the Project Charter, a high-level map of the process is generated, along with a list of what is important to the customer.

3.1 STP problem identification, goal and constraints
Identification deals with the very front-end exploration of the voices of the internal or external customers (the target software environment, stated and latent needs) and the voice of the company (technology, business, and market issues). SDP/STP requirements are translated into functional, measurable language, to the right level of detail, and verified and characterized with enough representative customers to assure that upcoming concept design work will be done with all the right issues in view and in balance. STP cost, quality, and delivery schedule are determined by a variety of factors. The most obvious of these are the software product's required capabilities and attributes, i.e. software quality; the maturity and effectiveness of the SDP/STP; the availability and use of appropriate resources and facilities; the project staff's experience and training; and the nature and structure of the project. Although each of these factors can be quantified, the units of measure and the measurements themselves are often discretionary and biased by the experience of the project staff. As a consequence, most project data, although useful within the context of the project, is difficult to correlate with data from other projects.

Checkpoints for Define phase completion should include:
- Business case. What are the compelling business reasons for embarking on this project? Is the project linked to key business goals and objectives? What key business process output measure(s) will the project leverage, and how? What are the rough order estimates of the cost savings/opportunities of this project?
- Problem statement/identification. What specifically is the problem? Where does it occur? When does it occur? What is its extent?
- Goal statement/identification. What is the goal or target for the improvement team's project? Do the problem and goal statements meet the SMART criteria (Specific, Measurable, Attainable, Relevant, and Time-bound)? Has anyone else (internal or external to the organization) attempted to solve this problem or a similar one before, and if so, what knowledge can be leveraged from those efforts? How will the project team and the organization measure complete success for this project?
- Roles and responsibilities. 6σ deals with the five 'W's – 'Who', 'What', 'When', 'Where' and 'Why' (e.g. does that happen). Here, 'Who' is responsible, and 'What' is each team member's role? 'Where' is this documented?
- Project scope. What are the boundaries of the scope? What is in bounds and what is not? What is the start point and what is the stop point? How does the project manager ensure against scope creep? Is the project scope manageable? What constraints exist that might impact the team?

In this paper we concentrate on the Goal and Problem statement/identification issues. In our framework, the Integrated and Optimized Software Testing Process (IOSTP) [3-7], the process of setting goals and refining them into quantifiable questions is complex and requires an adequate methodology. The technology deployed in IOSTP to alleviate this problem is the Goal-Question-Metric (GQM) method. GQM was applied in the context of a typical project developing/testing software, for static analysis and measurement-driven STP control. Such measurements enable the identification of appropriate areas for improvement in SDP/STP practices and support the implementation and subsequent monitoring of those improvements. A variety of approaches for establishing measurement programs have appeared in the literature.
Among the various methods for software measurement [11-13], the Goal-Question-Metric (GQM) approach [13] is one of the most effective and well-established. Professor V. Basili and his research group at the University of Maryland, in close co-operation with the NASA Software Engineering Laboratory, developed the GQM method. Since then, the method has been applied by several software development organizations, including Ericsson, Daimler-Benz, Bosch, Schlumberger and Nokia, among others. The method is based on a simple process by which software developers and managers first define the goals that the software process and its related products must achieve (at organization and project level), then refine the goals into a set of questions, and finally identify the metrics that must be provided to answer the questions. Thus, GQM provides a top-down approach to the definition of metrics, whereas the interpretation of the measured data is done bottom-up. This helps software developers and managers to share a common view of the target of the measurement, knowing both what to measure and for which purpose the measured data will be used. The result of applying GQM is the specification and implementation of a measurement plan for a particular set of goals, together with a set of rules for interpreting the measurement data within the context of those goals.

The GQM model has three levels:
1. Conceptual level (GOAL): a goal is defined for an object (product, process, project or resource), for a variety of reasons, with respect to various models of quality, from various points of view, relative to a particular environment.
2. Operational level (QUESTION): a set of questions is used to characterize the way the assessment or achievement of a specific goal will be performed, based on some characterizing model. Questions try to characterize the object of measurement (product, process, etc.) with respect to a selected quality issue and to determine either this quality issue from a selected viewpoint or the factors that may affect it.
3. Quantitative level (METRIC): a set of data is associated with every question in order to answer it in a quantitative way. The data can be objective (e.g. person-hours spent on a task) or subjective (e.g. level of user satisfaction).

A GQM model has a hierarchical structure, starting with a goal that specifies the purpose of measurement, the object to be measured and the viewpoint from which the measure is taken. The goal is refined into several questions, which usually break the issue down into its major components. Each question is then refined into metrics. The same metric can be used to answer different questions under the same goal.
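As a small illustration of this hierarchy, the following sketch writes a GQM model down as a data structure so that every metric is traceable to a question and every question to a goal. The goal, questions and metric names are hypothetical, chosen only for the STP context; they are not the measurement plan actually used in IOSTP.

```python
from dataclasses import dataclass, field

# Minimal GQM structure: a goal owns questions, each question owns metrics.
@dataclass
class Metric:
    name: str
    objective: bool          # True = objective data (e.g. hours), False = subjective

@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)

@dataclass
class Goal:
    object_: str             # what is analyzed (process, product, ...)
    purpose: str             # why (characterize, improve, ...)
    quality_focus: str       # which property (cost, reliability, ...)
    viewpoint: str           # for whom the data are interpreted
    environment: str         # context of the study
    questions: list = field(default_factory=list)

# Hypothetical goal for the software testing process (STP).
goal = Goal(
    object_="system test phase",
    purpose="improve",
    quality_focus="defect detection effectiveness",
    viewpoint="test manager",
    environment="embedded software project",
    questions=[
        Question("How many defects escape to later phases?",
                 [Metric("defects found per phase", True),
                  Metric("phase containment effectiveness", True)]),
        Question("How expensive is defect removal?",
                 [Metric("rework effort (person-hours)", True),
                  Metric("tester satisfaction with tooling", False)]),
    ],
)

for q in goal.questions:
    print(q.text, "->", [m.name for m in q.metrics])
```

The five fields of the Goal record correspond to the five aspects of the goal template discussed below (object, purpose, quality focus, viewpoint, environment).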
Several GQM goals can also have questions and metrics in common, provided that when the measure is actually collected the different viewpoints are taken into account correctly (i.e. the metric might have different values if taken from different viewpoints). With the GQM method, data collection is focused on the metrics that correspond to the most important goals; data collection and analysis costs are thereby limited to the metrics that give the best return. At the same time, the emphasis on goals and business objectives establishes a clear link to strategic business decisions and helps managers, team leaders and engineers accept the measurements. The GQM approach can be used stand-alone for defining a measurement program or, better, within the context of a more general approach to software process improvement. A good approach to software process improvement that is compatible with GQM is the Software Engineering Institute's Capability Maturity Model (CMM) [14], combined with the Six Sigma strategy for implementing improvements presented in this paper. Another approach is the Quality Improvement Paradigm (QIP), an iterative, goal- and measurement-driven framework for continuous improvement of software development [6,15]. This method is actually an offspring of the development of GQM; because the information necessary for applying the GQM method is derived and/or used in every step of QIP, GQM has also been described as the measurement view of the QIP.

A goal template for GQM applied to SDP/STP goals has been developed to indicate the required contents of a GQM goal and thereby to support the goal-setting activity; product and process goals are handled differently, and examples of both types of goals can be formulated from the template. The template identifies five major aspects, namely the object, purpose, quality focus, viewpoint, and environment of a measurement program. First, the object aspect expresses the primary target of the study, i.e. the process or product that will be analyzed. Second, the purpose aspect expresses how the object will be analyzed, i.e. whether it will be analyzed for purposes of understanding and characterization, compared with some other object, etc. Third, the quality focus aspect expresses the particular property of the object that will be analyzed in the course of the study, such as cost, reliability, etc. Fourth, the viewpoint aspect expresses information about the group of people that will see and interpret the data; by stating clearly the group to which the analyzed data will be released, issues of confidentiality can be addressed before any data are collected. Finally, the environment aspect expresses the context in which the study will be performed and is used to make influencing factors explicit.

Testing was one of the identified problem areas in our SDP before the Six Sigma deployment [2-7]. Test cases were documented and repeatable and therefore fulfilled the ISO 9001 requirements, but the efficiency, cost effectiveness, duration and effort of the testing process needed to be improved. The main facts to be addressed were:
- not enough attention is paid to the fact that testing is an intellectually and organizationally challenging process that needs accurate planning on the one hand, but offers a big potential for improving cost effectiveness on the other;
- testing types (module, integration and functional testing) are often mixed up;
- clearly defined test environments, test cases and test data are often missing;
- tests are not performed systematically, due to lack of time and resources;
- regression testing is insufficient because of poor tool support;
- configuration & version management is not well established for applications in the testing stages;
- metrics important for the testing process are sometimes missing (defects, test coverage, defectiveness, efficiency, etc.).
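The last item above, missing test metrics, is what the Measure phase below addresses. As a small illustration, two of the specific measures used later (defect density, 4.1.3, and problem report aging, 4.1.2) can be computed directly from raw problem-report records along the following lines; the record fields, dates and sizes are illustrative assumptions, not our project's actual definitions.

```python
from datetime import date

# Illustrative problem-report records and component sizes (invented data).
reports = [
    {"component": "parser", "opened": date(2004, 1, 10), "closed": date(2004, 1, 20)},
    {"component": "parser", "opened": date(2004, 2, 1),  "closed": None},
    {"component": "ui",     "opened": date(2004, 1, 15), "closed": date(2004, 3, 1)},
]
size_kloc = {"parser": 12.5, "ui": 30.0}

def defect_density(reports, size_kloc):
    """Defects reported per KLOC for each component (measure 4.1.3)."""
    counts = {}
    for r in reports:
        counts[r["component"]] = counts.get(r["component"], 0) + 1
    return {c: counts.get(c, 0) / kloc for c, kloc in size_kloc.items()}

def problem_report_aging(reports, today=date(2004, 3, 15)):
    """Average age in days of reports; open reports are aged up to 'today' (measure 4.1.2)."""
    ages = [((r["closed"] or today) - r["opened"]).days for r in reports]
    return sum(ages) / len(ages)

print(defect_density(reports, size_kloc))
print(problem_report_aging(reports))
```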
There are six fundamental issues that are common to all SDP/STP activities: 1. Schedule and Progress; 2. Resources and Cost; 3. Growth and Stability; 4. Product Quality; 5. Development Performance; 6. Technical Adequacy. Successful, effective management requires continuous visibility of each of these issues so that timely and informed adjustments can be made to schedules, budgets, and processes. Frequent sampling and assessment of project measurement data provides that visibility. For each of the fundamental issues there are key questions that the project manager must periodically ask to ensure that the project remains on course and under control; to answer these questions, specific categories of measurement data must be available to the project manager. Among the many goals and problems identified in the former SDP/STP before the 6σ deployment, in this paper we focus the STP improvement on the Development/Testing Performance and Product Quality issues.

4 Measure phase
In the Measure phase, data are collected that pinpoint problem locations and baseline the current process capability (sigma). The fundamental reason for measuring software and the software process is to obtain data that will help project management address and resolve resource and product issues. Software measurement also provides a basis for effective communication within the project and for justifying project decisions to higher management levels. In recent years measurement has played a critical role in achieving an effective software development/testing process, and the technology introduced to support it here is the aforementioned Goal-Question-Metric (GQM) method [13]. The issues, the key questions related to each issue, and the categories of measures necessary to answer the questions are shown in Table 1.

Table 1. The issues, key questions related to each issue, and categories of measures
1. Schedule & Progress
   Is the project meeting scheduled milestones? – 1.1 Milestone Performance
   How are specific activities and products progressing? – 1.2 Work Unit Progress
   Is project spending meeting schedule goals? – 1.3 Schedule Performance
   Is capability being delivered as scheduled? – 1.4 Incremental Capability
2. Resources & Cost
   Is effort being expended according to plan? – 2.1 Effort Profile
   Are qualified staff assigned according to plan? – 2.2 Staff Profile
   Is project spending meeting budget objectives? – 2.3 Cost Performance
   Are necessary facilities and equipment available as planned? – 2.4 Environment Availability
3. Growth & Stability
   Are the product size and content changing? – 3.1 Product Size & Stability
   Are the functionality and requirements changing? – 3.2 Functional Size & Stability
   Is the target computer system adequate? – 3.3 Target Computer Resource Utilization
4. Product Quality
   Is the software good enough for delivery? – 4.1 Defect Profile
   Is the software testable and maintainable? – 4.2 Complexity
5. Development / Testing Performance
   Will the developer be able to meet budget and schedules? – 5.1 Process Maturity
   Is the developer efficient enough to meet current commitments? – 5.2 Productivity
   How much breakage due to changes and errors has to be handled? – 5.3 Rework
6. Technical Adequacy
   Is the planned impact of the leveraged technology being realized? – 6.1 Technology Impacts

By focusing data collection activities on the measurement categories that answer the key issue questions, the project can minimize the resources devoted to the measurement process. For demonstration purposes in this paper the sampled issues are Development/Testing Performance and Product Quality; the key questions and categories of measures for these issues only are shown in Tables 2 to 6.

Table 2. Key questions related to each sampled issue, and categories of measures
4. Product Quality: Is the software good enough for delivery? – 4.1 Defect Profile
5. Development / Testing Performance: Is the developer efficient enough to meet current commitments? – 5.2 Productivity

Table 3. Measurement category and specific measures
4.1 Defect Profile: 4.1.1 Problem Report Trends; 4.1.2 Problem Report Aging; 4.1.3 Defect Density; 4.1.4 Failure Interval

Table 4. Focus questions and specific measures for Product Quality
Are difficult problems being deferred? – 4.1.2 Problem Report Aging
Are reported problems being closed in a timely manner? – 4.1.2 Problem Report Aging
Do report arrival and closure rates support the scheduled completion date of integration and test? – 4.1.1 Problem Report Trends
How long does it take to close a problem report? – 4.1.2 Problem Report Aging
How many problem reports are open, and what are their priorities? – 4.1.1 Problem Report Trends
How many problem reports have been written? – 4.1.1 Problem Report Trends
How much code is being reused? – 4.2.6 Depth of Inheritance
How often will software failures occur during operation of the system? – 4.1.4 Failure Interval
How reliable is the software? – 4.1.4 Failure Interval
What components are candidates for rework? – 4.1.3 Defect Density
What components have a disproportionate amount of defects? – 4.1.3 Defect Density
What components require additional testing or review? – 4.1.3 Defect Density
What is the program's expected operational reliability? – 4.1.4 Failure Interval
What is the quality of the software? – 4.1.3 Defect Density

Table 5. Measurement category and specific measures
5.2 Productivity: 5.2.1 Product Size/Effort Ratio; 5.3.2 Functional Size/Effort Ratio; 5.8.1 Tracking Defect Containment

Table 6. Focus questions and specific measures for Development/Testing Performance
How efficiently is software being produced? – 5.2.1 Product Size/Effort Ratio
What is the phase defect detection effectiveness? – 5.8.1.1 Phase Containment Effectiveness
What is the defect escape effectiveness? – 5.8.1.2 Defect Containment Effectiveness
What is the post-release defect number? – 5.8.1.3 Total Containment Effectiveness
How much effort was expended on fixing defects in the software product? – 5.3.2 Rework Effort
Is the product being developed at a rate that allows completion within budget? – 5.2.1 Product Size/Effort Ratio
Is the amount of rework impacting cost or schedule? – 5.3.2 Rework Effort
Is the amount of rework impacting the cost and schedule? – 5.3.1 Rework Size
Is the planned software productivity rate realistic? – 5.2.1 Product Size/Effort Ratio
What software development activity required the most rework? – 5.3.2 Rework Effort
What was the quality of the initial development effort? – 5.3.1 Rework Size

5 Analyze phase
In the Analyze phase of the 6σ deployment, statistical methods and tools are used to identify and confirm the root causes of defects. Not only must the collected data be analyzed, but an in-depth analysis of the process, to understand how the work is actually being done, must also be performed to identify inconsistencies or problem areas that might cause or contribute to the problem. The deliverables of this 6σ phase are the data and process analysis, the root cause analysis and the quantified gap/opportunity; the checkpoint for completion is to identify the gaps between current performance and goal performance. Root cause analysis should:
- generate a list of possible causes (sources of variation);
- segment and stratify the possible causes;
- prioritize the list of 'vital few' causes (key sources of variation);
- verify and quantify the root causes of variation.

To quantify the gap/opportunity, the following questions are answered: What is the cost of poor quality as supported by the team's analysis? Is the process so severely broken that a re-design is necessary? Would this project lend itself to a DFSS (Design For Six Sigma) project? What are the revised rough order estimates of the financial savings/opportunity for the improvement project? Have the problem and goal statements been updated to reflect the additional knowledge gained from the Analyze phase? Have any additional benefits been identified that will result from closing all or most of the gaps? What were the financial benefits resulting from any 'ground fruit or low-hanging fruit' (quick fixes)? What quality tools were used to get through the Analyze phase?

In this paper our focus is on software error and defect root cause analysis applying a defect classification scheme. The cornerstone of the process is causal analysis. Rather than rely on a single developer's opinion of the cause of a fault, each developer is expected to track objective data about each fault. The fault data are entered into a database, and at the end of each phase a meeting is conducted to analyze the faults collectively and try to determine their causes. The "causal analysis meeting" is a two-hour meeting occurring at the end of every development phase. Management is not invited to these meetings, in the hope that developers will be inclined to speak more freely about root causes. The development team as a whole discusses all of the faults tracked in the database. Causal analysis involves abstracting the fault to its associated error to get a better understanding of the fault. When the errors have been abstracted, they are classified into one of four categories.
Oversight. The developer did not completely consider some detail of the problem.
For example, a case might have been missed or a boundary condition not handled properly.
Education. The developer did not understand the process or a portion of the assigned task, presumably because of a lack of training in that area.
Communications failure. Information was not received, or information was miscommunicated and thus not understood properly.
Transcription error. The developer made a typographical error or other mistake; the developer fully understood the process to be applied, but made a mistake.
The researchers feel that, in order to improve processes, more information is needed than can be provided by simply looking at statistics on defect or error classifications for a project. So while the error classification scheme is valuable, it is just a step in the analysis process. The participants in the causal analysis meetings discuss the errors among themselves and try to determine the cause of each error and when it was introduced, with a focus on how to prevent similar errors from occurring again. Part of the process involves searching for similar errors that may still exist in the artifact but have not yet been detected. The goal of this last effort is what the researchers have named "defect extinction": removing all existing instances of the error and adjusting the development process so that no more instances of this error will ever occur.

5.1 Defect Classification Scheme
Defects play a crucial role in software development. On the one hand, defects, when detected, should be corrected so that the final version of the developed software artifact is of higher quality. On the other hand, defects carry a lot of information that can be analyzed in order to characterize the quality of processes and products, to track the progress of a project and control it, and to improve the process. Therefore, defect measurement plays a crucial role in many software measurement programs, and in many measurement programs defect data are collected. Generally there are several pieces of information that can be collected about defects. The most often used relate to the quantity of defects (i.e. their number) and their type. For the latter, defect classification schemes are used to quickly characterize the nature of defects. Defect classification schemes are used by organizations of both low and high maturity. For example, Paulk et al. [16] report that many high-maturity organizations (CMM level 4 and 5) use defect classification (especially orthogonal defect classification) for their quantitative management, and defect classification is also used very frequently when implementing measurement programs in companies at lower CMM levels (e.g. Briand et al. [17]). Two important questions arise when using defect classification. The first question is how a "good" defect classification scheme can be defined. This question is of practical relevance, as many companies want to define and use their own classification scheme, specifically tailored to their needs and goals; in practice, however, such self-defined classification schemes often impose problems due to ambiguous or overlapping attribute types, or because they capture different aspects of a defect in the same attribute, as reported by Ostrand and Weyuker [18] and Hirsh et al. [19]. Thus, a systematic approach to defining defect classification schemes would be useful. The second question is for what purposes defect classification data can be analyzed and how this analysis can be performed. This question is interesting because, on the one hand, in the set-up of a measurement program the potential for measurement goals answerable with defect classification should be known; on the other hand, knowing methods to analyze defect classification data allows the information to be fully exploited and the measurement goals to be addressed.

5.2 Scope of classification: what can be classified
There are many aspects of a defect that might be relevant for analysis. Defects are inserted for a particular reason into a particular piece of software at a particular point in time; they are detected at a specific time and occasion by noting some sort of symptom, and they are corrected in a specific way. Each of these aspects (and more) might be relevant for a specific measurement and analysis purpose. The key elements of a defect have been chosen to be (as far as possible) mutually independent, i.e. orthogonal. Each of the framework's key elements can be refined, leading to many attributes of a defect that can be captured by measurement in the form of defect classification. To give the reader an idea of what might be possible to measure, we explain the key elements and illustrate them with concrete attributes from existing, common defect classification schemes: IEEE [20], HP [22] and ODC (Orthogonal Defect Classification at IBM) [21].

Location: the location of a defect describes where in the documentation the defect was detected. This information can be very detailed and capture a within-system identifier (e.g. document name or module identifier). The attribute can also contain values describing different high-level entities of the entire system (Specification/Requirements, Design, Code, Documentation, etc.) or describe the faulty document in more detail, such as its age or history.

Timing: the timing of a defect refers to the phases in which the defect was created, detected, and corrected. An attribute like Detection Phase can capture the phase during which the defect was detected. Another aspect of timing is the question of when the defect was created and first introduced into the system; this information is usually captured in an attribute Origin that contains process phases as attribute values.

Symptom: symptom captures what was observed when the defect surfaced, or the activity revealing the defect. For example, the ODC attribute Trigger captures the mechanism that allows a defect to surface: during inspections the inspector classifies a defect according to what he was thinking when detecting the defect, while a tester classifies it according to the purpose of the test case revealing the defect. Under symptom it is also possible to classify what is observed during diagnosis or inspection; for example, the attribute Type of the IEEE classification scheme provides a very detailed classification of the symptom.

End result: end result describes the failure caused by the fault. For example, the ODC attribute Impact captures the impact of a fault (or the resulting failure) on the customer; it contains values such as Performance, Usability, Installability, etc.

Mechanism: mechanism describes how the defect was created, detected, and corrected. Creation captures the activity that inserted the defect into the system; the Activity attribute captures the activity that was performed when the defect was detected (e.g. inspection, unit test, system test, operation); and correction refers to the steps taken to remove the defect. For example, the ODC attribute Type is explicitly defined in terms of activities performed when correcting defects. Many defect classification schemes contain attributes that describe the creation or correction of defects in terms of omission and commission (e.g. schemes for inspections, the HP scheme attribute Mode, and the ODC defect qualifier); these can also be seen as describing how a defect was created and corrected.

Cause: cause describes the error leading to a fault. For example, Mays et al. [24] use attribute values like Education, Oversight, Communication, Tools, and Transcription for an attribute Cause. Leszak et al. [22] use different attributes capturing different kinds of causes: human-related causes (e.g. lack of knowledge, communication problems), project causes such as time pressure or management mistakes, and review causes describing why the defect potentially slipped through an inspection (e.g. no or incomplete inspection, inadequate participation).

Severity: severity describes the severity of a resulting or potential failure. For example, it might capture whether a fault can actually be evidenced as a failure.

Cost: cost captures the time or effort needed to locate/isolate a fault and correct it. Typically such information is captured not by means of a classification but on a ratio scale; it is, however, also possible to capture time or effort data on an ordinal scale and thus to use classification.

Figure 1 summarizes this information by mapping attributes of the schemes presented above to the key elements.
Figure 1. Defect categorization scheme [23]

Feedback is a very important part of the defect prevention process. First, developers need to know that management takes their comments seriously and that they can influence changes in the development environment. Second, developers need feedback to keep them up to date on current process changes and on the errors other developers discover, so that they do not make the same mistakes themselves. To serve this end, at the start of each development phase the technical lead holds a "stage kickoff meeting" for the team. At this meeting the lead discusses the process for the stage, so that each developer knows how to perform the tasks required in the phase. The inputs to and outputs from the stage are also discussed, as is the "common error list". This list, a compilation of the errors that occur frequently during this stage of development, is valuable information for defect prevention and STP improvement.
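The Timing attributes described above (the Origin and Detection Phase of each defect) are exactly what is needed to compute the containment measures listed in Table 6. The sketch below uses one common definition of phase containment effectiveness and total containment effectiveness; the phase order and the defect sample are invented for illustration, not project data.

```python
PHASES = ["requirements", "design", "code", "test", "field"]   # assumed phase order

# Each defect record: (phase where it was introduced, phase where it was detected).
defects = [("requirements", "requirements"), ("requirements", "test"),
           ("design", "design"), ("design", "code"), ("code", "code"),
           ("code", "test"), ("code", "field"), ("design", "field")]

def phase_containment(defects, phase):
    """Share of defects injected in 'phase' that were also detected in that phase
    (one common definition of phase containment effectiveness, cf. 5.8.1.1)."""
    injected = [d for d in defects if d[0] == phase]
    if not injected:
        return None
    caught = [d for d in injected if d[1] == phase]
    return len(caught) / len(injected)

def total_containment(defects, release_phase="field"):
    """Share of all defects detected before release (cf. 5.8.1.3)."""
    pre_release = [d for d in defects if d[1] != release_phase]
    return len(pre_release) / len(defects)

for p in PHASES[:-1]:
    pce = phase_containment(defects, p)
    if pce is not None:
        print(f"PCE({p}) = {pce:.2f}")
print(f"TCE = {total_containment(defects):.2f}")
```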
6 Improve
The Improve phase focuses on discovering, refining, and implementing solutions to the root causes of the problems identified in the Analyze phase. Every attempt should be made to maximize the benefits of the proposed solutions. There is substantial evidence that a successful software process improvement (SPI) project will result in improvements in productivity, quality, schedule, and business value; there is also evidence that most SPI projects fail. This section presents evidence gathered from published studies of successful projects that sheds light on the impact of SPI projects in terms of productivity, quality, cost and return, and discusses the risks associated with SPI projects and how they can be mitigated. We argue that high returns are possible for SPI projects if the known risks are mitigated.

First we survey the literature on SPI projects. The projects described are generally wildly successful. The various studies have somewhat different ways of measuring cost, productivity, quality and financial benefit, but there is no mistaking the fact that most studies document very significant returns. The studies usually try to measure the cost, benefits, and return on investment (ROI) of SPI efforts. The costs generally come from training, Software Engineering Process Groups (SEPG) to develop and maintain the process, ongoing quality reviews of project artifacts, and tools used to help manage, design, build, and document in an intelligent way. The literature also yields insight into where the value comes from: there are practices which, when institutionalized and made part of everyday operations, yield improvements in productivity and quality.

The benefits are usually estimated using some variant of the following analysis. One source of benefit is reduced defects. First, defect measures are defined, e.g. defects per thousand lines of code (for code) or defects per artifact (for artifacts like requirements documents). Second, the average cost of fixing a defect in each phase is estimated; this includes rework and re-testing. As process maturity improves over time and additional measurements are taken, defect rates go down. The benefit is calculated by multiplying the difference in defect rates by the cost of fixing a defect. A second source of financial benefit is increased productivity. Measures of output such as lines of code or function points per month go up with maturity level, and reuse becomes more predominant at higher maturity levels and contributes to productivity. The benefit is calculated by multiplying the improvement in production by the average labor rate. The costs of SPI generally include training, maintaining a Software Engineering Process Group (SEPG), the cost of process assessments, and the cost of inspections and reviews. ROI is calculated from the benefits and costs over time. Reported returns of 400% to 800% on dollars invested in SPI are not uncommon.

Of course, not all SPI programs succeed. Anecdotal evidence and analyses by organizations like the Gartner Group indicate a significant failure rate, and case studies of the failures are generally not written up in the SPI literature. We present material from the analysts and from personal experience describing the risks to SPI projects and what it takes to carry them out successfully, as well as some simple improvements that are practically guaranteed to yield significant gains quickly, together with improvements that provide added benefits over a longer period.

The relationship between software process maturity, quality, and productivity has been studied extensively. Many of the studies are oriented around the Software Engineering Institute's Capability Maturity Model (CMM). This model describes five levels of process maturity; each level is associated with particular portions of the process that mark a logical progression up from the next lower level, and many of the studies document the results of progression through the levels. In one of the early classic studies of the impact of CMM, the Software Engineering Institute (SEI) studied 13 companies using CMM. The companies differed significantly in size, application domain, and approach to process improvement; to compensate for differences in scale, organization, and environment, results were measured within each organization over time. A summary of the study results is given in Table 7 below.
Table 7. SPI program results from the SEI study [25]
Category | Range | Median
Total yearly cost of SPI activities | $49,000 - $1,202,000 | $245,000
Years engaged in SPI | 1 - 9 | 3.5
Cost of SPI per software engineer | $490 - $2,004 | $1,375
Productivity gain per year | 9% - 67% | 35%
Early detection gain per year (defects discovered pre-test) | 6% - 25% | 22%
Yearly reduction in time to market | 15% - 23% | 19%
Yearly reduction in post-release defect reports | 10% - 94% | 39%
Business value of investment in SPI (value returned on each dollar invested) | 4.0 - 8.8 | 5.0

Benefits of successful SPI projects. The studies above and others document the benefits of successful SPI projects. Typical increases in productivity are on the order of 30% to 35% per year. Reported return on investment (ROI) figures are in the neighborhood of 500%. The annual reduction in post-release defect reports is typically around 40%. In most cases the improvement continues year after year over the duration of the study, even for organizations that have been practicing CMM for five or more years. To summarize, the benefits of SPI projects are: high return on investment, increased productivity, shorter schedules, fewer defects, cost savings, and more accurate estimates and schedules.

There is a need to evaluate the strengths and weaknesses of the various software test methods concerning their effectiveness, efficiency, range, robustness and scalability, and to combine them into an optimized test scenario for the particular SUT that builds on their strengths while avoiding the problems associated with each individual approach. In order to significantly improve software testing efficiency and effectiveness for the detection and removal of requirements and design defects in our IOSTP framework [3-7], during 3 years of SPI we calculated an overall value returned on each dollar invested, i.e. an ROI, of 100:1. We deployed a DFSS strategy by applying Model-Based Testing activities through computer Modeling & Simulation (M&S) and Design of Experiments, because building the model is testing and vice versa. This approach (model first; simulate; test; fix after each step and then iterate the test results back into the model), combined with Design of Experiments, is a solution for IOSTP [4]. It produced a set of scenarios used to test the target software implementation, find bugs, serve as test oracles in all test phases and track software development progress.
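The benefit analysis described earlier in this section (the defect-rate difference multiplied by the cost of fixing a defect, plus the productivity gain, set against the SPI costs) can be written down directly. All figures in the sketch below are invented placeholders, not the numbers from Table 7 or from our project.

```python
# Illustrative ROI model for a software process improvement (SPI) programme.
# All inputs are invented placeholders.

def defect_benefit(kloc_per_year, defects_per_kloc_before, defects_per_kloc_after,
                   cost_per_defect):
    """Benefit from fewer defects: rate difference times volume times fix cost."""
    return (defects_per_kloc_before - defects_per_kloc_after) * kloc_per_year * cost_per_defect

def productivity_benefit(kloc_per_year, productivity_gain, cost_per_kloc):
    """Benefit from producing the same output with less effort."""
    return kloc_per_year * productivity_gain * cost_per_kloc

def roi(benefits, costs):
    """Return on investment as a fraction of the costs."""
    return (benefits - costs) / costs

benefits = (defect_benefit(kloc_per_year=100, defects_per_kloc_before=6.0,
                           defects_per_kloc_after=4.0, cost_per_defect=1500)
            + productivity_benefit(kloc_per_year=100, productivity_gain=0.30,
                                   cost_per_kloc=20000))
costs = 250000 + 80000   # e.g. SEPG staffing plus training/assessments (placeholders)
print(f"benefit = ${benefits:,.0f}, cost = ${costs:,.0f}, ROI = {roi(benefits, costs):.0%}")
```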
7 Control phase
To ensure that the same problems do not recur, the processes that create the product or service are monitored continuously in the Control phase. In this article we have described the process improvement experiment; in this section we summarize our experience with the supporting tools from Quantitative Software Management (QSM) and the lessons we learned during the experiment. The deliverables of the Control phase are: a documented and implemented monitoring plan, a standardized process, documented procedures, a response plan established and deployed, and transfer of ownership (project closure). Checkpoints for completion assessment are: a control plan is in place for sustaining improvements (short and long term); new process steps, standards, and documentation are ingrained into normal operations; operating procedures are consistent; knowledge gained on the process is shared and institutionalized; response plans are established, understood, and deployed; ownership and knowledge are transferred to the process owner and to the process team tasked with the responsibilities.

Questions to determine appropriate application are:
- What is the control/monitoring plan? How will the process owner and team be able to hold the gains? What key inputs and outputs are being measured on an ongoing basis? How will input, process, and output variables be checked to detect sub-optimal conditions? How will new or emerging customer needs/requirements be checked and communicated, to orient the process toward meeting the new specifications and continually reducing variation?
- Are control charts being used or needed? How will control chart readings and control chart limits be checked to effectively monitor performance? Will any special training be provided for control chart interpretation? Is this knowledge embedded in the response plan?
- What is the most recent process yield (or sigma calculation)? Does the process performance meet the customer's requirements?
- Has the improved process and its steps been standardized? Is there documentation that will support the successful operation of the improvement? Does job training on the documented procedures need to be part of the process team's education and training? Have new or revised work instructions resulted, and are they clear and easy to follow for the operators?
- Is a response plan in place for when the input, process, or output measures indicate an 'out-of-control' condition? What are the critical parameters to watch? Does the response plan contain a definite closed-loop continual improvement scheme (e.g. plan-do-check-act)? Are suggested corrective/restorative actions indicated on the response plan for known causes of problems that might surface? Does a troubleshooting guide exist, or is it needed?
- Transfer of ownership (project closure). Who is the process owner? How will the day-to-day responsibilities for monitoring and continual improvement be transferred from the improvement team to the process owner? How will the process owner verify improvement in present and future sigma levels and process capabilities? Is there a recommended audit plan for routine surveillance inspections of the DMAIC project's gains? What is the recommended frequency of auditing? What should the next improvement project related to the process be? What quality tools were useful in the Control phase?
- Integrating and institutionalizing improvements, knowledge and learning. What other areas of the organization might benefit from the project team's improvements, knowledge, and learning? How might the organization capture best practices and lessons learned so as to leverage improvements across the business? What other systems, operations, processes, and infrastructures (hiring practices, staffing, training, incentives/rewards, metrics/dashboards/scorecards, etc.) need updates, additions, changes, or deletions in order to facilitate knowledge transfer and improvements?
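Where the checklist above asks whether control charts are being used and how their limits will be checked, a simple attribute chart over defect counts per build or per inspection is often enough to flag an out-of-control condition. Below is a minimal sketch of a c-chart with the usual three-sigma limits for a Poisson count; the per-build counts are invented.

```python
# Minimal c-chart: defects counted per equally sized inspection/build sample.
# Control limits are mean +/- 3 * sqrt(mean), the usual limits for a Poisson count.
from math import sqrt

defect_counts = [7, 5, 9, 6, 8, 4, 7, 6, 17, 5, 6, 7]   # invented per-build counts

mean = sum(defect_counts) / len(defect_counts)
ucl = mean + 3 * sqrt(mean)
lcl = max(0.0, mean - 3 * sqrt(mean))

print(f"centre line = {mean:.1f}, LCL = {lcl:.1f}, UCL = {ucl:.1f}")
for i, c in enumerate(defect_counts, start=1):
    flag = "  <-- out of control" if (c > ucl or c < lcl) else ""
    print(f"build {i:2d}: {c:2d}{flag}")
```

A point outside the limits (build 9 in this invented series) is the signal that triggers the response plan discussed in the checklist; stable points inside the limits are treated as common-cause variation.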
It has been said that a good process delivers not only its intended output but also information about itself. In this stage of MAIC a team considers how best to instrument the process so that: (1) defects and other problems will be prevented, or detected at the earliest possible time; (2) symptoms of trouble will be detected early, before much or any damage is done, and the information used to flag the trouble will contain enough detail to focus any containment and improvement work on the cause (diagnostic data); (3) ongoing, unbiased performance data can be captured with the least effort. In software development, document inspections can be an important source of control, as they address all of these instrumentation issues. Another set of software development controls can be built into various tracking systems, from automated data bots in configuration management and document control systems to manual tracking of things like change requests or bug reports. Discriminating among mistakes, errors, and defects can provide important resolution and problem-solving source data; this forms the basis for normalized defect density reporting. Applying best practices in the following Software Engineering areas performs the control function and keeps the improvements stable: 1. Project Management, 2. Requirements Management, 3. Statistical and Adaptive Process Control Theory, 4. Risk Management. The mapping of CMM Key Process Area improvement to Six Sigma methods and tools for software systems is presented in Table 8.

Table 8. Six Sigma methods & tools vs. the CMM Key Process Area improvement roadmap
The Six Sigma methods and tools considered are: 1. Project Management, 2. Voice of the Customer, 3. Requirements Management, 4. Thinking Statistically, 5. Modeling & Simulation, 6. Design Planning (QFD), 7. Risk Management (FMEA), 8. Capability Analysis, 9. Inspections and Reviews, 10. Computational Intelligence Tools, 11. Walkthrough and Audit, 12. Root Cause / Visual Tools, 13. Model-Based Improvement, 14. Robust Test (DOE). They are mapped against the CMM Key Process Areas at Level 2 (Requirements Management, Software Project Planning, Software Project Tracking, Subcontract Management, Quality Assurance, Configuration Management), Level 3 (Process Focus, Process Definition, Training, Integrated Software Management, Product Engineering, Intergroup Coordination, Peer Reviews), Level 4 (Quantitative Process Management, Quality Management) and Level 5 (Defect Prevention, Technical and Process Change Management); the table marks, for each Key Process Area, which of the methods and tools support its improvement.

8 Conclusions and perspectives
In general, a concerted effort, with many different QA activities used in an integrated fashion, is necessary to effectively and efficiently deal with defects and ensure product quality. Error removal greatly reduces the chance of fault injection; therefore, such preventive actions should be an integral part of any QA plan. Causal analyses can be performed to identify systematic problems and to select preventive actions to deal with them. To maximize the benefit-to-cost ratio, various risk identification techniques can be used to focus inspection and testing effort on identified high-risk product components; if the consequence of failures is severe and the potential damage is high, they can also be used to further reduce the failure probability or to reduce the accident probability or severity. According to the different ways these QA alternatives deal with defects, they can be classified into three categories: defect prevention, defect reduction, and defect containment. The survey and classification of the different QA alternatives in this article bring together information from diverse sources to offer a common starting point and information base for software quality professionals. As an immediate follow-up to this study, the authors plan to collect additional data from industry to quantify the cost and benefit of the different QA alternatives and better support the related cost-benefit analysis, and to package application experience from industry to guide future applications. These efforts will help advance the state of practice in industry, where appropriate QA alternatives can be selected, tailored, and integrated by software quality professionals for effective quality assurance and improvement.

In order to significantly improve software testing efficiency and effectiveness for the detection and removal of requirements and design defects in our IOSTP framework, we applied Model-Based Testing activities through Modeling & Simulation (M&S) and Design of Experiments, because building the model is testing and vice versa. This produced a set of scenarios used to test the target software implementation, find bugs, serve as test oracles in all test phases and track software development progress. In order to assure a stable, i.e. controllable and predictable, software testing process we apply the Six Sigma for software methodology called DMAIC, for "Define, Measure, Analyze, Improve, and Control", because it organizes the intelligent control and improvement of the existing software test process. Six Sigma is a strategy that many of the most successful organizations in the world swear by, because industry experience with Six Sigma deployment proves it to be a solution for the business mantra: better, faster, cheaper.
Applying best practices in the following Software Engineering areas performs the control function: 1. Project Management, 2. Requirements Management, 3. Statistical and Adaptive Process Control Theory, and 4. Risk Management, in order to keep the improvements stable. The mapping of CMM Key Process Area improvement to Six Sigma methods and tools is presented in Table 8.

Table 8. Six Sigma Methods & Tools vs. CMM Key Process Area Improvement Roadmap. The table cross-maps the Six Sigma methods and tools (1. Project Management, 2. Voice of the Customer, 3. Requirements Management, 4. Thinking Statistically, 5. Modeling & Simulation, 6. Design Planning: QFD, 7. Risk Management: FMEA, 8. Capability Analysis, 9. Inspections and Reviews, 10. Computational Intelligence Tools, 11. Walkthrough and Audit, 12. Root Cause/Visual Tools, 13. Model Based Improvement, 14. Robust Test (DOE)) against the CMM Key Process Areas: Level 2 (Requirements Management, Software Project Planning, Software Project Tracking, Subcontract Management, Quality Assurance, Configuration Management), Level 3 (Process Focus, Process Definition, Training, Integrated SW Management, Product Engineering, Intergroup Coordination, Peer Reviews), Level 4 (Quantitative Process Management, Quality Management), and Level 5 (Defect Prevention, Technical and Process Change Management).

8 Conclusions and perspectives
In general, a concerted effort, with many different QA activities used in an integrated fashion, is necessary to effectively and efficiently deal with defects and ensure product quality. Error removal greatly reduces the chance of fault injections; therefore, such preventive actions should be an integral part of any QA plan. Causal analyses can be performed to identify systematic problems and to select preventive actions that deal with those problems. To maximize the benefit-to-cost ratio, various risk identification techniques can be used to focus inspection and testing effort on identified high-risk product components. If the consequence of failures is severe and the potential damage is high, however, defect containment measures can be used to further reduce the failure probability or to reduce the accident probability or severity. According to the different ways these QA alternatives deal with defects, they can be classified into three categories: defect prevention, defect reduction, and defect containment. The survey and classification of different QA alternatives in this article bring together information from diverse sources to offer a common starting point and information base for software quality professionals. As an immediate follow-up to this study, the authors plan to collect additional data from industry to quantify the cost and benefit of different QA alternatives, to better support the related cost-benefit analysis, and to package application experience from industry to guide future applications. These efforts will help advance the state of practice in industry, where appropriate QA alternatives can be selected, tailored, and integrated by software quality professionals for effective quality assurance and improvement.

In order to significantly improve software testing efficiency and effectiveness for the detection and removal of requirements and design defects within our IOSTP framework, we applied Model-Based Testing activities through Modeling and Simulation (M&S) and Design of Experiments, because building the model is testing and vice versa. This produced a set of scenarios used to test the target software implementation, find bugs, serve as test oracles in all test phases, and track software development progress. In order to assure a stable, i.e. controllable and predictable, software testing process we applied the Six Sigma for software methodology called DMAIC ("Define, Measure, Analyze, Improve, and Control"), because it organizes the intelligent control and improvement of the existing software test process; industry experience with Six Sigma deployment shows it to be a practical answer to the business mantra of better, faster, cheaper.
References:
[1] G. Gordon Schulmeyer, et al., The Handbook of Software Quality Assurance, Upper Saddle River, NJ: Prentice Hall PTR, 1998, p. 9.
[2] Ljubomir Lazić, "Software Testing Methodology", YUINFO'96, Brezovica, Serbia&Montenegro, 1996.
[3] Ljubomir Lazić, D. Velasević, "Integrated and optimized software testing process based on modeling, simulation and design of experiment", 8th JISA Conference, Herceg Novi, Serbia&Montenegro, June 9-13, 2003.
[4] Ljubomir Lazić, D. Velasević, N. Mastorakis, "A framework of integrated and optimized software testing process", WSEAS TRANSACTIONS on COMPUTERS, Issue 1, Volume 2, January 2003.
[5] Ljubomir Lazić, D. Velasević, "Applying simulation to the embedded software testing process", Software Testing, Verification & Reliability, Wiley, 2003, submitted for publication.
[6] Ljubomir Lazić, M. Medan, "Software Quality Engineering versus Software Testing Process", TELFOR 2003, 23-26 November, Beograd.
[7] Ljubomir Lazić, D. Velasević, N. Mastorakis, "The Oracles-Based Software Testing: problems and solutions", 3rd WSEAS Int. Conf. on SOFTWARE ENGINEERING, PARALLEL & DISTRIBUTED SYSTEMS (SEPADS 2004), WSEAS Multiconference, Salzburg, Austria, February 13-15, 2004.
[8] Kai-Yuan Cai, "Optimal software testing and adaptive software testing in the context of software cybernetics", Information and Software Technology, 2002; 44:841-855.
[9] H. Ziv and D. J. Richardson, "Constructing Bayesian-network Models of Software Testing and Maintenance Uncertainties", International Conference on Software Maintenance, Bari, Italy, September 1997.
[10] GE's DMAIC Approach, http://www.ge.com/capital/vendor/dmaic.htm, URLs cited were accurate as of April 2002.
[11] K. H. Moeller and D. J. Paulish, "Software Metrics: A Practitioner's Guide to Improved Product Development", Chapman & Hall, 1992.
[12] R. B. Grady, D. L. Caswell, "Software Metrics: Establishing a Company-wide Program", Prentice Hall, 1987, ISBN 0-13-821844-7.
[13] V. R. Basili, G. Caldiera, H. D. Rombach, "The Goal Question Metric Approach", Encyclopedia of Software Engineering, volume 1, John Wiley & Sons, 1994, pp. 528-532.
[14] M. C. Paulk, C. V. Weber, B. Curtis, M. B. Chrissis, "The Capability Maturity Model: Guidelines for Improving the Software Process", Addison-Wesley Publishing Company, 1995, ISBN 0-201-54664-7.
[15] David L. Hallowell, Six Sigma Evolution for Software Systems, http://www.6siga.com, URLs cited were accurate as of April 2003.
[16] Mark C. Paulk, Dennis Goldenson, and David M. White, "The 1999 Survey of High Maturity Organizations", Tech. Rep. CMU/SEI-2000-SR-002, Software Engineering Institute, Feb. 2000.
[17] Lionel C. Briand, Bernd Freimut, Oliver Laitenberger, Guenther Ruhe, and Brigitte Klein, "Quality Assurance Technologies for the EURO Conversion - Industrial Experience at Allianz Life Assurance", in Proceedings of the 2nd International Software Quality Week Europe, 1998.
[18] Thomas J. Ostrand and Elaine J. Weyuker, "Collecting and Categorizing Software Error Data in an Industrial Environment", Journal of Systems and Software, vol. 4, pp. 289-300, 1984.
[19] Barbara Hirsh, Robert Bleck, and Steven Wood, "A Comprehensive Defect Prevention Program Using the ODC Methodology", in Proceedings of the 10th International Symposium on Software Reliability Engineering (Fast Abstracts and Industrial Practices), pp. 61-81, 1999.
[20] Institute of Electrical and Electronics Engineers, IEEE Standard Classification for Software Anomalies, IEEE Std. 1044-1993, 1994.
[21] Ram Chillarege, Inderpal S. Bhandari, Jarir K. Chaar, Michael J. Halliday, Diane S. Moebus, Bonnie K. Ray, and Man-Yuen Wong, "Orthogonal Defect Classification - A Concept for In-Process Measurements", IEEE Transactions on Software Engineering, vol. 18, pp. 943-956, Nov. 1992.
[22] Marek Leszak, Dewayne E. Perry, and Dieter Stoll, "A Case Study in Root Cause Defect Analysis", in Proceedings of the 22nd International Conference on Software Engineering, pp. 428-437, 2000.
[23] S. Lawrence Pfleeger, "Software Engineering: Theory and Practice", NJ: Prentice-Hall, 1998.
[24] R. Mays et al., "Experiences with Defect Prevention", IBM Systems Journal, 29(1), pp. 4-32, January 1990.
[25] www.bds.com, URLs cited were accurate as of February 2004.