DEFECT PREDICTION MODEL FOR TESTING PHASE
MUHAMMAD DHIAUDDIN BIN MOHAMED SUFFIAN
UNIVERSITI TEKNOLOGI MALAYSIA
DEFECT PREDICTION MODEL FOR TESTING PHASE
MUHAMMAD DHIAUDDIN BIN MOHAMED SUFFIAN
A project report submitted in fulfillment of the
requirements for the award of the degree of
Master of Science (Computer Science – Real Time Software Engineering)
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia
MAY 2009
ALHAMDULILLAH….
To my beloved parents, my wife, brothers and sisters
who have given me courage and strength
ACKNOWLEDGEMENT
Many individuals have contributed to the success of this research. First and foremost, special thanks to my academic supervisor, Prof. Dr. Shamsul Sahibuddin, who guided me throughout this research work. Appreciation also goes to my industrial supervisor, Mr. Mohamed Redzuan Abdullah, Senior Manager of the Test Centre of Excellence department, for his support and constructive comments in completing this project.

I am very grateful to my parents and parents-in-law, who always put their trust and faith in me to continue working on this research. Special gratitude goes to my wife, who continually gave me her dedicated encouragement throughout the tough periods. Thank you also to the members of the Test COE department for their cooperation and valuable input in ensuring the success of this project, and special thanks to my Six Sigma coach for his constant cooperation and technical guidance.

Last but not least, great gratitude goes to my colleagues of Part Time 9 of the Real Time Software Engineering programme. My thanks also go to the staff of the Centre for Advanced Software Engineering (CASE) who were involved directly or indirectly in the project.
ABSTRACT
The need to predict defects in the testing phase is important nowadays as part of the improvement initiatives for the software production process. As the group that ensures successful implementation of the verification and validation process area, all test engineers in the Test Centre of Excellence (Test COE) department are required to play their part in discovering as many software defects as possible and containing them within the testing phase. This research aims to achieve zero known post-release defects in the software delivered to end users. To achieve this target, the research effort focuses on establishing a defect prediction model for the testing phase using the Six Sigma methodology. It identifies the customers' needs regarding the requirements for the prediction model as well as how the model can benefit them. It also outlines the possible factors associated with defect discovery in the testing phase. Analysis of the repeatability and capability of test engineers in finding defects is elaborated. This research also describes the process of identifying the types of data to be collected and the techniques for obtaining them. The relationship of customer needs with the technical requirements is then explained clearly. Finally, the proposed defect prediction model for the testing phase is demonstrated via regression analysis, by considering the faults found in phases prior to testing and the code size of the software. The achievement of the whole research effort is described at the end of this project, together with the challenges faced and recommendations for future research work.
ABSTRAK
The need to predict defects in the testing phase is important nowadays as part of the improvement initiatives for the software production process. As the group that ensures the successful implementation of the verification and validation process area, all test engineers in the Test Centre of Excellence department are required to play their part in finding as many software defects as possible and containing those defects within the testing phase. This research targets zero known post-release defects for the software delivered to the end user. To achieve this target, the research effort focuses on establishing a defect prediction model for the testing phase using the Six Sigma method. It identifies the users' needs regarding the requirements of the prediction model as well as how the model benefits them. It also outlines the potential factors associated with defect discovery in the testing phase. Analysis of the repeatability and capability of the test engineers in finding defects is also elaborated. This research also explains the process of identifying the types of data that need to be collected and the techniques for obtaining them. The relationship of user needs with the technical requirements is then explained clearly. Finally, the proposed defect prediction model for the testing phase is demonstrated through regression analysis. This is achieved by considering the faults found in the phases before the testing phase and the code size of the software. The success of the whole research effort is explained at the end of the thesis together with the challenges faced and suggestions for future research work.
TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Introduction
1.2 Introduction to Defect Prediction Model for Software Testing
1.3 Background of Company
1.4 Background of Problem
1.5 Statement of Problem
1.6 Objectives of Study
1.7 Importance of Study
1.8 Scope of Work
1.9 Project Schedule
1.10 Project Outline

2 LITERATURE REVIEW ON DEFECT PREDICTION MODEL FOR TESTING PHASE
2.1 Introduction
2.2 Defect Prediction across Software Development Life Cycle (SDLC)
2.3 Reviews on the Defect Prediction across SDLC and Testing Phase
2.4 Applications and Issues of Defect Prediction
2.5 Summary of the Proposed Solution

3 METHODOLOGY
3.1 Introduction
3.2 Six Sigma - DMADV Methodology
3.3 Supporting Tools

4 PROJECT DISCUSSION
4.1 Introduction
4.2 Findings of Define Phase
4.3 Findings of Measure Phase
4.4 Findings of Analyze Phase

5 CONCLUSION
5.1 Achievements
5.2 Constraints and Challenges
5.3 Recommendation

REFERENCES
LIST OF TABLES

1.1 Project schedule
2.1 Defect prediction techniques
2.2 Strengths and weaknesses of defect prediction techniques
3.1 Project team
3.2 Customer identification
LIST OF FIGURES

2.1 Defects detection techniques
2.2 Defects per life cycle phase
2.3 Defects based on testing metrics
2.4 Relationship between CMM levels and delivered defects
2.5 Short-term defect inflow prediction example
2.6 Normalized results from the application of CDM Model to test process
2.7 Process Performance Model
2.8 Graphical representation of Rayleigh model parameters
2.9 Prediction without process metrics
2.10 Prediction with process metrics
2.11 High level schematic of whole phase BN
3.1 DMADV phases
4.1 MIMOS software production process
4.2 Schematic diagram
4.3 Detail schematic – Y to X tree diagram
4.4 Team charter
4.5 Customer need statement
4.6 1st level of KJ analysis
4.7 2nd level of KJ analysis
4.8 Kano analysis
4.9 House of quality for defect prediction model
4.10 Test case experiment result
4.11 Assessment agreement
4.12 Assessment agreement for within appraiser
4.13 Assessment agreement for each appraiser against standard
4.14 Assessment agreement for all appraisers against standard
4.15 Operational definition
4.16 Data collection plan
4.17 Data for regression
4.18 Regression result
LIST OF ABBREVIATIONS

BN - Bayesian Network
CMM - Capability Maturity Model
CMMI - Capability Maturity Model Integration
COE - Centre of Excellence
COQUALMO - Constructive Quality Model
CUT - Code and Unit Testing
DfSS - Design for Six Sigma
DMADV - Define, Measure, Analyze, Design, Verify
FMEA - Failure Mode and Effect Analysis
FP - Function Point
IPF - In-Process Fault
ISP - Internet Service Provider
JARING - Joint Advanced Research Integrated Networking
KJ - Kawakita Jiro
KLOC - Kilo Lines of Code
LOC - Lines of Code
MEMS - Micro-Electro-Mechanical Systems
MIMOS - Malaysian Institute for Microelectronic Systems
MOF - Ministry of Finance
MSA - Measurement System Analysis
NEMS - Nano-Electro-Mechanical Systems
PC - Personal Computer
PDF - Probability Density Function
QFD - Quality Function Deployment
R&D - Research and Development
SDLC - Software Development Life Cycle
SEI - Software Engineering Institute
TER - Test Effectiveness Ratio
UAT - User Acceptance Test
V&V - Verification and Validation
CHAPTER 1
INTRODUCTION
1.1 Introduction

This chapter presents the introduction to the research effort described throughout this project. It gives an overview of the research that motivates the establishment of a defect prediction model for the software testing phase. The discussion continues with the background of the research, the problem statements, the research objectives and the importance of the research. The scope of work and the project outline are then explained in the last sections of this chapter.
1.2 Introduction to Defect Prediction Model for Software Testing

As an organization that aims to become a premier applied research centre in frontier technologies, MIMOS is committed to developing, producing and releasing high-quality software to the market. One of the key aspects of achieving this is an effective and efficient software development process throughout the entire SDLC. Thus, predicting or estimating the defects of particular software during the testing phase is crucial to enhancing the testing process as part of process improvement in the SDLC.
Being the last gate before acknowledging that particular software is ready to go to market requires strong and accurate data and metrics. The initiative of having a defect prediction model for the testing phase helps in determining the defects that are likely to occur during test execution and contributes relevant software quality metrics. A defect prediction model for testing contributes to zero known post-release defects of a software product, which is determined by defect containment in the testing phase. Predicting the total number of defects at the start of testing allows wider test coverage to be put in place; as more defects are contained within the testing phase, the quality of the software product delivered to the end user improves. Using testing metrics to predict total defects also demonstrates the stability of the development effort in releasing a software product.
1.3 Background of Company

MIMOS, or the Malaysian Institute for Microelectronic Systems, was established on 1 January 1985 as a unit of the Prime Minister's Department, following an initiative by a group of academicians led by Tengku Dr. Mohd Azzman Shariffadeen. The initial objective was to conduct microelectronics research to support the industries as well as to develop indigenous products. After going through a corporatization exercise as a company under the Ministry of Finance (MOF), MIMOS has been focusing on three core functions: Research and Development (R&D), National IT Policy Development and Business Development. Since then, MIMOS has embarked on various initiatives and projects, including manufacturing affordable personal computers (PC), commissioning an industrial-class wafer fabrication plant, launching Malaysia's first Internet Service Provider (ISP), called JARING, initiating Computer Forensic Services and launching AgriBazaar. On 1 July 2006, Dato' Abdul Wahab Abdullah was appointed the new President and Chief Executive Officer of MIMOS, replacing Tengku Dr. Mohd Azzman Shariffadeen. The appointment of Dato' Abdul Wahab has transformed MIMOS from an R&D organization in ICT and microelectronics into a world-class R&D Centre of Excellence.
With the tagline "Innovation for Life", MIMOS is now a premier applied research centre in frontier technologies aimed at growing globally competitive indigenous industries. Through smart partnerships with local and international universities, research institutes, industries and the Malaysian Government, MIMOS focuses on frontier technologies by pursuing exploratory and industry-driven applied research. To date, research and technology areas in MIMOS are organized into eight technology clusters: Advanced Informatics, Communication Technology, Cyberspace Security, Encryption Systems, Grid Computing, Knowledge Technology, Micro Energy, and Micro Systems (MEMS/NEMS).
1.4 Background of the Problem

As defects are the main focus of software testing, test engineers of the Test Centre of Excellence (Test COE) are expected to find and discover any errors, bugs and faults in the software through various testing techniques and strategies. Estimating upfront, at the start of the testing phase, the number of defects to be found is very important for strategizing the test execution for the software. As this research concentrates on formulating a defect prediction model for testing, several issues contribute to the research problem.
1.4.1 Issue on Better Resource Planning for Test Execution Across Projects

Currently, the number of test engineers allocated to a particular testing project is based on the size of the project, including how detailed the requirements are and the complexity of the software being developed. At the same time, one test engineer can work on more than one testing project. Thus, there is a need to estimate the number of defects to be found in the testing phase, so that an appropriate number of test engineers can be planned and allocated across multiple projects. The estimated number of defects is required to support resource planning within the testing department, to ensure that resources are optimized and the productivity of every test engineer is high.
1.4.2 Issue on Wider Test Coverage to Find Defects

Defects are found while test execution is in progress. Test engineers apply various testing techniques to find and discover as many defects as possible based on the baselined requirements. There is no specific predetermined factor that a test engineer can use as a basis for finding defects; as a result, test engineers continue to find defects until the end of the execution schedule. To ensure wider and better test coverage, predicting defects could improve the way test engineers find defects, as they would then have a target number of defects to be found. This can take the form of adding more types of testing, or adding more relevant scenarios of how users will utilize the software, which results in better root cause analysis of the defects found and improves the engineer's understanding of the software under test.
1.4.3 Issue on Improving Test Execution Time to Meet Project Deadline

As test engineers need to discover as many defects as possible during test execution, the schedule may slip and delay the delivery of the software work product beyond the planned release date. This is due to the necessity of ensuring that all testing requirements for the software are fulfilled and covered. Putting defect prediction in place will reduce and help overcome the schedule slippage contributed by testing activities. By having a target for the estimated number of defects to be discovered, every test engineer would be able to plan test execution accordingly to ensure the project deadline is met.
1.4.4 Issue on Reliability of Software to be Delivered

Defects found during the testing phase are passed back to the development team for bug fixing. The fixed software is then retested in several iterations to validate that the defects have been resolved. More defects found within the testing phase means more defects are contained from escaping to the field or market that will be using the software. However, test engineers cannot give the exact number of defects that can be contained within the testing phase. This is where defect prediction is really needed, to provide a direction on how many defects engineers should discover and contain within the phase. Having this estimated figure contributes to zero known post-release defects of the software. In the long run, the metrics associated with defect prediction will portray the stability of the development effort in completing and releasing a software product.
1.5 Statement of the Problem

This research is intended to tackle the issues with regard to the system testing process explained in Section 1.4. The main question to address is: "How can the total number of defects to be found at the start of the system testing phase be predicted using a model?" The following sub-questions support the main research question:

i. What are the key contributors to a test defect prediction model?
ii. What factors contribute to the defects found in the system testing phase?
iii. How can the relationship between the defect factors and the total number of defects in the system testing phase be measured?
iv. What type of defect category needs to be considered to calculate the total defects of the software?
v. How can the prediction model help in improving the testing process and the software quality?
vi. What types of data should be gathered, and how can they be obtained?
1.6 Objectives of the Study

The research aims to achieve the following objectives to address the issues and problems mentioned above:

1) To establish a defect prediction model for the software testing phase
2) To demonstrate the approach of building a defect prediction model using the Design for Six Sigma (DfSS) methodology
3) To identify the significant factors that contribute to a reliable defect prediction model
4) To determine the importance of a defect prediction model for improving the testing process
1.7 Importance of Study

As the organization moves towards CMMI Level 5 status, it needs to improve and refine the Verification and Validation (V&V) process area. For this reason, the Test Centre of Excellence (Test COE) plays an important role in ensuring the goal is achieved by improving the internal testing process. As the testing approach applied is based on the V-Model, it is essential to refine all processes related to V&V across all phases of the life cycle, from requirements until the actual system testing phase. Having a model to estimate and predict the total number of defects in the system testing phase helps the testing team contribute to achieving this target. Putting defect prediction in the process serves as a preventive mechanism for reducing the occurrence of defects (Mohapatra and Mohanty, 2001). Furthermore, it can be a beneficial tool for reducing testing time; the reduction is accomplished by implementing an effective test strategy that minimizes escaped defects while utilizing resources efficiently. At the same time, the development team can use the model to guide them in implementing higher-quality code. From the overall perspective of the software development life cycle, a defect prediction model for test improves the verification and validation process, specifically in ensuring zero known post-release defects of a software product.
1.8 Scope of Work

In this research, the scope is focused on exploring and establishing a defect prediction model specifically for the system testing phase. The study of defects means the identification of faults, errors and bugs in the software during the system testing phase of the software development life cycle, covering functional, security, usability and performance defects. In order to predict these defects, analysis is done to determine the factors or contributors to the introduction of defects in the testing phase. This involves identifying all possible significant factors, such as faults in the requirement phase, design phase, and code and unit test phase, the size of the software, fault density and historical defects. Moreover, the scope of work also emphasizes measuring the capability of test engineers in discovering defects.

The modeling work establishes the relationship between the identified predictors and the defects found in the testing phase. Individual analysis of each possible predictor needs to be performed to determine which factors have a strong connection with defects. The output is the proposed model, which will be normalized to suit all kinds of projects.
1.9 Project Schedule

The research, which also serves as the professional training, ran from 20 October 2008 until 17 April 2009. However, the actual end date of this project is 30 May 2009, since it needs to follow the agreed schedule of the Six Sigma Green Belt project methodology, which starts with the Define phase and proceeds through the Measure, Analyze, Design and Verify phases. For the purposes of this project, the results and discussion are presented up until the end of the Verify phase schedule. The schedule is presented below:
Table 1.1: Project schedule

Phase   | Start Date | End Date
Define  | 20/10/2008 | 30/11/2008
Measure | 01/12/2008 | 31/01/2009
Analyze | 01/02/2009 | 31/03/2009
Design  | 01/04/2009 | 30/04/2009
Verify  | 01/05/2009 | 30/05/2009

1.10 Project Outline
This research encompasses discussion of several topics related to establishing a defect prediction model for the testing phase. The report is organized as follows:

Chapter 2: This chapter discusses the literature review of the defect prediction model. The discussion covers an overview of several techniques for predicting software defects across the Software Development Life Cycle, issues with regard to defect prediction, strategies for predicting defects in the software testing phase, and the application of defect prediction models in improving the software process.

Chapter 3: This chapter discusses the research methodology applied in analyzing the research problem and formulating the proposed solution, following the Six Sigma Green Belt DMADV track.

Chapter 4: This chapter outlines the outcomes of the research activities, covering the characteristics of the data gathered, the analysis of the relationships of possible factors contributing to defect prediction, and the establishment of the proposed model, which is verified and validated to ensure it is fit to be incorporated into the software process.

Chapter 5: This chapter summarizes the research, including the achievements obtained and how the proposed defect prediction model for test could benefit the user. It concludes with the limitations of the proposed solution together with recommendations for future research work.
CHAPTER 2
LITERATURE REVIEW ON DEFECT PREDICTION MODEL FOR TESTING PHASE
2.1 Introduction

This chapter outlines and describes approaches to predicting the number of defects to be discovered in a software product, particularly in the software testing phase. It presents an overview of various techniques and models for predicting software defects across the Software Development Life Cycle (SDLC). It then focuses on strategies for estimating defects in the software testing phase using various models. Next, it describes the application and use of defect estimation with regard to software process improvement and software quality. Several critiques of defect prediction models are also presented. Finally, this chapter outlines the proposed approach for predicting and estimating defects in the software testing phase.
2.2 Defect Prediction across Software Development Life Cycle (SDLC)

This section describes approaches to defect prediction throughout the Software Development Life Cycle (SDLC). It covers perspectives on defects and defect prediction, approaches and techniques of defect prediction, as well as the relationship of defect prediction with reliability.
2.2.1 Perspectives of Defect and Defect Prediction
The term defect can be expressed in various ways. A defect is referred to as a flaw in a component or system that can cause the component or system to fail to perform its required function (Graham, Veenendaal, Evans and Black, 2007). According to Fenton and Neil (1999), a deviation from the specifications or expectations of particular software is also called a defect. Clark and Zubrow (2001) likewise express a defect as any flaw or imperfection in a software work product or software process. A defect that has been found is referred to as a fault or bug. Regardless of the definition used, all express a common understanding: when a defect occurs, it may cause failures in the operation of a component or a system.

In the context of defect prediction, it is vital to analyze the defects and understand the rationale for predicting them. Predicting defects is important for assessing project progress and planning defect detection activities. It also helps those producing the software work product to decide on the work product quality. Process performance can be assessed, and thus the capability of the process improved. That is why the defect is always the main subject of defect prediction.
2.2.2 Defect Detection and Defect Prediction
As defects are the main focus of defect prediction, we should be able to distinguish between defect severities, whether major or minor. Minor defects should not be taken into consideration, as they would inflate the estimate of product defects. From the observations made, most defect prediction depends on historical data. Furthermore, the techniques used to predict defects vary, especially in terms of the data required (Clark and Zubrow, 2001). Defect prediction can require little or much data; it can also rely on some work product characteristics or use defect data only. These differences in the quality of the inputs used for predicting defects determine the strengths and weaknesses of a particular defect prediction technique.

To start estimating defects, we must first be aware of how defects are detected and generated. The purpose of understanding defect detection is to identify the sources of defects and how defects are discovered. Defects can be detected either in the verification and validation (V&V) process or post-deployment. The figure below summarizes the defect detection techniques as outlined in the studies by Clark and Zubrow.
Figure 2.1: Defects detection techniques
In general, defect prediction deals with estimating the number of defects or faults. Defect prediction is usually used interchangeably with other terms such as defect estimation, fault prediction or fault estimation. Nayak and Naidya (2003) describe defect estimation as a proactive process of identifying various kinds of defects in the design, content and code of a software product, with the aim of enhancing product quality and performance capability. Defect prediction helps in estimating the quality of software before it is released and used by users. Fenton and Neil (1999) observed the effort in three areas: predicting the number of defects in a system, estimating the reliability of systems in terms of time to failure, and understanding the impact of design and testing processes on defect counts and defect densities.

Defect prediction is expressed in the form of equations describing the defect inflow as a function of other selected measurements, such as milestone completion status or lines of code (LOC), from either a short-term or long-term standpoint (Staron and Meding, 2007). Both standpoints help in monitoring project status and progress in developing software.
2.2.3 Approaches and Techniques of Defect Prediction
Various approaches and techniques have been formulated and applied to predict the number of defects throughout the entire SDLC. The techniques or approaches, which are presented in the form of a model or equation, are developed according to several sources and metrics. Neil and Fenton (1999) presented their findings on how defects are predicted. The first approach is prediction using size and complexity metrics, which predicts defects directly from program code, mostly from lines of code and McCabe's cyclomatic complexity. According to them, a study by Akiyama of Fujitsu, Japan showed that linear models of some simple metrics provide reasonable estimates of the total number of defects. One of the four equations he computed involves lines of code (L):

D = 4.86 + 0.018 L

where D is the number of defects. They added Gaffney's argument that the relationship between defects (D) and lines of code (L) is not language dependent, due to an optimal size for individual modules with regard to defects. Lipow's data is used for the prediction:

D = 4.2 + 0.0015 L^(4/3)
Further analysis was then conducted by Compton and Withrow, who derived a polynomial equation and concluded that the optimum size of an Ada module, with respect to minimizing error density, is 83 source statements. The equation is:

D = 0.069 + 0.00156 L + 0.00000047 L^2
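As a quick illustration, the sketch below evaluates the three size-based models quoted above for a given module size. This is a minimal sketch in Python; the module sizes are arbitrary, and the coefficients are taken verbatim from the equations in the text.

```python
# Illustrative only: the three published size-based defect models quoted above.

def akiyama(loc):
    """Akiyama: D = 4.86 + 0.018 L."""
    return 4.86 + 0.018 * loc

def lipow(loc):
    """Gaffney/Lipow: D = 4.2 + 0.0015 L^(4/3)."""
    return 4.2 + 0.0015 * loc ** (4 / 3)

def compton_withrow(loc):
    """Compton and Withrow: D = 0.069 + 0.00156 L + 0.00000047 L^2."""
    return 0.069 + 0.00156 * loc + 0.00000047 * loc ** 2

for loc in (1000, 10000):  # arbitrary module sizes in LOC
    print(loc, round(akiyama(loc), 1), round(lipow(loc), 1),
          round(compton_withrow(loc), 1))
```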
The second approach outlined by Neil and Fenton is predicting defects using Function Points (FP), a measure of the amount of functionality in the requirements for particular software. The Albrecht Function Point approach predicts defect density using a metric extracted at the specification stage, in the belief that a function point-based metric is better than lines of code and is language independent.
Figure 2.2: Defects per life cycle phase
Testing metrics are another approach given by Neil and Fenton for predicting defects. This involves careful collection of data on defects found during inspection and testing phases. Test coverage is one of the testing metrics used to predict defects via a structural testing strategy. The resulting metric is called the Test Effectiveness Ratio (TER), which covers statement coverage, branch coverage or Linear Code Sequence and Jump coverage. An example of how defects are found based on testing metrics is presented below:
Figure 2.3: Defects based on testing metrics
Finally, in their findings, Neil and Fenton described the use of process quality data to predict software defects, expressed through the SEI Capability Maturity Model (CMM) ranking. The figure below outlines the relationship between CMM levels and delivered defects.
Figure 2.4: Relationship between CMM levels and delivered defects
For large software projects, studies by Staron and Meding (2007) have produced two types of defect inflow prediction model: one for short-term defect inflow prediction and another for long-term defect inflow prediction. Historical data from defect inflow trends and project plans is used to construct the short-term prediction model. From the data, a multivariate linear regression model is created, which is then applied in new projects to predict the number of defects for a particular week. This multivariate regression model for short-term prediction is represented as an equation based on several independent variables:

Y = a_0 x_0 + a_1 x_1 + ... + a_n x_n
In the equation, the values a_i are coefficients calculated using statistical regression, while the x_i are the independent variables. Based on the short-term prediction model, project team members can predict the number of inflow defects to be found in a future week of project execution. In the example below, given in their studies, the project manager should pay more attention to week 13, based on the current situation of the project, including the number of defects reported in the current and previous weeks as well as the status of planned and accumulated packages.
Figure 2.5: Short-term defect inflow prediction example
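To make the short-term model concrete, the following sketch fits the equation above by ordinary least squares. The choice of predictors and every number are hypothetical stand-ins for the week-level measurements Staron and Meding describe (defect inflow, planned and accumulated packages), not their actual data.

```python
import numpy as np

# Rows are historical weeks; columns are hypothetical predictors:
# [defects reported last week, packages planned, packages accumulated]
X = np.array([[12, 5, 4],
              [15, 6, 5],
              [ 9, 4, 4],
              [20, 7, 5],
              [14, 6, 6]], dtype=float)
y = np.array([14, 17, 10, 22, 15], dtype=float)  # defect inflow the following week

# Least-squares estimate of the coefficients a_i (no intercept, as in the equation)
a, *_ = np.linalg.lstsq(X, y, rcond=None)

current_week = np.array([16, 6, 5], dtype=float)
print("predicted defect inflow next week:", round(float(current_week @ a), 1))
```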
The Software Engineering Institute (SEI) of Carnegie Mellon University also conducted a study on the level of goodness of particular software. That study, by Clark and Zubrow (2001), emphasized techniques of defect prediction. They categorized defect prediction techniques into three areas: project management, work product assessment and process improvement. Project management covers prediction techniques such as empirical defect prediction, the defect discovery profile, COQUALMO and orthogonal defect classification. Work product assessment involves fault proneness evaluation and capture/recapture analysis. For process improvement, defect prevention and statistical process control techniques are used. These techniques are described in the table below:
Table 2.1: Defect prediction techniques

Project Management
- Empirical Defect Prediction: defect density (number of defects per thousand lines of code) based on historical data, enhanced with historical data on injection distribution and yield.
- Defect Discovery Profile: projection of the defect density found in process, by time or phase, onto a theoretical discovery curve (Rayleigh).
- COQUALMO: defect prediction model for the requirements, design and coding phases, based on sources of introduction and the discovery techniques used.
- Orthogonal Defect Classification: classification and analysis of defects to identify project status, based on comparison of current defects with historical patterns.

Work Product Assessment
- Fault Proneness Evaluation (size, complexity, prior history): analysis of work product attributes to plan the allocation of defect detection resources (inspection and testing).
- Capture/Recapture Analysis: analysis of the pattern of defects detected within an artifact by independent defect detection activities (inspectors, or inspection versus test).

Process Improvement
- Defect Prevention Program: root cause analysis of the most frequently occurring defects.
- Statistical Process Control: use of control charts to determine whether inspection performance was consistent with prior process performance.
2.2.4 Defect Prediction in Testing Phase of Software Development Life Cycle

The approaches in the previous findings on defect prediction mostly covered the potential number of defects to be found across all phases of the Software Development Life Cycle, with no prediction technique explained specifically for the testing phase. Although some techniques mention defects to be found in the System Test phase, those findings also take into account the defects found before and after that phase.
The main intention here is to understand prediction techniques for defects to be found specifically in the software testing phase of the SDLC. Bertolino and Marchetti (2003) introduced a simple model called the Bemar model, used to predict the expected number of remaining failures in early test phases. It is quite simple, since it predicts the number of defects based on the intervals of time between subsequent failures. The model is represented as:

N_F,k = N_FTI,k · E_k[F]

where N_F,k is the number of failures for k test intervals, N_FTI,k is the number of failure test intervals based on the test information collected during the k test intervals, and E_k[F] is the expectation of failures for the k intervals. The Bemar model has been applied to functional testing and to operational test data. From the results, the authors concluded that the model assumes the detected defects are distributed over the whole test period. They also suggested that the model works well as a complement to reliability growth models.
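A small worked example of the relation above, with invented figures rather than Bertolino and Marchetti's data:

```python
# Hypothetical values for one application of the Bemar relation.
n_fti_k = 25          # failure test intervals observed over k test intervals
e_k_f = 0.8           # expectation of failures for the k intervals
n_f_k = n_fti_k * e_k_f
print(f"expected remaining failures: {n_f_k:.1f}")  # 20.0
```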
Sun Microsystems came out with an approach to simulate and predict test process behavior, including prediction of the number of remaining defects in a product release. In a case study conducted at Sun Microsystems, Karcich, Cangussu and Earl (2003) proposed a state variable model called the CDM Model. CDM comes from the names of the developers of the state variable model for their Software Test Process: Cangussu, DeCarlo and Mathur. Besides using the model to control the test process, with failure intensity as the control variable, the CDM model is also used to calculate the estimated number of total remaining defects. With these figures, the manager can decide the number of testing cycles as well as when to stop testing.
Figure 2.6: Normalized results from the application of CDM Model to test process
2.3 Reviews on the Defect Prediction across SDLC and Testing Phase

Many studies have been conducted to predict the number of defects across the Software Development Life Cycle (SDLC). However, not many reviews have specifically determined the estimated defects for the System Test phase; most have focused on estimating faults or defects for every phase of the SDLC.

From the studies presented, various techniques and approaches have been applied to predict the number of defects across the SDLC. The techniques can use software size, function points, historical defect data, process-related data, quality-related data or test process data, or can build on derivations of existing models, such as the Rayleigh model and Bayesian networks, as the basis for estimating defects. Mostly, the results are presented in the form of a mathematical equation.
This project is intended to analyze the various approaches and techniques that have been put in place for defect prediction models, narrowing down to the defects to be found in the System Test phase of the SDLC. The first step is to identify internal predictors or factors contributing to the defects found in a particular software product. This involves collecting the available software, quality, process and test data. The next action is to analyze the strength of each possible factor and its relationship with the defects detected. From the results, further analysis is carried out to develop a mathematical model suitable for predicting the defects in the testing phase.
2.4 Applications and Issues of Defect Prediction

This section describes the applications and issues of defect prediction: building defect prediction in practice, applications of defect prediction, enhancements to defect prediction, and issues in defect prediction.
2.4.1 Building Defect Prediction in Practice

It is crucial that the process of collecting data for predicting defects is proper and accurate, so that the data used for analysis is correct. Generally, most studies follow these steps or guidelines to statistically arrive at estimated defects (a small sketch of steps 5-10 follows the list):

1. Identify parameters or factors that have an impact on defect injection in a software product
2. Gather defect data for past projects in terms of the total number of defects detected
3. Analyze the correlation patterns between the parameters and the total defects found in past projects
4. Estimate the independent parameters for the new project
5. Use linear regression to estimate the total number of defects that may be injected, based on the estimated independent parameters
6. Calculate the total number of latent defects
7. Calculate the efficiency required by the project
8. Calculate the estimated defect rate for each period using the Rayleigh distribution
9. Calculate the estimated defect injection rate by phase based on the project schedule
10. Plot the S-shaped curve for the defect detection pattern
11. Compare the Rayleigh curve and actual data to get a quantitative estimate
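As noted above, here is a small sketch of steps 5-10. It assumes the regression of step 5 has already produced a total defect estimate, and spreads that total over time periods with a Rayleigh curve; the total, the peak period and the horizon are all invented for illustration.

```python
import math

total_defects = 120   # hypothetical output of the regression in step 5
t_peak = 4            # hypothetical period of maximum defect discovery
horizon = 10          # number of periods to plan for

def rayleigh_cdf(t, tm):
    """Fraction of defects expected by time t when discovery peaks at tm."""
    return 1.0 - math.exp(-(t * t) / (2 * tm * tm))

previous = 0.0
for t in range(1, horizon + 1):
    cumulative = total_defects * rayleigh_cdf(t, t_peak)
    print(f"period {t:2d}: {cumulative - previous:5.1f} defects expected")
    previous = cumulative
```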
Banerjee and Sekhar (2004) presented their own view of the process that needs to be followed in establishing a defect prediction model. Their view was based on the use of regression analysis as a suitable basis for predicting defects, due to its long-proven and established statistical techniques and because its goodness of fit can be verified via statistical analysis. The process of developing a prediction model based on regression analysis is as follows (a sketch of the fitting and evaluation steps appears after the list):

1. Gather data on the given independent variables and the corresponding dependent variables
2. Determine the form of the equation to fit by plotting the dependent and independent data sets on a graph, such as a scatter plot, to show the existence of a statistical relationship
3. Fit an equation, either simple or multiple regression depending on the number of independent variables
4. Evaluate the fit using statistics such as the Coefficient of Determination (R²) or the Standard Error of Estimate (SE)
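The sketch below walks through the fitting and evaluation steps of the list above for a single independent variable, judging the fit with R² and the standard error of estimate. The data pairs are invented (faults found before testing versus defects found in testing).

```python
import numpy as np

x = np.array([10, 14, 8, 20, 16, 12], dtype=float)   # independent variable
y = np.array([25, 33, 21, 47, 38, 29], dtype=float)  # dependent variable

# Step 3: fit a simple linear regression
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# Step 4: evaluate the fit with R^2 and the standard error of estimate
ss_res = float(np.sum((y - y_hat) ** 2))
ss_tot = float(np.sum((y - y.mean()) ** 2))
r_squared = 1 - ss_res / ss_tot
se = (ss_res / (len(x) - 2)) ** 0.5

print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}, SE = {se:.2f}")
```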
2.4.2 Application of Defect Prediction
Defect prediction is used for various purposes throughout the Software Development Life Cycle (SDLC). A defect prediction model can be used to plan the quality of a software project based on the capability baseline (Banerjee and Sekhar, 2004). This is described in the Process Performance Model, in which the defect prediction model is one of the important contributors. The Process Performance Model predicts the effort, number of defects and other related data based on parameters such as schedule and size.
Figure 2.7: Process Performance Model
One item in quality planning, as outlined by the two authors with regard to process performance, is to control the number of defects in the User Acceptance Testing (UAT) phase. This starts with predicting the total number of defects using the defect prediction model, which is then adjusted according to project parameters such as customer quality goals, past data from similar projects and the type of development methodology used. The defects are then distributed amongst the phases of the software life cycle. Next, the distributed defects are adjusted for three things: to distribute defects early in the life cycle to achieve zero defects at the acceptance phase, to distribute the remaining defects in other phases as per the project scope, and to support the verification and validation strategy, which involves the use of various types of test strategy to tackle more defects. From the results, the project team should be able to derive several measures, such as defects per function point per phase, defects per person-month and review effectiveness. The data will then be recorded and tracked.
Defect prediction is also used to determine the reliability of software, because defect prediction is part of software reliability modeling. A software reliability model aims to estimate the reliability of software from its latent defects, especially when it becomes available to customers. The defects estimated across the SDLC provide a basis for describing the probability of the software operating in a given environment, within the design range of inputs, without failure (Thangarajan and Biswas, 2000). The Rayleigh model is chosen as a suitable software reliability model, as it predicts the expected value of defect density at different stages of the project life cycle. The equation presented in the Rayleigh model is used to predict the number of defects over time. In order to determine the accuracy of the duration and magnitude of the Rayleigh model, specific inputs must be selected; good inputs to the model allow an accurate forecast for a specified scenario. Three main factors of the model are mentioned in several studies: source lines of code, as the size required to build the software functionality; the productivity index, as a measure of product efficiency and complexity; and peak staffing, in terms of the human effort required to build and test the software.

Thangarajan and Biswas then explained that the shape of the Rayleigh model's curve indicates the defect removal pattern across the entire life cycle. The total defects likely to occur in the software being constructed is represented by the area bounded by the x-axis and the curve, as depicted in the figure below:
Figure 2.8: Graphical representation of Rayleigh model parameters
From the above figure, an equation for the probability density function (PDF) is produced: F(t) = f(K, t_m, t), where K denotes the cumulative defect density, t is the actual time unit, and t_m is the time at the peak of the curve.
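For reference, one standard written-out form of such a Rayleigh density (an assumption here, since the authors give only the functional notation) is

$$f(t) = K \cdot \frac{t}{t_m^2} \, e^{-t^2/(2 t_m^2)}$$

which integrates to K over all time and peaks at t = t_m, consistent with the parameter roles described above.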
Good software maintenance also depends on a good prediction model. Selecting a good defect prediction model is important for pricing maintenance contracts and insurance (Li, Shaw and Herbsleb, 2003). It also helps in predicting support costs for software, including maintenance staffing. A defect prediction model helps in planning maintenance activities and the timing for resolving reported defects, because a good model should be able to simulate occurrences of similar defects in the field. The essential thing to consider here is the type of operational setting the model is applied to: the model should be able to work in environments of user-reported defects, widely used systems, multi-release systems or commercial systems, so that suitable maintenance activities can be adopted.
2.4.3 Enhancement to Defect Prediction
One approach to enhancing defect prediction is using process metrics. Process metrics, or process data, cover the data gathered in and by the problem tracking system and the configuration management system (Kaszycki, 1999). The data can take the form of the number of changes since the last release, the number of faults found since the last release, the number of different developers who turned over, the number of versions of a module since the last release, or the number of features added that affected the module. Using process metrics contributes to a higher-accuracy defect prediction model and helps in earlier detection of defects in the development process. The figures below depict the differences between prediction without process metrics and prediction with process metrics.
Figure 2.9: Prediction without process metrics
Figure 2.10: Prediction with process metrics
Another approach to enhanced defect prediction is through an advanced model, achieved via phase-level Bayesian Networks (BN) for defect prediction. The objective is to predict defects and defect rates at different periods across a software development project, based on the information available at any stage of development and testing (Neil, 2006). This advanced model takes several things into account: how big the software is, how good the development process is, how good the testing process is, and the chances of successfully removing defects.
Figure 2.11: High level schematic of whole phase BN
Phase-level BN for defect prediction is very useful for predicting defect introduction, prevention, detection and removal. It also covers a wider scope, from specification, design and development to testing and rework. However, successful implementation of this advanced technique depends on the capability and maturity of the organization.
2.4.4 Issues in Defect Prediction
In producing and implementing a defect prediction model in actual software development practice, several issues and concerns have been discussed and put forward. In describing the common techniques used to predict defects in their research, Clark and Zubrow (2001) also provided the strengths and weaknesses of each technique, presented in the table below:
Table 2.2: Strengths and weaknesses of defect prediction techniques

Empirical Defect Prediction
- Strengths: easy to use and understand; can be implemented with minimal data.
- Weaknesses: requires stable processes and a standardized life cycle; does not account for changes in project, personnel or platform.

Defect Discovery Profile
- Strengths: predicts defect density by time period, enabling estimation of defects to be found in test.
- Weaknesses: no adjustment mechanism for the efficiency of discovery processes; changes in product, personnel, platform or project will impact defect predictions.

COQUALMO
- Strengths: predicts defects for three phases; quantifies the effects of different discovery techniques on the detection and removal of defects.
- Weaknesses: covers a small number of phases; does not predict test or post-deployment defects.

Orthogonal Defect Classification
- Strengths: classifications linked to process provide valuable insight; classification takes little time.
- Weaknesses: requires development of a classification scheme; does not account for changes in people, process or product.

Fault Proneness Evaluation (Size, Complexity, Prior History)
- Strengths: efficient and effective focus of defect detection activities.
- Weaknesses: "in-process" fault density by module or component may not predict operational fault density; effort may be misdirected.

Capture/Recapture Analysis
- Strengths: can be used as soon as data are available.
- Weaknesses: estimates of the number of remaining defects are best when stringent assumptions are met.

Defect Prevention Program
- Strengths: allows comparison of defect trends over time to assess the impact and ROI of defect prevention activities.
- Weaknesses: requires sampling of defects and in-depth analysis, with participation by engineers to identify root causes.

Statistical Process Control
- Strengths: gives an indication of inspection and development process performance.
- Weaknesses: requires a stable process and real-time data collection and analysis.
Several critiques have been raised regarding current approaches to defect prediction (Fenton and Neil, 1999). The critiques involve (1) the unknown relationship between defects and failures, (2) problems with the multivariate statistical approach, (3) problems with using size and complexity metrics as sole predictors of defects, (4) problems in statistical methodology and data quality, and (5) false claims about software decomposition and the "Goldilocks Conjecture". The reasons for critique 1 are the difficulty of determining upfront the importance of defects by classifying them into different classes, and the variety in how different users use the system, resulting in a variety of operational profiles and difficulty in predicting which defects cause which failures. Critique 2 relates to the use of multivariate techniques, such as factor analysis, that produce metrics which cannot be interpreted directly in terms of program features. Critique 3 relates to such predictions ignoring programmers and designers as causal effects, since faulty code is introduced by them, poor design ability leads to complex programs, and complex designs cause inconsistencies between design modules. The issues in critique 4 are caused by lack of attention to the essential assumptions of a particular statistical technique, removal of data points without proper justification, and too little distinction between model prediction and model fitting. Finally, critique 5 arises from inaccurate modeling and inference due to the unclear relationship between module size and defect density.
2.4.5 Reviews and Remarks on Application and Issues in Defect Prediction

In order to build a good defect prediction model, it is imperative to follow appropriate steps. It should start with identifying suitable parameters that influence the introduction of defects in software, before moving on to collecting data on the identified parameters. Then the collected data should be plotted on a graph against the defects found, to study and establish a statistical relationship between them in the form of an equation. The equation obtained from the graph is fitted and evaluated using statistical techniques, such as verifying it against current and future projects, until it satisfies the purpose of the model.
From the various reviews conducted, having defect prediction in place helps in increasing software product quality while saving maintenance cost and effort. Defect prediction also facilitates the distribution of testing resources in line with defect density. Sources of defects can also be identified through predicting defects. Putting defect prediction into practice enables the creation of quantifiable metrics to aid decision making on software product delivery.
Several critiques of existing defect prediction have revealed the strengths and weaknesses of each approach and technique. They also show the capability of existing models to cater for different objectives of defect prediction, thus providing opportunities for further improvement of these models to enhance the quality and reliability of software.
2.5 Summary of the Proposed Solution

From the above discussion, the author draws several conclusions with regard to a new approach to a defect prediction model for the software testing phase in the SDLC. First, it is important to set a clear objective for what the proposed model needs to achieve when implemented in real software development operations, which is to estimate the total number of defects to be discovered in the software testing phase; getting started with a simple technique will do. Second, identifying and collecting data for the factors that have strong significance for defects, following proper steps or processes, is essential in defect prediction, especially for historical data. Whatever data is available can help in determining a suitable prediction technique, because the historical data may drive the model selection. This brings us to the third conclusion: the statistical relationship between the factors and the defects must be established in producing the model, to determine the correlation between those parameters. Instead of just focusing on fixing defects, analysis of the patterns behind the defects can be carried out. Fourth, verification of the model, which takes the form of an equation, must be performed to ensure the model works and suits the internal software production process.
CHAPTER 3
METHODOLOGY
3.1 Introduction

This chapter discusses the research methodology applied towards the establishment of the defect prediction model for the testing phase. As this project is a Six Sigma Green Belt project, the methodology follows the Design for Six Sigma (DfSS) approach, namely DMADV, comprising the Define (D), Measure (M), Analyze (A), Design (D) and Verify (V) phases. It then explains the supporting tools used throughout this research for data gathering, data analysis and establishing the proposed model.
3.2 Six Sigma - DMADV Methodology

As mentioned, the research uses the Six Sigma DMADV methodology, consisting of the Define, Measure, Analyze, Design and Verify phases. As this research finishes at the end of May 2009, the work has been completed up to the end of the Analyze phase; however, the complete activities of all phases are explained throughout this chapter.
Figure 3.1: DMADV phases
3.2.1 Define Phase

In this phase, the business opportunity that leads to the establishment of this research is identified. It involves producing a business scorecard drill-down as well as building a tree diagram; these are derived from the organization's software production process. This is where the Big Y, or business target, and the detailed tree diagram are defined. Then, it is required to develop the team charter and build an effective team. Building a team charter involves identifying the Project Sponsor, Project Champion, Leader and Team Members.

Table 3.1: Project team

Project Type | DMADV
Sponsor      | Mohamed Redzuan Abdullah
Champion     | Mohamed Redzuan Abdullah
Leader       | Muhammad Dhiauddin Mohamed Suffian
Team Members | V. Veeranjeneya Reddy, Mohd Khairulnizam Md Dahari, Vivek Kumar, Nageswari Kumaran

Project scheduling is outlined in this phase to determine the start date, end date and signoff date by the Project Champion for every phase. Then, the activity moves to identifying the customers that are directly related to and impacted by the project.
Table 3.2: Customer identification

Customers           | Segments             | Priorities
Software Tester     | Test COE             | Planning & Benchmarking
Software Developers | Software Development | Benchmarking Improvement
After that, analysis of the voice of the customer is done by identifying the customer need statements, conducting KJ Analysis and performing Kano Analysis. The customer need statement is important to know what the identified customers need in order to predict the defects, and what the customers will do when the model is in place. From this exercise, the author could conduct KJ Analysis to observe the relationships among the list of customer need statements. A survey form on "What is the contributor to a test defect prediction model?" is distributed to all test engineers to get the total score of their needs. Then, the author could perform Kano Analysis to obtain the exact customer requirements and conduct further analysis.
3.2.2
Measure Phase
The customer requirements identified in the Define phase are translated into
system or technical requirements. This is done using Quality Function Deployment
(QFD), or the House of Quality. QFD helps to determine product development
characteristics by combining customer needs with technical requirements. After building
the QFD, the work moves to performing Measurement System Analysis (MSA). The
MSA done here is for attribute data, since the result of test case execution is only PASS
or FAIL. For this MSA, ten (10) test cases with known results of PASS or FAIL are
selected. Then, three (3) testers are chosen to execute all test cases in random order,
three times each. The result is used to determine the repeatability of test engineers in
finding defects and their capability against the specified standard.
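The agreement statistics behind this check can be illustrated with a short sketch. The code below computes within-appraiser agreement and Cohen's kappa against the known standard for one appraiser; the verdict data are invented for illustration, and the actual analysis in this research was done in Minitab.

```python
def percent_agreement(runs):
    """Share of test cases on which all of an appraiser's runs agree."""
    return sum(len(set(verdicts)) == 1 for verdicts in zip(*runs)) / len(runs[0])

def cohens_kappa(rater, standard):
    """Cohen's kappa between one appraiser's verdicts and the known standard."""
    n = len(rater)
    observed = sum(r == s for r, s in zip(rater, standard)) / n
    labels = set(rater) | set(standard)
    expected = sum((rater.count(l) / n) * (standard.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

standard = ["PASS", "FAIL", "PASS", "PASS", "FAIL",
            "PASS", "FAIL", "PASS", "PASS", "FAIL"]
runs = [standard[:], standard[:], standard[:]]   # a perfectly repeatable appraiser

print(percent_agreement(runs))             # 1.0 -> 100% within-appraiser agreement
print(cohens_kappa(runs[0], standard))     # 1.0 -> perfect agreement (Kappa = 1)
```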
After completing the MSA, the next activity is to describe the Operational
Definition. It defines the customer needs, followed by a description of each customer
need and the unit of measurement that will be used for each need. Then, the author
outlines the data collection plan. The data collection plan includes the data that need to
be collected with regard to the defined Operational Definition. Furthermore, it also
includes identifying the sample size of the data, the sources of data, the time to gather
the data, the mechanism or ways to obtain the data, the persons who will collect the
data and the measurement unit for the data being collected.
3.2.3 Analyze Phase
This important phase involves several activities. From the data collected in the
Measure phase, the author identifies significant factors that contribute to defect
detection in the software testing phase. The author therefore selects the data from a
number of projects for analyzing the factors. The author needs to quantify any issues
with the data that have been collected and determine the significant factors using
comparative methods or by quantifying the design relationship. In this case of
establishing the defect prediction model, the author performs regression and correlation
analysis of the identified factors against the number of defects.
After completing the regression, the author could observe the strong factors that
lead to defect discovery in the testing phase. From the regression, an equation of the
relationship between the significant factors and the defects is generated using a
statistical software tool. Next, the author needs to identify and prevent design failure
modes. Failure Mode and Effect Analysis (FMEA) is done to identify potential failure
modes, their potential effect on customers, potential causes of failures and the
probability of failure occurrence with regard to defect discovery. The author then
identifies the design alternatives and the conceptual design of the proposed model. This
is done using the Pugh Method technique, in which alternative design concepts are
evaluated and compared against the proposed model.
3.2.4 Design Phase
As this research focuses on establishing a model and not a product, several
activities in the Design phase are skipped. The outcome from the Analyze phase is used
to perform tolerance analysis. The defects predicted by the model are compared with the
actual defects found, and the tolerance from that comparison is recorded and analyzed.
It has been set that the actual defects found must be within 10% less or 10% more of
the predicted defects. Then, the performance of the proposed model is evaluated using a
scorecard.
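A minimal sketch of this tolerance check, assuming the predicted and actual defect counts are available as plain numbers (the function name and the example values are illustrative):

```python
def within_tolerance(predicted, actual, tol=0.10):
    """True if the actual count falls within +/-10% of the predicted count."""
    low, high = predicted * (1 - tol), predicted * (1 + tol)
    return low <= actual <= high

print(within_tolerance(predicted=50, actual=47))  # True: 47 is inside 45..55
print(within_tolerance(predicted=50, actual=42))  # False: 42 is below 45
```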
3.2.5 Verify Phase
In this last phase, the reliability of the proposed model is assessed using
statistical methods. Then, the author needs to perform capability flow-up and scorecard
to ensure customer requirements are met. If there is a need, the FMEA will be updated.
Finally, a transition plan will be prepared to ensure that the proposed defect prediction
model for test is implemented and incorporated into the software production process.
Final sign-off will be obtained from the relevant parties, including the Project Sponsor,
Project Champion, Leader and the Process Owner.
3.3 Supporting Tools
Throughout the research on establishing the defect prediction model for testing
phase, several software tools or applications are used. They serve as the basis for
gathering related data, conducting the survey, building graphs and performing
regression. The following tools are used throughout the research:

i) Rational Clear Case – software for acquiring the Test Summary Report that
contains the list of defects for particular projects
ii) Rational Clear Quest – software for obtaining defect data for particular
projects
iii) Microsoft Excel – software for recording survey results and MSA results
iv) Microsoft PowerPoint – software for documenting the results of each
phase in slide format
v) Minitab 15 – statistical software tool for building related graphs and
performing regression
vi) MyMetrics – centralized repository of software quality metrics
CHAPTER 4
PROJECT DISCUSSION
4.1 Introduction
This chapter discusses the results and outcome of each phase. It explains the
MIMOS software production process, the schematic diagrams, the team charter, the
customer need statement, the KJ Analysis of customer needs and the Kano Analysis of
the Define phase. Discussion of the outcome of the Measure phase covers the House of
Quality for the Defect Prediction Model, the MSA results, the Operational Definition
and the Data Collection Plan. Then, it describes the outcome of the Analyze phase,
including the data for regression, the regression analysis, the FMEA result and the Pugh
Method result.
4.2 Findings of Define Phase
4.2.1 MIMOS Software Production Process
As presented in Figure 4.1, the testing team is involved in all review sessions for
each phase, from planning until the end of the system testing phase, throughout the
software production process. Test engineers are involved in reviewing the planning
document, requirement analysis document, design document, test planning document
and test cases. The software production process is governed by project management,
quality management, configuration and change management, integral and support
processes as well as process improvement initiatives, namely CMMI. From Figure 4.1,
the area of study is the functional or system test phase. In order to perform further
analysis and establish a defect prediction model for the system test phase, faults and
errors captured in the phases prior to testing must be considered and investigated.
Figure 4.1: MIMOS software production process
4.2.2 Schematic Diagram
There are two (2) schematic diagrams that have been produced: the high-level
schematic diagram and the detail schematic diagram. The high-level schematic diagram
deals with establishing the Big Y or business target, the little Ys, the vital Xs and the
goal statement against the business scorecard. In this research, the Big Y is to produce
software with zero-known post-release defects. As for the little Ys, the elements that
contribute to achieving the Big Y are defect containment in the test phase, customer
satisfaction, the quality of the process imposed to produce the software and project
management. From the little Ys, it is obvious that the testing team is involved in
ensuring defect containment in the test phase. There are two (2) aspects related to this
little Y: the potential number of defects before the test phase, which is the research
interest, and the number of defects after completing the test phase. The goal statement
for this research is "To achieve and implement Defect Prediction Model for Test in Test
Centre of Excellence by 30th May 2009". This is presented in Figure 4.2 below:
Figure 4.2: Schematic diagram
Going into the detail schematic diagram, from the Vital X, which is the potential
number of defects before test, the possible factors that contribute to defect prediction
are defined and summarized in a Y to X tree diagram. Basically, the author defines
seven (7) main factors associated with defect prediction: software complexity, developer,
tester, test process, fault, historical defects and project. Software complexity could be in
the form of requirements, the programming language used or code size. The developer
factor involves the knowledge they have in developing the software. The knowledge of
the tester in testing the software product is also considered. The test process factor
includes test case design coverage, test case execution productivity, the test tool used
and the test strategy applied. The fault factor comprises requirement faults, design
faults, code and unit testing (CUT) faults, integration faults and test case faults. The
historical defect factor consists of defect severity, defect category and defect validity,
while the project factor involves the type of project domain and project thread, whether
it is application-based or component-based software. These descriptions are exhibited in
Figure 4.3 below:
Figure 4.3: Detail schematic – Y to X tree diagram
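For readers who prefer a textual view, the Y to X breakdown of Figure 4.3 can be restated as a nested structure; the factor names follow the text above, while the Python layout itself is only illustrative.

```python
# The seven main factors associated with defect prediction, with the
# sub-factors each one covers, as described in the text above.
defect_prediction_factors = {
    "software complexity": ["requirements", "programming language used", "code size"],
    "developer": ["knowledge in developing the software"],
    "tester": ["knowledge in testing the software product"],
    "test process": ["test case design coverage", "test case execution productivity",
                     "test tool used", "test strategy applied"],
    "fault": ["requirement fault", "design fault", "CUT fault",
              "integration fault", "test case fault"],
    "historical defects": ["defect severity", "defect category", "defect validity"],
    "project": ["project domain", "project thread (application- or component-based)"],
}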
4.2.3 Team Charter
In the team charter, the author defines the business case, opportunity statement,
specific goal statement, project scope, and the in-scope and out-of-scope items for the
project. The business case explains why defect prediction is needed and how it can
improve the business. The opportunity statement outlines the customers of the project
and the potential volume and market share for the project. The specific goal statement
is similar to the one defined in the high-level schematic diagram. The project scope
defines the application of Design for Six Sigma (DfSS) using DMADV in the testing
phase of the Software Development Life Cycle. The in-scope and out-of-scope section
details the location of the project and the business areas that are and are not related to
the project.
Figure 4.4: Team charter
4.2.4 Customer Need Statement
The customer need statement involves two main things. First, the author
identifies the customer needs in terms of what is required, or the factors that could
help, in predicting the total number of defects in the testing phase. Second, the author
observes what the customers of the project will do once the defect prediction model is
established and incorporated into the process. This is explained in Figure 4.5.
Figure 4.5: Customer need statement
4.2.5 KJ Analysis and Kano Analysis
From the customer need statement, the author establishes the relationships
between the listed customer needs. For this purpose, the author prepares a survey form
and distributes it to all test engineers. The survey is meant for collecting scores from all
test engineers on the key contributors for the test defect prediction model. The scores
are assigned to the related customer needs. Using KJ Analysis, the author establishes
the relationships between the customer needs and observes the significance between
them. The outcomes of the KJ Analysis are presented in Figure 4.6 and Figure 4.7
below.
Figure 4.6: 1st level of KJ analysis
Figure 4.7: 2nd level of KJ analysis
After completing the KJ Analysis, the author performs Kano Analysis, in which
the author determines the Kano identifier of the customer needs. Since the project is
focused on establishing a defect prediction model for testing phase, the customer need
that is considered is "Estimated total number of defects to be discovered per project".
Its Kano identifier is "Must Be", allowing further analysis to proceed. Figure 4.8 shows
the Kano Analysis.
Figure 4.8: Kano analysis
4.3 Findings of Measure Phase
4.3.1 House of Quality
The customer needs are translated into technical requirements using Quality
Function Deployment (QFD). QFD determines the model characteristics by combining
customer needs with technical requirements. The QFD consists of the customer
requirements, direction of goodness, system or technical requirements, competitive
analysis, importance, technical analysis and relationship matrix. From the QFD, the
author observed that project name, Problem Report number, submission date of defect,
fault and in-process fault are strong factors for defect prediction.
Figure 4.9: House of Quality for defect prediction model
4.3.2 Measurement System Analysis
The Measurement System Analysis (MSA) that has been conducted is an
attribute MSA, since the result of test case execution is either PASS or FAIL. The MSA
begins with identifying ten (10) test cases with known results of PASS and FAIL. Then,
three (3) test engineers are selected to execute the test cases in random order. This is
repeated three (3) times for every engineer. The result is recorded in Microsoft Excel as
below:
Figure 4.10: Test case experiment result
The result above is then transferred to Minitab for attribute agreement
assessment. The assessment is done to evaluate the agreement within appraisers, each
appraiser against the standard and all appraisers against the standard. For attribute
agreement within appraisers, the MSA result is PASS, since it shows 100% assessment
agreement and a Kappa value of 1, which demonstrates perfect agreement. This proves
strong repeatability within testers in achieving the test result. As for attribute agreement
of each appraiser against the standard, the result is also PASS, since the Kappa value is
more than 0.7, or more than 70%, which demonstrates acceptable accuracy against the
standard. Finally, for the MSA of all appraisers against the standard, the result is PASS.
In summary, the overall MSA conducted is a PASS, with Kappa values of more than
0.7, or 70%. These results are shown in the following figures.
Figure 4.11: Assessment agreement
Figure 4.12: Assessment agreement for within appraiser
Figure 4.13: Assessment agreement for each appraiser against standard
Figure 4.14: Assessment agreement for all appraisers against standard
4.3.3 Operational Definition and Data Collection Plan
The operational definition describes the type of data that need to be collected,
the definition of each data item, as well as the unit of measurement used for each. The
operational definition that has been prepared is summarized below:
Figure 4.15: Operational definition
From the operational definition, a plan has been established to determine when
to collect and obtain the data from the respective sources. The data collection plan
consists of the data that need to be collected as specified in the operational definition, a
description of each data item, the sample size, the sources of data, the time to collect
the data, the methods to extract the data, the person responsible for extracting the data
and the unit of measurement for every data item extracted. The plan is presented in the
following figure:
Figure 4.16: Data collection plan
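As an illustration of the shape of such a plan, the sketch below captures one hypothetical entry with the fields listed above; the concrete values are invented and are not taken from Figure 4.16.

```python
# One illustrative data collection plan entry; field names follow the text,
# values are hypothetical and do not reproduce the actual plan.
plan_entry = {
    "data": "design faults",
    "description": "faults found during design review",
    "sample_size": "all selected projects",
    "source": "Rational Clear Quest",
    "when": "end of design phase",
    "method": "query export",
    "responsible": "test engineer",
    "unit": "count of faults",
}
```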
4.4 Findings of Analyze Phase
4.4.1 Regression
To perform the regression, the right data must be obtained to ensure a correct
regression is performed. Below is the data collected and used to perform the regression
analysis.
Figure 4.17: Data for regression
Using the above data, the author performs regression using Minitab, choosing the
multiple regression option. The factors considered for the regression are faults in
requirement, faults in design, faults in CUT, total faults, in-process fault (IPF), which is
faults divided by code size in KLOC, and the code size itself. The defects in this case
are those raised as functional defects; non-functional defects such as usability or
performance defects are not considered in the regression. The regression result is
presented below:
Figure 4.18: Regression result
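Although the fit in this research was produced in Minitab, an equivalent multiple regression could be sketched in Python as below; the column names and the CSV export are assumptions for illustration, not the project's actual files.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical export of the Figure 4.17 data; column names are assumed.
df = pd.read_csv("regression_data.csv")

predictors = df[["requirement", "design", "cut", "ipf", "kloc"]]
X = sm.add_constant(predictors)            # adds the intercept term
model = sm.OLS(df["defects"], X).fit()     # ordinary least squares fit

print(model.summary())                     # coefficients, p-values, R-squared
```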
From the regression result, since total faults are highly correlated with the other
factors, they are removed from the equation. With an R-squared value of 80.2%, the
model equation to predict the defects is summarized as:

Defect = -1.27 - 0.025 Requirement - 0.026 Design + 0.320 CUT +
0.207 IPF + 0.604 KLOC
It is also observed that KLOC and CUT are the strong factors in predicting
defects, with P-values of 0.009 and 0.091 respectively. However, all factors are retained
to avoid bias in establishing the defect prediction model. This regression result will be
used to complete the remaining two phases: the Design and Verify phases.
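The fitted equation can be read as a simple linear prediction function. The sketch below restates it in Python, with the coefficients copied from the result above; the IPF term follows the in-process fault factor described earlier, and the input values in the example are illustrative.

```python
def predict_defects(requirement, design, cut, ipf, kloc):
    """Predicted functional defects for the system test phase (fitted equation)."""
    return (-1.27
            - 0.025 * requirement
            - 0.026 * design
            + 0.320 * cut
            + 0.207 * ipf
            + 0.604 * kloc)

# Illustrative inputs: fault counts per phase, IPF ratio and code size in KLOC.
print(round(predict_defects(requirement=10, design=8, cut=25,
                            ipf=1.5, kloc=30), 1))   # ~24.7 predicted defects
```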
CHAPTER 5
CONCLUSION
5.1 Achievements
Although the research is not yet completed, as the specified end date is the end
of May 2009, several beneficial achievements have already been obtained with regard
to establishing a defect prediction model for testing phase. Towards the end of the
project, the objectives outlined beforehand have been achieved. The mathematical
equation generated from the regression analysis has demonstrated that a defect
prediction model can be constructed from the identified factors. From the model
equation, the author was able to discover the strong factors that contribute to the
number of defects in the testing phase. In addition, the author also realized that other
important factors need to be considered and incorporated, since those factors are also
significant in predicting defects in the testing phase.
Moreover, throughout the research, the author has been able to demonstrate the
success of the Six Sigma methodology in building a defect prediction model for testing
phase. Each phase of the Six Sigma approach has allowed the research to be conducted
in a structured and systematic way, with proper planning and analysis for every
deliverable. The Design for Six Sigma (DfSS) methodology gave the author the
opportunity to clearly determine what needed to be achieved from the research, the
issues to be addressed, the data to be collected, what needed to be measured and how
the model is generated and constructed.
Technically, in building the defect prediction model, it is observed that many
factors contribute to defect discovery in the testing phase. Obviously, faults in
requirement, design and coding, as well as in-process faults, have their own
relationships with defects. Code size, in the form of kilo lines of code, also affects the
number of defects found in the testing phase. By extracting the correct data from the
right sources, the author was able to conduct a proper and detailed analysis of the
identified factors while, at the same time, showing that all factors must be considered
in predicting defects for the testing phase. On the other hand, while performing the
measurement analysis and regression, the author was exposed to the usage of Minitab,
a powerful statistical solution. This has given the author in-depth statistical knowledge
and an appreciation of how important statistics is in improving the internal process.
As outlined in the business case of the team charter, this research has
demonstrated the importance of a defect prediction model in improving the internal
software production process, specifically the testing process. Although the project is
ongoing until the end of May 2009, the research shows that the defect prediction model
contributes strongly to zero-known post-release defects of a particular software product,
since testing is the last gate in the process before the software can be declared fit for
release and use. Test engineers will discover as many defects as possible to ensure all
defects are contained within the testing phase and do not escape to the end-user.
Additionally, having a predicted number of defects allows better utilization of test
engineers for a project by allocating an appropriate number of testers to test the
software. A better test strategy and wider test coverage can be implemented by having a
predicted number of defects. This can be achieved practically, since every test engineer
will be aware of the potential defects that they will discover. The tolerance of 10% less
or 10% more of actual defects found against the estimated defects can be their guide in
testing the software product. Indirectly, having an estimated number of defects in the
testing phase promotes improvement initiatives across the whole software development
process, especially in ensuring the stability of the development effort in releasing a
software product.
Furthermore, this project also shows the importance of effective communication
between the team members as well as the related parties involved in gathering the
software quality metrics. The author successfully delegated related tasks to the
respective team members and ensured they were completed successfully. This can be
seen in the data gathering, the measurement system analysis and the identification of
customer needs. Besides that, effective communication was also applied when acquiring
the software quality metrics data from the MyMetrics application. This is crucial to
ensure the data acquired are correct and reliable.
5.2 Constraints and Challenges
Throughout the research period, several constraints and challenges were faced by
the author. However, those challenges were tackled accordingly to ensure the success of
the research effort up to the end of the Analyze phase. The first challenge arose when
the author needed to collect the historical defect data of the selected projects. There are
two sources for these data: the Test Summary Report and Rational Clear Quest. The
author needed to extract only valid defect data from Rational Clear Quest, meaning that
defect records with rejected status, or defects raised outside the testing phase, are not
considered. The author had to go through the defect data one by one for each selected
project, with the assistance of the query provided in the system. Next, the author also
needed to verify that the defect data extracted from the system tally with those reported
in the Test Summary Report. However, sometimes the data did not match between
these two sources, and the author had to verify with the respective test leads for that
particular project to get the correct results.
Another challenge was faced during the measurement system analysis of the test
case results. The MSA had to be conducted twice due to a FAIL result in the first MSA
activity. The author had to identify why inconsistency occurred in executing the random
test cases, leading to wrong test case results against the standard. The next constraint
was the difficulty in obtaining the software quality metrics data. This was due to full
access to the MyMetrics system not being granted, resulting in less quality data being
available for further analysis. The author had to wait for the quality engineers to
provide the data, which sometimes caused delays to the schedule.
One more challenge occurred when conducting the regression analysis on the
extracted data. The first round of regression involved data containing outliers, due to a
large number of requirement faults recorded. The regression result looked promising,
but since the data involved outliers, it could not be considered the best model. A
second round of regression was done and the result also looked promising. To adopt
this latest equation, the author had to get consensus from the Project Champion before
proceeding with the next phases.
5.3 Recommendation
To date, analysis of the proposed defect model is still being done, until its
completion at the end of May 2009. However, from the research effort since the start of
this project, the author has already observed improvements and recommendations that
could be made. The first recommendation is to consider more factors besides those
currently identified; the next research effort can focus on other factors with detailed
analysis. In the current effort, the author only considers the code size factor, in-process
fault (IPF) and faults found in phases prior to the testing phase. Moving forward, the
author can consider test case faults, test case coverage, test case productivity and defect
severity as other factors that lead to defect discovery in the testing phase. Other than
that, as the current research focuses on predicting the total number of defects,
regardless of severity or the duration of testing activities, future effort can focus on
improving the defect prediction model to predict defect severity in the testing phase.
For example, the model could predict how many critical or major defects will be found
in the testing phase. The model could also focus on predicting the number of defects
found over time, until the end of test execution activities.
Another recommendation is the incorporation of this defect prediction model
with other established software reliability models, such as the Musa model or
Shooman's model. This could help enhance confidence in the reliability of the software
being released to the customer or end-user. Finally, this model can be improved by
splitting it to accommodate different project threads. The current model serves as a
generalized model governing the prediction of defects for all project threads. In the
future, specific defect prediction models can be constructed to cater for different project
threads, meaning that there will be one defect prediction model for application-based
projects and another for component-based projects.
APPENDICES
SURVEY: DEFECT PREDICTION MODEL FOR TEST

Name: ____________________________________________________________

From your point of view, what is the key contributor for test defect prediction model?
(Please rank from most important to least important)

- Requirements for Software
- Programming Language Used
- Software Size/Code Size (KLOC)
- Errors/Mistakes Captured in Phase Prior to Testing
- Historical data of defects logged (Historical PRs)
- Others (Please identify): ___________________________________________

Thank you