Notes_04 -- Metrics
SE 3730 / CS 5730 – Software Quality
1 Software Quality Metrics
1.1 Test Report Metrics
1.1.1 Test Case Status (Completed, Completed with Errors, Not Run Yet)
• The Not Run Yet test cases may be further divided into:
– blocked – the functionality is not yet available, or the test procedure cannot be run for some reason
– not blocked – we just haven't gotten around to testing these yet
• See the first graph on the next page.
1.1.2 Defect Gap Analysis
• Looks at the distance between total uncovered defects and corrected defects – a measure of how the bug fixers are doing and of when the product will be ready to ship.
– The Gap is the difference between Uncovered and Corrected defects.
– At first, there is a latency in correcting defects, and defects are uncovered faster than they are fixed.
• Uncovered defects are all defects that are known; the count includes both those still open and those already fixed.
– As time goes on, the gap should narrow (hopefully). If it does not, your maintenance and/or development teams are losing ground: defects are still being found faster than they are being fixed.
– See the second graph on this page for a Gap Analysis Chart.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
The line marking the Gap should be exactly vertical, representing the distance (the Gap) at one specific point in time.
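A minimal sketch of the calculation (not from Lewis – the weekly counts below are hypothetical): the gap at any point is simply cumulative uncovered defects minus cumulative corrected defects.

# Defect gap calculation from hypothetical weekly counts.
# "uncovered" = defects reported that week, "corrected" = defects fixed that week.
from itertools import accumulate

uncovered_per_week = [12, 18, 15, 10, 7, 5, 3]   # hypothetical data
corrected_per_week = [2, 6, 10, 12, 11, 8, 6]

cum_uncovered = list(accumulate(uncovered_per_week))
cum_corrected = list(accumulate(corrected_per_week))

for week, (u, c) in enumerate(zip(cum_uncovered, cum_corrected), start=1):
    gap = u - c   # the Gap: known defects not yet corrected
    print(f"Week {week}: uncovered={u:3d} corrected={c:3d} gap={gap:3d}")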
1.1.3 Defect Severity
• Defect severity as a percentage of total defects. Defect Severity helps determine how close the software is to release and can help in allocating resources.
– Critical – blocks other tests from being run and blocks alpha release.
– Severe – blocks tests and beta release.
– Moderate – a testing workaround is possible, but blocks final release.
– … very minor – fix before the "Sun Burns Out" (USDATA, 1994).
– See the first graph on the next page.
1.1.4 Test Burnout
• A chart of cumulative total defects and defects per period over time. It is a measure of the rate at which new defects are being found.
– Test Burnout helps project the point at which most of the defects will have been found using current test cases and procedures, and therefore when (re)testing can halt.
– Burnout is a projection or an observation of when no more, or only a small number of, new defects are expected to be found using current practices.
– Beware: it doesn't project when your system will be bug free, just when your current testing techniques are not likely to find additional defects.
– See the second graph on this page.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
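A minimal sketch of how burnout might be flagged from raw counts; the per-period data, threshold, and stabilization window below are hypothetical choices for illustration, not part of the Lewis material.

# Flag test burnout: the new-defect rate has stayed below a small threshold
# for a few consecutive periods.
defects_per_period = [22, 30, 25, 14, 9, 5, 2, 1]   # hypothetical weekly counts

cumulative = []
total = 0
for d in defects_per_period:
    total += d
    cumulative.append(total)

THRESHOLD = 3        # "only a small number of new defects"
STABLE_PERIODS = 2   # how long the rate must stay that low

low = 0
burnout_period = None
for period, d in enumerate(defects_per_period, start=1):
    low = low + 1 if d <= THRESHOLD else 0
    if low >= STABLE_PERIODS and burnout_period is None:
        burnout_period = period

print("Cumulative defects:", cumulative)
print("Burnout reached at period:", burnout_period)   # period 8 for this data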
1.1.5 Defects by Function
• Tracks the number of defects per function, component, or subsystem.
– Useful in determining where to target additional testing, and/or redesign and implementation.
– A Pareto Chart/Analysis is often used.
– See the table on this page.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
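A minimal sketch of a Pareto-style tally of defects by module; the module names and counts are invented for illustration.

# Pareto analysis: sort modules by defect count and show the cumulative percentage.
defects_by_module = {                      # hypothetical data
    "parser": 42, "ui": 7, "db_layer": 31, "auth": 4,
    "report_gen": 9, "scheduler": 3, "export": 2, "logging": 2,
}

total = sum(defects_by_module.values())
running = 0
print(f"{'Module':12s} {'Defects':>7s} {'Cum %':>6s}")
for module, count in sorted(defects_by_module.items(), key=lambda kv: kv[1], reverse=True):
    running += count
    print(f"{module:12s} {count:7d} {100 * running / total:6.1f}")
# The first few rows typically account for the bulk of the defects (the 80/20 pattern).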
1.1.6 Defects by tester
• This tracks the number of defects found per tester (not shown). This is a quantitative analysis only, not a qualitative one.
– Reporting this may lead to quota filling by breaking defects into many small nits rather than one comprehensive report. – Remember Deming's 14 Quality Principles.
– Many nits are harder to manage and may take more time to fix than having all related issues rolled into one bigger defect.
1.1.7 Root cause analysis
• What caused the defect to be added to the system – generally we try to react to this by evolving the software development process.
• Sometimes this is also referred to as the Injection Source, although Injection Source is sometimes limited to Internal or External.
– Internal refers to defects caused by the development team (Requirements Engineers, Designers, Coders, Testers, …).
– External refers to defects caused by people outside the development team (customers gave you wrong information, 3rd-party software came with defects, etc.).
1.1.8 How defects were found
• Inspections, walkthroughs, unit tests, integration tests, system tests, etc. If a quality assurance technique isn't removing defects, it is a waste of time and money.
1.1.9 Injection Points
• In what stage of the development cycle was the defect put into the system? This can help evolve a process to try to prevent defects.
1.1.10 Detection Points
• In what stage of the development cycle was the defect discovered?
• We want to look at the difference between the Injection Point and the Detection Point.
– If there is a significant latency between Injection and Detection, then the process needs to evolve to reduce this latency.
• Remember: defect remediation costs increase significantly as we progress through the development stages.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
JAD: Joint Application Development – a predecessor of the Agile process
1.1.11 Who found the defects
• Developers (in requirements, code, unit test, … reviews), QA (integration and system testing), alpha testers, beta testers, integrators, end customers.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
1.2 Software Complexity Metrics – these have been used to estimate testing time and/or quality
1.2.1 KLOCs -- CoCoMo
• Real-time embedded systems: 40-160 LOC/P-month
• Systems programs: 150-400 LOC/P-month
• Commercial applications: 200-800 LOC/P-month
• http://csse.usc.edu/tools/COCOMOSuite.php
• http://sunset.usc.edu/research/COCOMOII/expert_cocomo/expert_cocomo2000.html
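A minimal sketch of how the productivity ranges above can bracket an effort estimate. The 50,000-LOC project size is a made-up example, and this is only the crude LOC-per-person-month arithmetic, not the full COCOMO model.

# Bracket effort (person-months) from estimated size and a productivity range.
estimated_loc = 50_000   # hypothetical project size

productivity = {                       # LOC per person-month, from the ranges above
    "real-time embedded": (40, 160),
    "systems programs": (150, 400),
    "commercial applications": (200, 800),
}

for domain, (low, high) in productivity.items():
    worst = estimated_loc / low    # fewer LOC/P-month means more person-months
    best = estimated_loc / high
    print(f"{domain:25s}: {best:6.0f} to {worst:6.0f} person-months")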
1.2.2 Comment Percentage
• The comment count can include both inline comments (on a line with code) and stand-alone comments.
– http://www.projectcodemeter.com/cost_estimation/index.php?file=kop1.php
• The comment percentage is calculated as the total number of comments divided by the total lines of code less the number of blank lines.
• A comment percentage of about 30 percent has been mentioned as most effective. Because comments help developers and maintainers, this metric is used to evaluate the attributes of understandability, reusability, and maintainability.
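A minimal sketch of the calculation described above, assuming C/C++/Java-style comment syntax. The comment-detection heuristic is deliberately simple (it does not handle comment markers inside string literals, for example).

# Comment percentage = comments / (total lines - blank lines).
def comment_percentage(path: str) -> float:
    comments = 0
    non_blank = 0
    in_block = False
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if not stripped:
                continue                      # blank lines are excluded
            non_blank += 1
            if in_block or "//" in stripped or "/*" in stripped:
                comments += 1                 # counts inline and stand-alone comments
            if "/*" in stripped and "*/" not in stripped:
                in_block = True
            elif "*/" in stripped:
                in_block = False
    return 100.0 * comments / non_blank if non_blank else 0.0

# Example use: print(f"{comment_percentage('foo.c'):.1f}% comments")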
1.2.3 Halstead's Metrics
These have been associated with the maintainability of code.
• Programmers use operators and operands to write programs.
• Halstead suggests that program comprehension requires retrieval of tokens from a mental dictionary via a binary-search mechanism.
• The complexity of a piece of code, and hence the time to develop it, depends on:
– n1, the number of unique operators
– n2, the number of unique operands
– N1, the total number of occurrences of operators
– N2, the total number of occurrences of operands
      SUBROUTINE SORT (X, N)
      INTEGER X(100), N, I, J, SAVE, IM1
      IF (N .LT. 2) GOTO 200
      DO 210 I = 2, N
      IM1 = I - 1
      DO 220 J = 1, IM1
      IF (X(I) .GE. X(J)) GOTO 220
      SAVE = X(I)
      X(I) = X(J)
      X(J) = SAVE
220   CONTINUE
210   CONTINUE
200   RETURN
Operators       Occurrences    Operands      Occurrences
SUBROUTINE      1              SORT          1
()              10             X             8
,               8              N             4
INTEGER         1              100           1
IF              2              I             6
.LT.            1              J             5
GOTO            2              SAVE          3
DO              2              IM1           3
=               6              2             2
-               1              200           2
.GE.            1              210           2
CONTINUE        2              1             2
RETURN          1              220           3
End-of-line     13

n1 = 14    N1 = 51
n2 = 13    N2 = 42
• Program Length, N = N1 + N2 = 93  {total number of operators and operands}
• Program Vocabulary, n = n1 + n2 = 27  {number of unique operators and operands}
• Program Volume, V = N * log2(n) = 93 * log2(27) ≈ 442  {Program Length * log2(Vocabulary)}
  o Represents the storage required for a binary translation of the original program.
  o Estimates the number of mental comparisons required to comprehend the program.
• Length estimate, N* = n1 * log2(n1) + n2 * log2(n2) ≈ 101.4  {uses only the unique operators and operands}
  14 * log2(14) + 13 * log2(13) ≈ 53.3 + 48.1 ≈ 101.4
• Potential Volume, V* = (2 + n2) * log2(2 + n2)
  o The volume of a program of minimum size.
  o For our example, V* = (2 + 13) * log2(2 + 13) = 15 * log2(15) ≈ 58.6
  o Note that as the Program Volume approaches the Potential Volume, we are approaching an optimized theoretical solution.
  o And, "in theory there is no difference between theory and practice, but in practice there is." – Yogi Berra
• Program (complexity) Level, L = V* / V = 58.6 / 442 ≈ 0.13  {Potential Volume / actual Program Volume}. How close we are to the theoretically optimal program.
• Difficulty (1 over the Program Level), D = 1 / L = 1 / 0.13 ≈ 7.5. Can be used to contrast two solutions and compare them for Difficulty.
• Difficulty estimate, D* = (n1 / 2) * (N2 / n2) = (14 / 2) * (42 / 13) ≈ 22.6
  o Programming difficulty increases if additional operators are introduced (i.e., as n1 increases) and if operands are repeatedly used (i.e., as N2/n2 increases).
• Effort, E = V / L ≈ D* * V = (n1 * N2 * N * log2(n)) / (2 * n2) ≈ 9989
  22.6 * 442 ≈ 9989
  o Measures "elementary mental discriminations".
  o Two solutions may have very different Effort estimates.
• A psychologist, John Stroud, suggested that the human mind is capable of making a limited number of mental discriminations per second (the Stroud number), in the range of 5 to 20.
  o Using a Stroud number of 18:
  o Time for development, T = E / 18 discriminations/second = 9989 / 18 ≈ 555 seconds ≈ 9 minutes
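A minimal sketch that reproduces the arithmetic above from the SORT example's token counts. The counts are the inputs; nothing here tokenizes source code, and the Potential Volume formula is the one used in these notes.

# Halstead metrics from operator/operand counts.
from math import log2

def halstead(n1, n2, N1, N2, stroud=18):
    N = N1 + N2                             # Program Length
    n = n1 + n2                             # Program Vocabulary
    V = N * log2(n)                         # Program Volume
    N_star = n1 * log2(n1) + n2 * log2(n2)  # Length estimate
    V_star = (2 + n2) * log2(2 + n2)        # Potential Volume (as defined above)
    L = V_star / V                          # Program Level
    D = 1 / L                               # Difficulty
    D_star = (n1 / 2) * (N2 / n2)           # Difficulty estimate
    E = D_star * V                          # Effort (estimated)
    T = E / stroud                          # Time in seconds, Stroud number = 18
    return dict(N=N, n=n, V=V, N_star=N_star, V_star=V_star,
                L=L, D=D, D_star=D_star, E=E, T_seconds=T)

for name, value in halstead(n1=14, n2=13, N1=51, N2=42).items():
    print(f"{name:9s} = {value:10.2f}")
# With unrounded intermediates E comes out near 10,000; the notes round D* and V
# first (22.6 * 442) and report 9989. T is roughly 555 s, about 9 minutes.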
1.2.3.1 Simplification of Programs to which Halstead's Metrics are sensitive
Below are constructs that can alter program complexity:
o Complementary operations: e.g.
      x := i + 1 - j - 1 + j
  versus
      x := i
  Reduces N1, N2, Length, Volume, and the Difficulty estimate.
o Ambiguous operands: Identifiers refer to different things in different parts of the program –
reuse of operands.
r := b * b - 4 * a * c;
.....
r := (-b + SQRT(r)) / 2.0; // r is redefined in this statement
o Or -- Synonymous operands: different identifiers for the same thing.
o Common sub-expressions: failure to use variables to avoid redundant re-computation
y := (i + j) * (i + j) * (i + j);
..... can be rewritten
x := i + j;
y := x * x * x;
o Or -- Unwarranted assignment: e.g., over-doing the solution to common sub-expressions, thus producing unnecessary variables.
o Unfactored expressions:
y := a * a + 2 * a * b + b * b;
..... can be rewritten
y := (a + b) * (a + b);
1.2.4 Function Points
• CoCoMo II
• Based on a combination of program characteristics:
  o external inputs and outputs
  o user interactions
  o external interfaces
  o files used by the system
• A weight is associated with each of these.
• The function point count is computed by multiplying each raw count by its weight and summing all the values (see the sketch below).
• FPs are very subjective – they depend on the estimator. They cannot be counted automatically.
“In the late 1970's A.J. Albrecht of IBM took the position that the economic output unit of software
projects should be valid for all languages, and should represent topics of concern to the users of
the software. In short, he wished to measure the functionality of software.
Albrecht considered that the visible external aspects of software that could be enumerated
accurately consisted of five items: the inputs to the application, the outputs from it, inquiries by
users, the data files that would be updated by the application, and the interfaces to other
applications.
After trial and error, empirical weighting factors were developed for the five items, as was a
complexity adjustment. The number of inputs was weighted by 4, outputs by 5, inquiries by 4,
data file updates by 10, and interfaces by 7. These weights represent the approximate
difficulty of implementing each of the five factors.
In October of 1979, Albrecht first presented the results of this new software measurement
technique, termed "Function Points" at a joint SHARE/GUIDE/IBM conference in Monterey,
California. This marked the first time in the history of the computing era that economic
software productivity could actually be measured.
Table 2 provides an example of Albrecht's Function Point technique used to measure either
Case A or Case B. Since the same functionality is provided, the Function Point count is also
identical.
Table 2. Sample Function Point Calculations

Raw Data          Weights    Function Points
1 Input           x 4        = 4
1 Output          x 5        = 5
1 Inquiry         x 4        = 4
1 Data File       x 10       = 10
1 Interface       x 7        = 7
                             ----
Unadjusted Total             30
Complexity Adjustment        None  (this is used for the type of system being developed – Embedded is most complex)
Adjusted Function Points     30
Table 3. The Economic Validity of Function Point Metrics

Activity               Case A: Assembler     Case B: Fortran      Difference
                       Version (30 F.P.)     Version (30 F.P.)
Requirements           2 Months              2 Months             0
Design                 3 Months              3 Months             0
Coding                 10 Months             3 Months             -7
Integration/Test       5 Months              3 Months             -2
User Documentation     2 Months              2 Months             0
Management/Support     3 Months              2 Months             -1
Total                  25 Months             15 Months            -10
Total Costs            $125,000              $75,000              ($50,000)
Cost Per F.P.          $4,166.67             $2,500.00            ($1,666.67)
F.P. Per Person Month  1.2                   2                    +0.8
The Function Point metrics are far superior to the source line metrics for expressing normalized
productivity data. As real costs decline, cost per Function Point also declines. As real
productivity goes up, Function Points per person month also goes up.
In 1986, the non-profit International Function Point Users Groups (IFPUG) was formed to assist in
transmitting data and information about this metric. In 1987, the British government adopted a
modified form of Function Points as the standard software productivity metric. In 1990, IFPUG
published Release 3.0 of the Function Point Counting Practices Manual, which represented a
consensus view of the rules for Function Point counting. Readers should refer to this manual for
current counting guidelines. “
Table 1 - SLOC per FP by Language

Language                   SLOC per FP
Assembler                  320
C                          150
Algol                      106
Cobol                      106
Fortran                    106
Jovial                     106
Pascal                     91
RPG                        80
PL/I                       80
Ada                        71
Lisp                       64
Basic                      64
4th Generation Database    40
APL                        32
Smalltalk                  21
Query Languages            16
Spreadsheet Languages      6
2 QSM Function Point Programming Languages Table
Version 3.0 April 2005
© Copyright 2005 by Quantitative Software Management, Inc. All Rights Reserved.
http://www.qsm.com/FPGearing.html#MoreInfo
The table below contains Function Point Language Gearing Factors from 2597 completed function
point projects in the QSM database. The projects span 289 languages from a total of 645
languages represented in the database. Because mixed-language projects are not a reliable
source of gearing factors, this table is based upon single-language projects only. Version 3.0
features the languages where we have the most recent, high-quality data.
The table will be updated and expanded as additional project data becomes available. As an
additional resource, the David Consulting Group has graciously allowed QSM to include their
data in this table.
Environmental factors can result in significant variation in the number of source statements per
function point. For this reason, QSM recommends that organizations collect both code counts
and final function point counts for completed software projects and use this data for
estimates. Where there is no completed project data available for estimation, we provide the
following gearing factor information (where sufficient project data exists):
• the average
• the median
• the range (low - high)
We hope this information will allow estimators to assess the amount of variation, the central
tendency, and any skew to the distribution of gearing factors for each language.
                    QSM SLOC/FP Data                           David Consulting
Language            Avg       Median    Low       High         Data
Access              35        38        15        47
Ada                 154       -         104       205
Advantage           38        38        38        38           -
APS                 86        83        20        184          -
ASP                 69        62        32        127          -
Assembler**         172       157       86        320          575
JavaScript**        56        54        44        65           50

(The full table also lists gearing factors for Basic/400, C**, C++**, C#, Clipper, COBOL**,
Cool:Gen/IEF, Culprit, DBase III, DBase IV, Easytrieve+, Excel, Focus, FORTRAN, FoxPro,
HTML**, Ideal, IEF/Cool:Gen, Informix, J2EE, Java**, JCL**, JSP, Lotus Notes, Macro, Mantis,
Mapper, Natural, Oracle**, Oracle Dev 2K/FORMS, Pacbase, PeopleSoft, Perl, PL/1**, PL/SQL,
Powerbuilder**, REXX, RPG II/III, Sabretalk, SAS, Siebel Tools, Slogan, Smalltalk**, SQL**,
VBScript**, Visual Basic**, VPF, and Web Scripts; see http://www.qsm.com/FPGearing.html#MoreInfo
for the complete figures.)
Note that the applications a language is used for may differ significantly. C++, Assembly, Ada, … may be used for much more complex projects than Visual Basic, Java, etc. – Rowe's 2 cents worth.
2.1.1 "A Metrics Suite for Object Oriented Design"
S.R. Chidamber and C.F. Kemerer, IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
See the metrics below.
2.1.2 "A Validation of Object-Oriented Design Metrics as Quality Indicators"
V.R. Basili, L.C. Briand, and W.L. Melo, IEEE Trans. on Software Engineering, vol. 22, no. 10, Oct. 1996.
• WMC – Weighted Methods per Class: the number of methods and operators in a class (excluding those inherited from parent classes). The higher the WMC, the higher the probability of fault detection.
• DIT – Depth of Inheritance Tree: the number of ancestors of a class. The higher the DIT, the higher the probability of fault detection.
• NOC – Number of Children of a Class: the number of direct descendants of a class. NOC was inversely related to fault detection. This was believed to result from high levels of reuse by children. It may also be that when inheritance fan-out is wide rather than deep, we have fewer levels of inheritance.
• CBO – Coupling Between Object Classes: how many member functions or instance variables of other classes a class uses, and how many other classes are involved. CBO was significantly related to the probability of finding faults.
• RFC – Response For a Class: the number of functions of a class that can be directly executed by other classes (public and friend). The higher the RFC, the higher the probability of fault detection.
Many coding standards address these either directly or indirectly. For instance, limit DIT to 3 or 4, provide guidance against coupling, and provide guidance on methods per class.
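A minimal sketch of how three of these metrics could be computed by introspection over a toy class hierarchy. WMC is taken here as a simple unweighted method count, the hierarchy is invented for illustration, and this is not how the cited studies collected their data.

# WMC (unweighted), DIT, and NOC over a toy Python class hierarchy.
import inspect

class Shape:                     # toy hierarchy, for illustration only
    def area(self): ...
    def perimeter(self): ...

class Polygon(Shape):
    def vertices(self): ...

class Triangle(Polygon):
    def area(self): ...

class Rectangle(Polygon):
    def area(self): ...

def wmc(cls):   # unweighted: count methods defined in the class itself
    return sum(1 for _, m in vars(cls).items() if inspect.isfunction(m))

def dit(cls):   # depth of inheritance tree, excluding the implicit `object` root
    return len(cls.__mro__) - 2

def noc(cls):   # number of direct descendants
    return len(cls.__subclasses__())

for cls in (Shape, Polygon, Triangle, Rectangle):
    print(f"{cls.__name__:10s} WMC={wmc(cls)} DIT={dit(cls)} NOC={noc(cls)}")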
2.2 Use of SPC in software quality assurance
o Pareto analysis by function: the 80-20 rule – 80% of defects are found in 20% of the modules.
o Control and run charts – if error rates increase above some control level, we need to take action.
o Look for causes, modify the process, modify the design, reengineer, rewrite, …
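A minimal sketch of a control-chart check on defect counts per build, using a c-chart style upper limit of mean + 3·sqrt(mean); the build counts and the choice of limit are illustrative assumptions, not a prescription.

# Flag builds whose defect count exceeds the control limit.
from math import sqrt

defects_per_build = [7, 5, 9, 6, 8, 4, 7, 21, 6, 5]   # hypothetical counts

mean = sum(defects_per_build) / len(defects_per_build)
ucl = mean + 3 * sqrt(mean)    # upper control limit for a Poisson-like count

print(f"mean = {mean:.1f}, UCL = {ucl:.1f}")
for build, count in enumerate(defects_per_build, start=1):
    if count > ucl:
        print(f"Build {build}: {count} defects is out of control -- look for a cause")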
2.3 Questions about Metrics
Is publishing metrics that relate to program composition actually beneficial to quality?