Notes_04 -- Metrics
SE 3730 / CS 5730 – Software Quality

1 Software Quality Metrics

1.1 Test Report Metrics

1.1.1 Test Case Status
(Completed, Completed with Errors, Not Run Yet)
– Some of the Not Run Yet cases can be further divided into:
   blocked – the functionality is not yet available, or the test procedure cannot be run for some reason.
   not blocked – we just haven’t gotten around to running these yet.
– See the first graph on the next page.

1.1.2 Defect Gap Analysis
Looks at the distance between total uncovered defects and corrected defects – a measure of how the bug fixers are doing and of when the product will be ready to ship.
– The Gap is the difference between Uncovered and Corrected defects. Uncovered counts all defects that are known, including both those still open and those already fixed.
– At first there is a latency in correcting defects, and defects are uncovered faster than they are fixed.
– As time goes on, the gap should narrow (hopefully). If it does not, your maintenance and/or development teams are losing ground: defects are still being found faster than they are being fixed.
– See the second graph on this page for a Gap Analysis chart.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
The line showing the Gap should be exactly vertical, representing the distance (Gap) at one specific time.

1.1.3 Defect Severity
– Defect severity by percentage of total defects. Defect Severity helps determine how close to release the software is and can help in allocating resources.
– Critical – blocks other tests from being run and blocks an alpha release.
– Severe – blocks tests and a beta release.
– Moderate – a testing workaround is possible, but blocks the final release.
– … Very minor – fix before the “Sun Burns Out” (USDATA, 1994).
– See the first graph on the next page.

1.1.4 Test Burnout
Chart of cumulative total defects and defects by period over time. It is a measure of the rate at which new defects are being found.
– Test Burnout helps project the point at which most of the defects will have been found using the current test cases and procedures, and therefore when (re)testing can halt.
– Burnout is a projection, or an observation, of when no more (or only a small number of) new defects are expected to be found using current practices.
– Beware: it does not project when your system will be bug free, just when your current testing techniques are not likely to find additional defects.
– See the second graph on this page.
From Lewis, Software Testing and Continuous Quality Improvement, 2000

1.1.5 Defects by Function
Tracks the number of defects per function, component or subsystem – useful in determining where to target additional testing and/or redesign and implementation.
– Often uses a Pareto chart/analysis.
– See the table on this page.
From Lewis, Software Testing and Continuous Quality Improvement, 2000

1.1.6 Defects by Tester
This tracks the number of defects found per tester.
(graph not shown)
– This is only a quantitative, not a qualitative, analysis.
– Reporting this may lead to quota filling by breaking defects into many small nits rather than one comprehensive report.
– Remember Deming’s 14 Quality Principles.
– Many nits are harder to manage and may take more time to fix than having all related issues rolled into one bigger defect.

1.1.7 Root Cause Analysis
What caused the defect to be added to the system – generally we try to react to this by evolving the software development process. Sometimes this is also referred to as the Injection Source, although Injection Source is sometimes limited to Internal or External.
– Internal refers to defects caused by the development team (requirements engineers, designers, coders, testers, …).
– External refers to defects caused by people outside the development team (customers gave you wrong information, 3rd-party software came with defects, etc.).

1.1.8 How Defects Were Found
Inspections, walkthroughs, unit tests, integration tests, system tests, etc. If a quality assurance technique isn’t removing defects, it is a waste of time and money.

1.1.9 Injection Points
In what stage of the development cycle was the defect put into the system. This can help evolve a process to try to prevent defects.

1.1.10 Detection Points
In what stage of the development cycle was the defect discovered. We want to look at the difference between the Injection Point and the Detection Point.
– If there is a significant latency between Injection and Detection, then the process needs to evolve to reduce this latency. Remember that defect remediation costs increase significantly as we progress through the development stages.
From Lewis, Software Testing and Continuous Quality Improvement, 2000
JAD: Joint Application Development – a predecessor of the Agile process

1.1.11 Who Found the Defects
Developers (in requirements, code, unit test, … reviews), QA (integration and system testing), alpha testers, beta testers, integrators, end customers.
From Lewis, Software Testing and Continuous Quality Improvement, 2000

1.2 Software Complexity
Complexity metrics have been used to estimate testing time and/or quality.

1.2.1 KLOCs -- CoCoMo
Typical productivity rates:
– Real-time embedded systems: 40-160 LOC/P-month
– Systems programs: 150-400 LOC/P-month
– Commercial applications: 200-800 LOC/P-month
http://csse.usc.edu/tools/COCOMOSuite.php
http://sunset.usc.edu/research/COCOMOII/expert_cocomo/expert_cocomo2000.html

1.2.2 Comment Percentage
The comment percentage can include a count of the number of comments, both in-line (with code) and stand-alone.
– http://www.projectcodemeter.com/cost_estimation/index.php?file=kop1.php
The comment percentage is calculated as the total number of comments divided by the total lines of code less the number of blank lines.
A comment percentage of about 30 percent has been mentioned as most effective. Because comments help developers and maintainers, this metric is used to evaluate the attributes of understandability, reusability, and maintainability.
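Since the definition above leaves the counting details to the tool, here is a minimal sketch (Python) of one way to compute a comment percentage. The use of C/Java-style "//" line comments, and the decision to count a line once whether its comment is in-line or stand-alone, are illustrative assumptions rather than part of the definition above.

# comment_pct.py -- rough sketch of the Comment Percentage metric described above.
# Assumptions (not from the notes): C/C++/Java-style "//" line comments only, and a
# line containing a comment is counted once, whether it is stand-alone or in-line.

def comment_percentage(source_text: str) -> float:
    comments = 0
    nonblank_lines = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                 # blank lines are excluded from the denominator
        nonblank_lines += 1
        if "//" in stripped:         # counts both stand-alone and in-line comments
            comments += 1
    return 100.0 * comments / nonblank_lines if nonblank_lines else 0.0

if __name__ == "__main__":
    sample = """int x = 0;   // running total
// accumulate the first ten integers
for (int i = 0; i < 10; i++) {
    x += i;
}
"""
    print(f"Comment percentage: {comment_percentage(sample):.1f}%")   # 2 of 5 lines -> 40.0%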
1.2.3 Halstead’s Metrics
Halstead’s metrics have been associated with the maintainability of code.
– Programmers use operators and operands to write programs.
– Halstead suggests that program comprehension requires retrieval of tokens from a mental dictionary via a binary-search mechanism.
– The complexity of a piece of code, and hence the time to develop it, depends on:
   n1, the number of unique operators
   n2, the number of unique operands
   N1, the total number of occurrences of operators
   N2, the total number of occurrences of operands

      SUBROUTINE SORT (X, N)
      INTEGER X(100), N, I, J, SAVE, IM1
      IF (N .LT. 2) GOTO 200
      DO 210 I = 2, N
      IM1 = I - 1
      DO 220 J = 1, IM1
      IF (X(I) .GE. X(J)) GOTO 220
      SAVE = X(I)
      X(I) = X(J)
      X(J) = SAVE
220   CONTINUE
210   CONTINUE
200   RETURN

Operators       Occurrences     Operands     Occurrences
SUBROUTINE      1               SORT         1
()              10              X            8
,               8               N            4
INTEGER         1               100          1
IF              2               I            6
.LT.            1               J            5
GOTO            2               SAVE         3
DO              2               IM1          3
=               6               2            2
-               1               200          2
.GE.            1               210          2
CONTINUE        2               1            2
RETURN          1               220          3
End-of-line     13
n1 = 14         N1 = 51         n2 = 13      N2 = 42

Program Length, N = N1 + N2 = 93   {total number of operators and operands}

Program Vocabulary, n = n1 + n2 = 27   {number of unique operators and operands}

Program Volume, V = N * log2 n = 93 * log2 27 = 442   {Program Length * log2(Vocabulary)}
   o Represents the storage required for a binary translation of the original program.
   o Estimates the number of mental comparisons required to comprehend the program.

Length estimate, N* = n1 * log2 n1 + n2 * log2 n2   {based only on the unique operators and operands}
   N* = 14 * log2 14 + 13 * log2 13 = 53.3 + 48.1 = 101.4

Potential Volume, V* = (2 + n2) * log2 (2 + n2)
   o The volume of a program of minimum size.
   o For our example, V* = (2 + 13) * log2 (2 + 13) = 15 * log2 15 = 58.6
   o Note that as the Program Volume approaches the Potential Volume we are approaching an optimized theoretical solution.
   o And, in theory there is no difference between theory and practice, but in practice there is. – Yogi Berra

Program (complexity) Level, L = V* / V = 58.6 / 442 = 0.13   {Potential Volume / ACTUAL Program Volume}. How close are we to the theoretically optimal program?

Difficulty, D = 1 / L = 1 / 0.13 = 7.5   {the reciprocal of the program complexity level}
   Can contrast two solutions and compare them for Difficulty.

Difficulty estimate, D* = (n1 / 2) * (N2 / n2) = (14 / 2) * (42 / 13) = 22.6
   o Programming difficulty increases if additional operators are introduced (i.e., as n1 increases) and if operands are repeatedly used (i.e., as N2/n2 increases).

Effort, E = V / L* = D* * V = (n1 * N2 * N * log2 n) / (2 * n2)
   E = 22.6 * 442 = 9989
   o Measures ‘elementary mental discriminations’.
   o Two solutions may have very different Effort estimates.

A psychologist, John Stroud, suggested that the human mind is capable of making a limited number of mental discriminations per second (the Stroud number), in the range of 5 to 20.
   o Using a Stroud number of 18:
   o Time for development, T = E / 18 discriminations/second = 9989 / 18 = 555 seconds = about 9 minutes.

1.2.3.1 Simplifications of programs to which Halstead’s metrics are sensitive
Below are constructs that can alter program complexity:
   o Complementary operations: e.g. = i + 1 - j - 1 + j vs. = i. Simplifying reduces N1, N2, Length, Volume, and the Difficulty estimate.
   o Ambiguous operands: identifiers refer to different things in different parts of the program – reuse of operands.
      r := b * b - 4 * a * c;
      .....
      r := (-b + SQRT(r)) / 2.0; // r is redefined in this statement
   o Or -- Synonymous operands: different identifiers for the same thing.
   o Common sub-expressions: failure to use variables to avoid redundant re-computation.
      y := (i + j) * (i + j) * (i + j);
      .....
      can be rewritten
      x := i + j;
      y := x * x * x;
   o Or -- Unwarranted assignment: e.g. over-doing the solution to common sub-expressions, thus producing unnecessary variables.
   o Unfactored expressions:
      y := a * a + 2 * a * b + b * b;
      .....
      can be rewritten
      y := (a + b) * (a + b);

1.2.4 Function Points
CoCoMo II
Based on a combination of program characteristics:
   o external inputs and outputs
   o user interactions
   o external interfaces
   o files used by the system
A weight is associated with each of these. The function point count is computed by multiplying each raw count by its weight and summing all the values.
FPs are very subjective -- they depend on the estimator. They cannot be counted automatically.

“In the late 1970's A.J. Albrecht of IBM took the position that the economic output unit of software projects should be valid for all languages, and should represent topics of concern to the users of the software. In short, he wished to measure the functionality of software.
Albrecht considered that the visible external aspects of software that could be enumerated accurately consisted of five items: the inputs to the application, the outputs from it, inquiries by users, the data files that would be updated by the application, and the interfaces to other applications.
After trial and error, empirical weighting factors were developed for the five items, as was a complexity adjustment. The number of inputs was weighted by 4, outputs by 5, inquiries by 4, data file updates by 10, and interfaces by 7. These weights represent the approximate difficulty of implementing each of the five factors.
In October of 1979, Albrecht first presented the results of this new software measurement technique, termed "Function Points", at a joint SHARE/GUIDE/IBM conference in Monterey, California. This marked the first time in the history of the computing era that economic software productivity could actually be measured.
Table 2 provides an example of Albrecht's Function Point technique used to measure either Case A or Case B. Since the same functionality is provided, the Function Point count is also identical.

Table 2. Sample Function Point Calculations
Raw Data                    Weights     Function Points
1 Input                     x 4         4
1 Output                    x 5         5
1 Inquiry                   x 4         4
1 Data File                 x 10        10
1 Interface                 x 7         7
Unadjusted Total                        30
Complexity Adjustment                   None   (the adjustment reflects the type of system being developed – embedded is most complex)
Adjusted Function Points                30

Table 3. The Economic Validity of Function Point Metrics
                           Case A                   Case B
Activity                   Assembler Version        Fortran Version        Difference
                           (30 F.P.)                (30 F.P.)
Requirements               2 Months                 2 Months               0
Design                     3 Months                 3 Months               0
Coding                     10 Months                3 Months               -7
Integration/Test           5 Months                 3 Months               -2
User Documentation         2 Months                 2 Months               0
Management/Support         3 Months                 2 Months               -1
Total                      25 Months                15 Months              -10
Total Costs                $125,000                 $75,000                ($50,000)
Cost Per F.P.              $4,166.67                $2,500.00              ($1,666.67)
F.P. Per Person-Month      1.2                      2                      +0.8
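The arithmetic behind Table 2 is simple enough to show directly. Below is a minimal sketch (Python) of an unadjusted Function Point count using the Albrecht weights quoted above; real FP counting also grades each item as simple/average/complex and applies a complexity adjustment, both of which this sketch deliberately omits.

# fp_sketch.py -- minimal sketch of an unadjusted Function Point count using the
# Albrecht weights quoted above (inputs 4, outputs 5, inquiries 4, data files 10,
# interfaces 7).  Real FP counting also grades each item as simple/average/complex
# and applies a complexity adjustment; both refinements are omitted here.

ALBRECHT_WEIGHTS = {
    "inputs": 4,
    "outputs": 5,
    "inquiries": 4,
    "data_files": 10,
    "interfaces": 7,
}

def unadjusted_function_points(counts: dict) -> int:
    # Multiply each raw count by its weight and sum the results.
    return sum(ALBRECHT_WEIGHTS[item] * counts.get(item, 0) for item in ALBRECHT_WEIGHTS)

if __name__ == "__main__":
    # The Table 2 example: one of each item type.
    table_2 = {"inputs": 1, "outputs": 1, "inquiries": 1, "data_files": 1, "interfaces": 1}
    print(unadjusted_function_points(table_2))   # 4 + 5 + 4 + 10 + 7 = 30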
The Function Point metrics are far superior to the source line metrics for expressing normalized productivity data. As real costs decline, cost per Function Point also declines. As real productivity goes up, Function Points per person-month also goes up.
In 1986, the non-profit International Function Point Users Group (IFPUG) was formed to assist in transmitting data and information about this metric. In 1987, the British government adopted a modified form of Function Points as the standard software productivity metric. In 1990, IFPUG published Release 3.0 of the Function Point Counting Practices Manual, which represented a consensus view of the rules for Function Point counting. Readers should refer to this manual for current counting guidelines.”

Table 1 - SLOC per FP by Language
Language                     SLOC per FP
Assembler                    320
C                            150
Algol                        106
Cobol                        106
Fortran                      106
Jovial                       106
Pascal                       91
RPG                          80
PL/I                         80
Ada                          71
Lisp                         64
Basic                        64
4th Generation Database      40
APL                          32
Smalltalk                    21
Query Languages              16
Spreadsheet Languages        6

2 QSM Function Point Programming Languages Table
Version 3.0, April 2005. © Copyright 2005 by Quantitative Software Management, Inc. All Rights Reserved.
http://www.qsm.com/FPGearing.html#MoreInfo

The table below contains Function Point Language Gearing Factors from 2597 completed function point projects in the QSM database. The projects span 289 languages from a total of 645 languages represented in the database. Because mixed-language projects are not a reliable source of gearing factors, this table is based upon single-language projects only. Version 3.0 features the languages where we have the most recent, high-quality data. The table will be updated and expanded as additional project data becomes available. As an additional resource, the David Consulting Group has graciously allowed QSM to include their data in this table.
Environmental factors can result in significant variation in the number of source statements per function point. For this reason, QSM recommends that organizations collect both code counts and final function point counts for completed software projects and use this data for estimates. Where there is no completed project data available for estimation, we provide the following gearing factor information (where sufficient project data exists):
   the average
   the median
   the range (low - high)
We hope this information will allow estimators to assess the amount of variation, the central tendency, and any skew to the distribution of gearing factors for each language.
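Gearing factors such as those in Table 1, and in the QSM table below, are applied by multiplying a function point count by the language’s SLOC-per-FP factor. Here is a minimal sketch (Python); the dictionary holds a few values taken from Table 1 above, and the function and variable names are illustrative only.

# gearing_sketch.py -- sketch of using a gearing factor (SLOC per FP) to turn a
# function point count into a size estimate.  The factors below are taken from
# Table 1 above; the function and variable names are illustrative only.

SLOC_PER_FP = {"Assembler": 320, "C": 150, "Fortran": 106, "Smalltalk": 21}

def estimated_sloc(function_points: float, language: str) -> float:
    # Size estimate = FP count * language gearing factor.
    return function_points * SLOC_PER_FP[language]

if __name__ == "__main__":
    # The 30 F.P. system of Table 3 implies very different code sizes:
    for lang in ("Assembler", "Fortran"):
        print(lang, estimated_sloc(30, lang))   # Assembler: 9600.0, Fortran: 3180.0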
Language            QSM SLOC/FP Data                                   David Consulting Data
                    Avg       Median     Low       High
Access              35        38         15        47                  -
Ada                 154       -          104       205                 -
Advantage           38        38         38        38                  -
APS                 86        83         20        184                 -
ASP                 69        62         32        127                 -
Assembler**         172       157        86        320                 575
JavaScript**        56        54         44        65                  50

The QSM table also lists gearing factors for: Basic/400, C**, C++**, C#, Clipper, COBOL**, Cool:Gen/IEF, Culprit, DBase III, DBase IV, Easytrieve+, Excel, Focus, FORTRAN, FoxPro, HTML**, Ideal, IEF/Cool:Gen, Informix, J2EE, Java**, JCL**, JSP, Lotus Notes, Macro, Mantis, Mapper, Natural, Oracle**, Oracle Dev 2K/FORMS, Pacbase, PeopleSoft, Perl, PL/1**, PL/SQL, Powerbuilder**, REXX, RPG II/III, Sabretalk, SAS, Siebel Tools, Slogan, Smalltalk**, SQL**, VBScript**, Visual Basic**, VPF, and Web Scripts; see the QSM URL above for the per-language averages, medians, and ranges.

Note that the applications a language is used for may differ significantly: C++, Assembly, Ada, … may be used for much more complex projects than Visual Basic, Java, etc. – Rowe’s 2 cents worth.

2.1.1 “A Metrics Suite for Object Oriented Design”, S.R. Chidamber and C.F. Kemerer, IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994. See the metrics below.

2.1.2 “A Validation of Object-Oriented Design Metrics as Quality Indicators”, V.R. Basili, L.C. Briand, W.L. Melo, IEEE Trans. on Software Engineering, vol. 22, no. 10, Oct. 1996.

WMC – Weighted Methods per Class: the number of methods (and the operators within them) defined in a class, excluding those inherited from parent classes. The higher the WMC, the higher the probability of fault detection.

DIT – Depth of Inheritance Tree: the number of ancestors of a class. The higher the DIT, the higher the probability of fault detection.

NOC – Number of Children of a class: the number of direct descendants of a class. Was inversely related to fault detection. This was believed to result from high levels of reuse by children. Maybe also, if the inheritance fan-out is wide rather than deep, then we have fewer levels of inheritance.

CBO – Coupling Between Object classes: how many member functions or instance variables of other classes a class uses, and how many other classes are involved. Was significantly related to the probability of finding faults.

RFC – Response For a Class: the number of functions of a class that can be directly executed by other classes (public and friend). The higher the RFC, the higher the probability of fault detection.

Many coding standards address these either directly or indirectly. For instance: limit DIT to 3 or 4, provide guidance against coupling, provide guidance on methods per class.

2.2 Use of SPC in Software Quality Assurance
   o Pareto analysis by function:
      the 80-20 rule – 80% of defects are found in 20% of the modules.
   o Control and run charts – if defect rates rise above some control limit, we need to take action (see the sketch at the end of these notes).
   o Look for causes; modify the process, modify the design, reengineer, rewrite, …

2.3 Questions about Metrics
Is publishing metrics that relate to program composition actually beneficial to quality?
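As a companion to the control-chart bullet in section 2.2, here is a minimal sketch of one common SPC chart: a c-chart for defect counts per test period, with 3-sigma limits around the mean count. The weekly counts and the choice of a c-chart are illustrative assumptions, not data or prescriptions from these notes.

# spc_sketch.py -- minimal sketch of the control-chart idea in section 2.2:
# a c-chart for defect counts per test period.  The sample data and the choice
# of a c-chart (3-sigma limits around the mean count) are illustrative assumptions.

from math import sqrt

def c_chart_limits(counts):
    """Return (center line, lower control limit, upper control limit)."""
    c_bar = sum(counts) / len(counts)
    ucl = c_bar + 3 * sqrt(c_bar)
    lcl = max(0.0, c_bar - 3 * sqrt(c_bar))
    return c_bar, lcl, ucl

if __name__ == "__main__":
    defects_per_week = [12, 9, 14, 11, 10, 13, 25, 8]   # hypothetical weekly defect counts
    c_bar, lcl, ucl = c_chart_limits(defects_per_week)
    print(f"center={c_bar:.1f}  LCL={lcl:.1f}  UCL={ucl:.1f}")
    for week, count in enumerate(defects_per_week, start=1):
        if count > ucl or count < lcl:
            # An out-of-control point signals that we should look for a cause
            # (process change, risky module, new staff, ...) and react.
            print(f"week {week}: {count} defects is outside the control limits")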