Evaluating the Quality of Software Engineering Performance Data
James Over
Software Engineering Institute
Carnegie Mellon University
July 2014
© 2014 Carnegie Mellon University

Copyright 2014 Carnegie Mellon University

This material is based upon work funded and supported by Industry cost recovery under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center sponsored by the United States Department of Defense. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Industry cost recovery or the United States Department of Defense.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

This material has been approved for public release and unlimited distribution. This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at permission@sei.cmu.edu.

Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University. Team Software ProcessSM and TSPSM are service marks of Carnegie Mellon University.

DM-0001405

Managing Software Development

Managing software engineering is challenging due to the nature of the work.
• Intuition: the ability to understand something immediately, without the need for conscious reasoning.
• Counterintuitive: contrary to intuition or common sense, but often true.

Management Visibility

Data can provide visibility into a software project’s past and future performance.
• Did the project meet key milestones?
• Is the project on schedule?
• Are costs under control?
• How does project performance compare to other projects?
• Will the software work when released?

Can We Trust the Data?
Manufacturing data
• standard operational definitions
• machines measure, collect, and report
• precise and accurate

Software engineering project data
• lack of standards and operational definitions
• people often measure, collect, and report
• imprecise and inaccurate

“In God we trust, all others bring data” - Deming

TSP Measurement Framework

TSP collects five direct measures for the team and each team member:
• Work Product Size
• Time on Task
• Defects
• Resource Availability
• Schedule

Each measure is estimated during planning and measured while working. The data are evaluated weekly and when a
• task is completed
• phase is completed
• component is completed
• cycle is completed
• project is completed

Measurement Context Supports Evaluation

The five direct measures include a reference to
• Who – team and team member
• What – project, product or component, size measure, process, process phase, and task
• When – project cycle, calendar, time of day

The five direct measures have dependencies that can be evaluated to test for internal consistency, for example:
• Work or tasks are ordered, and time on task cannot overlap.
• Defects are found while performing a task, so the defect timestamp and the time to fix cannot fall outside the start and stop time for the task.
• Product size, time to perform a task, and defects injected and removed are generally correlated.

Performance Evaluation Process

Import Data
• Assess submission
• Store data
• Add to catalog

Data Quality Check
• Consistency analysis
• Statistical analysis
• Falsified values

Baseline and Benchmark
• Project facts
• Product facts
• Process facts
• Projects
• Organizations

Produce Certificate

TSP Data

                      Database 1   Database 2
Projects                     114          109
Time Log Entries         100,466      103,023
Defect Log Entries        10,860       18,408
Tasks                     73,000       11,499
Components                 8,412        7,464

Example Data Quality Tests

Check for missing or incorrect values
• time log entry without a start or stop date
• defect log entry without a reference to the phase where the defect was discovered

Check for internal inconsistency
• time log entry outside the project start and end date, overlapping another entry, or violating process sequencing
• defect log entry whose timestamp falls outside the start and end date of the associated task
(A minimal sketch of such checks appears after the leading digit analysis below.)

Check for statistical anomalies and expected distributions
• test the distribution of each direct measure against expected values
• outlier evaluation
• intentional and unintentional incorrect values

Time Log Leading Digit Analysis

The TSP Time Log tracks time spent on tasks in the plan. Accurate data are important:
• How many task hours this week?
• How many task hours to complete a module or component?

Tracking in real time improves accuracy. The leading digit analysis compares the data to the ideal distribution.

[Chart: Time Log Leading Digit Analysis – frequency of leading digits 1–9, expected vs. actual, annotated 92.3%]
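The deck does not say how the “expected” curve is computed; a common choice for this kind of leading digit check is Benford’s law, so the sketch below assumes it. The helper functions, the agreement score, and the sample time_log_minutes list are illustrative assumptions, not part of the TSP tooling:

```python
import math
from collections import Counter

def leading_digit(value):
    """Most significant digit of a positive number, e.g. 0.37 -> 3, 128 -> 1."""
    s = str(abs(value)).lstrip("0.")
    return int(s[0])

# Benford's law: expected frequency of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_frequencies(values):
    """Observed frequency of each leading digit 1-9 in a list of values."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values()) or 1
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_agreement(values):
    """Illustrative agreement score in [0, 1]: 1 minus half the total
    absolute deviation between observed and Benford frequencies."""
    observed = leading_digit_frequencies(values)
    deviation = sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10))
    return 1 - deviation / 2

# Hypothetical minutes-per-entry from a time log; in practice these would
# come from the recorded start and stop times of each entry.
time_log_minutes = [12, 95, 30, 7, 118, 45, 22, 60, 15, 33, 81, 19]
print(round(benford_agreement(time_log_minutes), 3))
```

Data logged in real time tends to follow such a distribution, while estimated or back-filled hours tend to cluster on round numbers, which is what a low agreement score would surface.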
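Returning to the internal-consistency tests listed under “Example Data Quality Tests”: a minimal sketch of what such checks might look like. The TimeLogEntry and DefectLogEntry classes, their field names, and the one-time-log-entry-per-task simplification are illustrative assumptions, not the TSP tool’s actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class TimeLogEntry:
    task_id: str
    start: Optional[datetime]
    stop: Optional[datetime]

@dataclass
class DefectLogEntry:
    task_id: str
    found_phase: str              # process phase in which the defect was found
    timestamp: Optional[datetime]

def missing_values(time_log: List[TimeLogEntry],
                   defect_log: List[DefectLogEntry]) -> list:
    """Entries with required fields missing (start/stop date, found phase)."""
    bad = [t for t in time_log if t.start is None or t.stop is None]
    bad += [d for d in defect_log if not d.found_phase]
    return bad

def overlapping_time(time_log: List[TimeLogEntry]) -> List[Tuple]:
    """Time on task must not overlap; checking adjacent entries after
    sorting by start time is enough to detect that an overlap exists."""
    entries = sorted((t for t in time_log if t.start and t.stop),
                     key=lambda t: t.start)
    return [(a, b) for a, b in zip(entries, entries[1:]) if b.start < a.stop]

def defects_outside_task(time_log: List[TimeLogEntry],
                         defect_log: List[DefectLogEntry]) -> List[DefectLogEntry]:
    """A defect's timestamp must fall inside its task's start/stop window
    (assumes one time log entry per task, for brevity)."""
    window = {t.task_id: (t.start, t.stop) for t in time_log}
    bad = []
    for d in defect_log:
        start, stop = window.get(d.task_id, (None, None))
        if start is None or d.timestamp is None or not (start <= d.timestamp <= stop):
            bad.append(d)
    return bad
```

Entries returned by these functions would be flagged for review as part of the Data Quality Check step of the evaluation process.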
Defect Log and Size Log Analysis

[Charts: Defect Log Leading Digit Analysis (leading digit of fix time) and Size Log Leading Digit Analysis – frequency of leading digits 1–9, expected vs. actual, annotated 66.7% and 90.2%]

Baseline and Benchmark Data

Team Size
[Histogram: team size, 3–18 members; Mean = 8.2, Median = 8.0]

Mean Team Member Weekly Task Hours
[Histogram: mean weekly task hours per team member, 2.0–23.6; Mean = 10.3, Median = 9.0]

Productivity
[Histogram: productivity, 0–90 LOC/hr; Mean = 10.3, Median = 7.1]

Plan vs. Actual Hours for Completed Parts
[Scatter plot: plan hours vs. actual hours for completed parts, 0–10,000; R² = 0.952]

Plan Task Hours vs. Actual Task Hours
[Scatter plot: plan task hours vs. actual task hours, 0–10,000; R² = 0.8038]

Defect Density – Median of Defects Per KLOC
• DLD Review: 2.2
• Code Review: 5.2
• Code Inspection: 3.3
• Unit Test: 3.8
• Build/Integration Test: 0.7
• System Test: 0.15

Summary

Management should be part instinct and part fact-based, but the lack of data and facts in software engineering slows learning. Deming urged us to trust data more than our intuition, but can the data be trusted?

This presentation has shown a measurement system that
• can be evaluated to determine the accuracy of project data.
• has built-in checks and balances to guard against intentional and unintentional errors.
• produces data that supports estimating, planning, tracking, baselines, and benchmarking.

Questions?

Presenter Information

Presenter
James Over
Technical Director
Software Solutions Division
Telephone: +1 412-268-5800
Email: info@sei.cmu.edu

U.S. Mail
Software Engineering Institute
Customer Relations
4500 Fifth Avenue
Pittsburgh, PA 15213-2612
USA

Web
www.sei.cmu.edu
www.sei.cmu.edu/contact.cfm

Customer Relations
Email: info@sei.cmu.edu
Telephone: +1 412-268-5800
SEI Phone: +1 412-268-5800
SEI Fax: +1 412-268-6257