Company X Due Diligence Prep Health Check
Spring 2020
DRAFT, PRELIMINARY, CONFIDENTIAL

About this document
• Client: PE-backed software company preparing for sale that wants to know what might be asked in technical due diligence
• Less than 1 hour of client setup: repository hookup, plus an optional explanation of the “platform” vs. “product” organizational structure. No interviews were conducted
• Sema’s SaaS solution analyzed the code and teams in ~3 hours, covering all commits over all time
• Sema’s Professional Services team wrote the commentary
• This document is a sanitized excerpt from the complete document. Where examples could not be sanitized, screenshots from other publicly available (open source) code were used instead to demonstrate the analysis

Executive Summary - Code
Legend: Strength, Low Risk, Medium Risk, High Risk
Element of code quality, followed by discussion:
Size
• 100 total products with 290 total repositories, including 30 products that are part of PLATFORM and 70 Legacy products
Language composition
• Strength – Written in modern languages with sufficient global access to developers
Core technical debt
• High Risk - Cost per Line of Code is $12.01 for the organization and $14.62 for PLATFORM repositories; above $5 is high risk. 91% of tech debt is due to lack of unit testing, so a deeper understanding of testing methods is required
• Strength - low excessive complexity and code duplication among Legacy products
Line Level Warnings
• Medium Risk - Investigate and as needed remediate 95 Performance, 1,880 Security, and 25,326 Potential Bug warnings in PLATFORM and Legacy products
• Low Risk - Investigate and as needed remediate 47,068 Smell, 3,977 Environment Sensitive, and 30,080 Misleading warnings in PLATFORM products
Third party code
• Low Risk - 7.8% of code is from third party libraries; however, for Repository X 41.3% of the code is from a third party, the majority from sagacity
Package dependencies
• Strength - 3 of the 3 largest repositories among PLATFORM developed products follow modern architectural practices for package dependencies: a higher concentration of incoming packages than outgoing

Executive Summary – Process and Team
Legend: Strength, Low Risk, Medium Risk, High Risk
Element of code quality, followed by discussion:
Distributed repositories and smaller packages
• Low Risk - 2 of the 3 PLATFORM developed products contain a very large repository
• Low Risk – At least 2 very large packages identified among the PLATFORM repositories
Commit Analysis
• Medium Risk – there appears to be a decline in development activity since Summer 2019, excluding what appear to be administrative changes
Repository Access Management
• High Risk - 1.57% of historical commits were made from personal Gmail accounts, including 1.05% in the last three months
Commit Management
• Low Risk - 2 of 3 of the products within PLATFORM average above 10 files per commit across repositories
Ticket Referencing
• All repositories within the PLATFORM set meet the recommended ticket reference guidance; this is excellent
Test Referencing
• Medium Risk - 2 of 8 of the PLATFORM repositories meet the test reference standard of 40%
Developer Contribution and Skill
• Pending responses by the COMPANY team to the Team analysis - Due Diligence checklist

• Code • Process • Team

Basic facts
Item: Amount
• Total size: 60,585,292 lines, 196,564 files, and 6,037,520,854 bytes
• Number of products, total: 100
• Number of products, PLATFORM (PLATFORM developed): 30
• Number of products, Legacy: 70
• Number of repositories, total: 290
• Number of repositories, PLATFORM (PLATFORM developed): 90
• Number of repositories, Legacy: 200
• Number of contributors, all time: 372
• Number of contributors, current (last 90 days): 73

Repository Summary for PLATFORM products
[Table: product letter and repository numbers. A: 1; B: 2, 3, 4; C: 5, 6, 7, 8]

Repository Summary for Legacy products
[Table: product letter and repository numbers. D: 9, 10, 11; E: 12, 13; F: 14, 15; G: 16, 17, 18, 19, 20, 21; H: 22, 23, 24, 25; I: 26, 27; J: 28]
Per the client, Legacy products are not being developed but still have meaningful customer usage. Sema has analyzed these repositories and will identify strengths and potential areas of improvement. However, the standard for investment and risk is different for these products.

Product and Repository Sizing
Sema analyzed 29 repositories across 10 products. PLATFORM is the platform and includes three products. There are also seven Legacy products still used by customers.
Product A has the most files and the second-most lines of code.
Product B has the most lines of code and the second-most files.
Repository 3 is the largest single repository in terms of both files and lines of code.
Note: Repository 29 is empty in GitHub, so it is not represented in these findings. Question for COMPANY: is there a reason for this? If not, could the repo be deleted for repository management hygiene?

[Figure: Size by Product, files and lines, Products A through J]

Language Composition
Across the organization, Java is the largest language by count of files and lines of code. For PLATFORM, the most files are written in Java and the most lines are written in JavaScript.
Legacy repositories contain 95% of the organization's HTML files.
PLATFORM holds almost 100% of the Ruby files, and Ruby is the third most common file type in PLATFORM behind Java and JavaScript. Going forward, Sema recommends reviewing how language expertise is prioritized among developers.
Strength – Written in modern languages with sufficient global access to developers

[Figure: Language Composition for the organization]

Core Technical Debt Sizing and Resolution Cost Estimate: Overview
This section includes four components of technical debt:
• Files with too much complexity
• Duplicated blocks of code
• Unit test coverage
• Substantive line level warnings, covered in more detail in the next section
It then estimates the time in person-days needed to reduce the technical debt, priced at $500/day based on US averages for benchmarking.
Sema does not recommend reducing technical debt to $0, as the benefit does not justify the cost. However, technical debt should be monitored and optimized for continued cost-effective development or maintenance of a code base.

Technical Debt Summary
• High Risk - Cost per Line of Code for the organization is $12.01
• High Risk - Cost per Line of Code for PLATFORM repositories is $14.62
• Medium Risk - Cost per Line of Code for Legacy repositories is $10.11
• High Risk - 19 of the 28 repositories have a cost per line of code above $5
• 91% of the technical debt is a result of lack of unit testing. Sema recommends either implementing more testing or documenting the rationale for lower testing standards
• Questions for COMPANY: explain your testing methods. Do you use unit tests in each or most repositories to test your code?
• Repository XX has a testing repository, do others?
• Are you using a different method for testing other products and repositories?
• If testing is actually low …
• What does customer feedback (reported bugs, NPS, retention) say about Legacy products?
• What does customer feedback (reported bugs, NPS, retention) say about PLATFORM?
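As a reading aid for the cost per line figures above, the following is a minimal sketch of how such a figure could be derived. It assumes, and the report does not state this explicitly, that cost per line is the estimated remediation effort in person-days priced at the stated $500/day and divided by lines of code. The function names and the sample repository are hypothetical; only the $500/day rate and the $5 threshold come from the report.

```python
DAY_RATE_USD = 500.0          # from the report: $500/day US benchmark
HIGH_RISK_USD_PER_LINE = 5.0  # from the report: above $5 per line is high risk


def cost_per_line(person_days: float, lines_of_code: int,
                  day_rate: float = DAY_RATE_USD) -> float:
    """Remediation cost in dollars divided by lines of code (assumed definition)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return person_days * day_rate / lines_of_code


def is_high_risk(usd_per_line: float) -> bool:
    """Apply the report's threshold: strictly above $5 per line."""
    return usd_per_line > HIGH_RISK_USD_PER_LINE


if __name__ == "__main__":
    # Hypothetical repository, not taken from the report.
    value = cost_per_line(person_days=1_000, lines_of_code=50_000)
    print(f"${value:.2f} per line, high risk: {is_high_risk(value)}")
```

Under this assumed definition, a repository can move below the threshold either by reducing remediation effort (for example, adding unit tests) or because the effort is spread over more lines, so the figure is best compared between repositories of similar size.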
Technical Debt for PLATFORM Repositories
[Figure: technical debt by repository, Repositories 1 through 8]
• COMPANY: See the testing discussion above.
• Medium Risk - Excessive code duplication in Repository 3, 11,032 person-days to address
• Question for COMPANY: can you investigate and explain?

Technical Debt for Legacy Products
[Figure: technical debt by repository, Repositories 9 through 27]
• Strength - low excessive complexity and code duplication among Legacy products.
• COMPANY: See the testing discussion above.

Line Level Warnings
Sema identifies over 1,000 types of line-level warnings across multiple languages, grouped into seven categories: Environment Sensitive, Misleading, Potential Bug, Smell, Stylistic, Security, and Performance.
1,482,990 instances of line-level warnings were identified, with 496,505 in PLATFORM. Stylistic warnings are 81.5% of the line-level warnings and are not considered technical debt. Smells are the next highest at 11.5% of the line-level warnings.
Medium Risk - Investigate and as needed remediate 95 Performance, 1,880 Security, and 25,326 Potential Bug warnings in PLATFORM and Legacy products
Low Risk - Investigate and as needed remediate 47,068 Smell, 3,977 Environment Sensitive, and 30,080 Misleading warnings in PLATFORM products
Investigating Stylistic warnings is not recommended for either PLATFORM or Legacy products.
Remediating the remaining warnings for Legacy products is not recommended: 123,693 Smell, 7,276 Environment Sensitive, and 35,702 Misleading.
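The triage rules stated in the Line Level Warnings findings can be summarized as a small lookup. This is a sketch of the recommendations as written in this report, not of Sema's tooling; the function name and the returned labels are hypothetical.

```python
# Categories the report asks to investigate in both PLATFORM and Legacy products.
MEDIUM_RISK = {"Performance", "Security", "Potential Bug"}
# Categories the report asks to investigate in PLATFORM products only.
LOW_RISK_PLATFORM_ONLY = {"Smell", "Environment Sensitive", "Misleading"}
PRODUCT_GROUPS = {"PLATFORM", "Legacy"}


def triage(category: str, product_group: str) -> str:
    """Return the report's recommendation for a warning category and product group."""
    if product_group not in PRODUCT_GROUPS:
        raise ValueError(f"unknown product group: {product_group}")
    if category in MEDIUM_RISK:
        return "medium risk: investigate"
    if category in LOW_RISK_PLATFORM_ONLY:
        if product_group == "PLATFORM":
            return "low risk: investigate"
        return "not recommended"
    if category == "Stylistic":
        return "not recommended"
    raise ValueError(f"unknown category: {category}")


if __name__ == "__main__":
    for group in sorted(PRODUCT_GROUPS):
        print(group, "Smell ->", triage("Smell", group))
```

Keeping the rules in one place makes it easy to see that only the three medium-risk categories apply to Legacy products, which matches the lower investment standard the report sets for them.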
[Figure: Total Line Level Warnings Across All Repositories]

[Figure: Line Level Warnings for PLATFORM by Repo and Language, Repositories 1 through 7]

Third Party Libraries
[Figure: third party code by product, Products A through I]
• Third party libraries are appropriate for efficient programming and modern code design. However, Sema recommends putting dependency management tools in place to minimize risk and ease maintenance, rather than including the third party code itself in the repository.
• Low Risk - 7.8% of code is from third party libraries; however, for Product A 41.3% of the code is from a third party, the majority from sagacity.
• Questions for COMPANY: are there substantive reasons to have included some or all of this third party code? If not, Sema recommends removing it and using a dependency management tool instead. Once that is done, this would be considered a “Strength”

Package Dependencies – Repository 1
[Figure: outgoing and incoming package dependencies]
• Sema generally recommends that an application have few classes with high outgoing dependencies, several classes with high incoming dependencies, and key classes with middle values that represent the divisions of business logic within the application.
• Strength - 3 of the 3 largest repositories among PLATFORM developed products follow modern architectural practices for package dependencies: a higher concentration of incoming packages than outgoing.
• See the figure above for a publicly available example (workday-elastic-search)

Distributed repositories and smaller packages
Sema generally recommends that neither repositories nor individual packages get too large, for ease of maintenance, feature additions, and team management.
What “too large” means depends heavily on each organization's circumstances, but Sema recommends regularly reviewing the largest repositories and packages.
Low Risk - 2 of the 3 PLATFORM products contain a very large repository
Low Risk – At least 2 very large packages identified among the PLATFORM repositories

Distributed repositories and smaller packages – Repository 5
• Strength – Repository 5 is a modern repository with a primary controller package in both the outgoing and incoming dependency graphs
• Repository 5 has smaller distributed repositories

Distributed repositories and smaller packages – Repository 1
• Low Risk – Repository 1 (workday-elastic-search) is a monolithic repository
• This repository is a candidate to break into smaller service-level repositories
• Question for COMPANY: Are there performance issues or maintenance/development issues associated with being a monolithic repository? If not, it would become low Risk for acquisition.
• Sema still recommends breaking it up for staff performance management, however.

Distributed repositories and smaller packages – Repository 1
• Low Risk - COMPANY has at least one large package in PLATFORM products that is a candidate to be broken up
• As an example, org.elasticsearch.search.aggregations.metrics has more than 15,000 self-referential or outgoing dependencies
• Question for COMPANY: Are there performance issues or maintenance/development issues associated with large packages?
• Once you complete a review and can articulate a reason for keeping large packages (or fix them), this would become a Strength

Architectural Quality Indicators
Architectural Quality Indicator, definition, and computation:
• Reusability: A design with low coupling and high cohesion is easily reused by other designs. Computation: 0.25 * Coupling + 0.25 * Cohesion + 0.5 * Messaging + 0.5 * Design Size
• Flexibility: The degree of allowance of changes in the design. Computation: 0.25 * Encapsulation - 0.25 * Coupling + 0.5 * Composition + 0.5 * Polymorphism
• Understandability: The degree of understanding and the ease of learning the design implementation details. Computation: 0.33 * Abstraction + 0.33 * Encapsulation - 0.33 * Coupling + 0.33 * Cohesion - 0.33 * Polymorphism - 0.33 * Complexity - 0.33 * Design Size
• Functionality: Classes with given functions that are publicly stated in interfaces used by others. Computation: 0.12 * Cohesion + 0.22 * Polymorphism + 0.22 * Messaging + 0.22 * Design Size + 0.22 * Hierarchies
• Extendibility: Measurement of the design’s allowance to incorporate new functional requirements. Computation: 0.5 * Abstraction - 0.5 * Coupling + 0.5 * Inheritance + 0.5 * Polymorphism
• Effectiveness: Design efficiency in fulfilling the required functionality. Computation: 0.2 * Abstraction + 0.2 * Encapsulation + 0.2 * Composition + 0.2 * Inheritance + 0.2 * Polymorphism

Design Quality Indicators
Design metric, design property, and description:
• Design Size in Classes (DSC), Design Size: Total number of classes in the design.
• Number of Hierarchies (NOH), Hierarchies: Total number of “root” classes in the design. (count(MaxinheritenceTree(class)=0))
• Average Number of Ancestors (ANA), Abstraction: Average number of classes in the inheritance tree for each class.
• Direct Access Metric (DAM), Encapsulation: Ratio of the number of private and protected attributes to the total number of attributes in the class.
• Direct Class Coupling (DCC), Coupling: Number of other classes a class relates to, either through a shared attribute or a parameter in a method.
• Cohesion Among Methods of Class (CAMC), Cohesion: Measure of how related methods are in a class in terms of used parameters. It can be computed by: 1 - LackOfCohesionOfMethods()
• Measure of Aggregation (MOA), Composition: Count of the number of attributes whose type is a user-defined class.
• Measure of Functional Abstraction (MFA), Inheritance: Ratio of the number of inherited methods to the total number of methods within a class.
• Number of Polymorphic Methods (NOP), Polymorphism: Any method that can be used by a class and its descendants. Counts the number of methods in a class excluding private, static, and final ones.
• Class Interface Size (CIS), Messaging: Number of public methods in a class.
• Number of Methods (NOM), Complexity: Number of methods declared in a class.

Architectural Quality Indicators – Repo X
• Extendibility: Measurement of the design’s allowance to incorporate new functional requirements.
• Calculation: 0.5 * Abstraction - 0.5 * Coupling + 0.5 * Inheritance + 0.5 * Polymorphism
• Low Risk – Repo X shows a decrease in Extendibility in July 2017 and again in November 2018
• Investigate – Both drops are due in large part to the Number of Polymorphic Methods (NOP) – See next slide

Design Quality Indicators – Repo X
• Investigate – Polymorphism dropped in July 2017 and again in November 2018
• Discuss - was there an architectural design decision behind this?

• Code • Process • Team

Commit Analysis
Sema recommends setting consistent commit activity goals over time and managing towards them, with respect to code priorities.
Investigate - There are spikes in file changes in January and February of 2019, mainly in Repository 3.
Commit activity started to increase in January 2019, peaked in October 2019, and has since exhibited a declining trendline.
Medium Risk – there appears to be a decline in development activity since Summer 2019, excluding what appear to be administrative changes.
COMPANY: can you review and explain? Were there changes to development practices, or other administrative code changes, that could explain this apparent change?
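The commit activity trend discussed above can be checked with a simple monthly count. The following is a minimal sketch, assuming commit dates have already been exported from the repositories and that a reviewer has marked which commits are administrative; the `Commit` record, the `administrative` flag, and the sample data are hypothetical and are not part of the report.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    day: date
    administrative: bool = False  # hypothetical flag set during review


def commits_per_month(commits, include_administrative=False):
    """Count commits per (year, month), optionally including administrative ones."""
    counts = Counter(
        (c.day.year, c.day.month)
        for c in commits
        if include_administrative or not c.administrative
    )
    return dict(sorted(counts.items()))


def peak_month(monthly):
    """Return the (year, month) with the most commits; raises ValueError if empty."""
    return max(monthly, key=monthly.get)


if __name__ == "__main__":
    # Hypothetical sample, not taken from the report.
    sample = [
        Commit(date(2019, 1, 5)),
        Commit(date(2019, 10, 1)),
        Commit(date(2019, 10, 9)),
        Commit(date(2019, 10, 20), administrative=True),
        Commit(date(2019, 12, 3)),
    ]
    monthly = commits_per_month(sample)
    print(monthly, "peak:", peak_month(monthly))
```

Comparing the counts with and without administrative commits is one way to see whether an apparent decline reflects development activity or only repository housekeeping.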
[Figure: Commit Analysis – File Changes, Jan 2016 to Present, Repository 3 highlighted]
[Figure: Commit Analysis – Commits, Jan 2016 to Present, Repository 3 highlighted]
[Figure: Commit Analysis – File Changes, June 2019 to Present, Repository 3 highlighted]
[Figure: Commit Analysis – Commits, June 2019 to Present, Repository 3 highlighted]

Repository Access Management
As a risk avoidance measure, employees and contractors should only be allowed access to the repository with approved accounts.
Organization aliases, such as those of the current company or previous acquisitions, are appropriate.
GitHub aliases are also appropriate. Commits with GitHub in the email address were made from the command line by developers whose email address is set to private.
Generic email addresses such as Gmail, Yahoo, and Microsoft Live should be avoided.

Email Aliases Used in Commits - all time and last 3 months
[Figure: Commits by Email type - all time]
[Figure: Commits by Email type - since Jan 1, 2020]
• Sema recommends managing commits based on organizations/domains, and limiting commits from outside the organization
• High Risk - 1.57% of historical commits were made from personal Gmail accounts, including 1.05% in the last three months

Commit Management
A good process for committing code improves the quality and ease of maintenance of a code base. Sema analyzes one aspect of this by looking at the average number of files included in each commit.
Sema recommends keeping files per commit low (<5), to make maintenance and future changes easier. Sema considers above 10 files per commit a risk that organizations should investigate further.
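The files-per-commit guidance above can be expressed as a short check. This is a sketch using the thresholds quoted in this report (below 5 recommended, above 5 to investigate, above 10 a risk); the function names and labels are hypothetical, and treating an average of exactly 5 as "investigate" is an assumption, since the report does not specify that boundary.

```python
RECOMMENDED_BELOW = 5.0  # from the report: keep files per commit below 5
RISK_ABOVE = 10.0        # from the report: above 10 files per commit is a risk


def average_files_per_commit(files_changed_per_commit):
    """Mean number of files changed across a list of commits."""
    if not files_changed_per_commit:
        raise ValueError("at least one commit is required")
    return sum(files_changed_per_commit) / len(files_changed_per_commit)


def classify(average: float) -> str:
    """Map an average onto the report's bands (boundary at 5 is an assumption)."""
    if average > RISK_ABOVE:
        return "risk"
    if average >= RECOMMENDED_BELOW:
        return "investigate"
    return "ok"


if __name__ == "__main__":
    # Hypothetical file counts for four commits, not taken from the report.
    avg = average_files_per_commit([2, 3, 4, 31])
    print(avg, classify(avg))
```

Because a single large administrative commit can pull the mean up sharply, as in the sample above, it is worth reviewing the underlying commits before drawing conclusions from the average alone.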
Low Risk - 2 of 3 of the products within PLATFORM average above 10 files per commit across repositories

File Changes per Commit Ratio – Product A – 3/1/19 to present
[Figure: file changes per commit for Repositories 1 through 4]
• Product A overall is above 10 files per commit and should be investigated
• Repository 2 and Repository 1 are above 10 files per commit, while Repository 4 is in the investigate category (above 5)

Average File Changes per Commit Per Developer – Product A – 3/1/19 to present
[Figure: average file changes per commit, Developers A through M]
• Higher counts for administrative or repository management tasks are expected and not an issue
• Low Risk - review commit practices for developers with more than 10 files per commit who are not carrying out administrative changes (Developers A and B, in this example) and see if there is a development rationale. If not, coach towards fewer files per commit

Ticket Reference and Test Reference Management
Adherence to a process of committing code based on associated tickets allows for easier troubleshooting and makes it possible to understand the reason for code changes. Adding testing as development occurs is one of the easiest ways to minimize technical debt and drive quality in the code.
Sema recommends that over 60% of commits have ticket references and over 40% of commits have test references, so that testing is included as development occurs.
Strength – All repositories within the PLATFORM set meet the recommended ticket reference guidance
COMPANY: how did you do this, manually or automatically? If manual, this shows even more impressive process discipline. Either way it is excellent.
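The ticket and test reference targets above can be checked per repository with a small calculation. The following is a sketch under stated assumptions: a ticket reference is taken to be a Jira-style key such as ABC-123 in the commit message, and a test reference is taken to be any changed file whose path contains "test". Neither rule is confirmed by the report as Sema's definition; the names and sample commits are hypothetical.

```python
import re

# Assumption: ticket references look like Jira-style keys, e.g. "ABC-123".
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
TICKET_TARGET = 0.60  # from the report: over 60% of commits
TEST_TARGET = 0.40    # from the report: over 40% of commits


def reference_rates(commits):
    """Return (ticket_rate, test_rate) for a list of (message, changed_paths)."""
    if not commits:
        raise ValueError("at least one commit is required")
    with_ticket = sum(1 for message, _ in commits if TICKET_PATTERN.search(message))
    # Assumption: a commit "references tests" if any changed path contains "test".
    with_test = sum(
        1 for _, paths in commits if any("test" in p.lower() for p in paths)
    )
    return with_ticket / len(commits), with_test / len(commits)


def meets_targets(ticket_rate: float, test_rate: float):
    """Both targets are strict ('over'), so a rate equal to the target fails."""
    return ticket_rate > TICKET_TARGET, test_rate > TEST_TARGET


if __name__ == "__main__":
    # Hypothetical commits, not taken from the report.
    sample = [
        ("ABC-12 fix login", ["src/login.py", "tests/test_login.py"]),
        ("ABC-13 add report", ["src/report.py"]),
        ("cleanup", ["src/util.py"]),
        ("ABC-14 tests for report", ["tests/test_report.py"]),
        ("ABC-15 refactor", ["src/a.py"]),
    ]
    rates = reference_rates(sample)
    print(rates, meets_targets(*rates))
```

If ticket references are enforced automatically, for example through a commit hook, the same pattern could be reused there, which would speak to the manual-versus-automatic question posed to COMPANY.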
Medium Risk - 2 of 8 of the PLATFORM repositories meet the test reference standard of 40%

Commits with Ticket References and Test Commits – Product B – 03/1/2019 to Present
[Figure: ticket reference and test commit percentages for Repositories 1 through 4]
• Strength - All repositories have above 60% ticket references
• Medium Risk - All repositories are under 40% ‘test’ commits, with Repository 3 being the closest at 39%

• Code • Process • Team

Team analysis - Due Diligence checklist
• Note: this applies to both PLATFORM and Legacy Products
• Is individual developer activity monitored?
• Are low activity developers coached on improvement?
• Is type of developer activity monitored (create vs. change), and is coaching provided?
• Is skill of developers measured quantitatively as well as qualitatively?
• Among developers identified as highest contribution (top 10-20%) by Sema’s analysis…
• Do you agree or can you explain? (e.g. administrative changes)
• Among developers with actual high knowledge, how many are still at COMPANY? If they are no longer there, was there a knowledge transfer before they left? What risk mitigation methods do you have in place for protecting current subject matter expertise?
• Among developers identified as highest skill (top 10-20%) by Sema’s analysis…
• Do you agree or can you explain?
• If you agree and the assessment is accurate, what is the status of those developers? Do they still work at COMPANY? Have they been recognized/compensated?
• Among developers identified as lowest skill and lowest contribution (10-20%) by Sema’s analysis…
• Do you agree or can you explain (e.g. new developers)?
• If you agree and the assessment is accurate, what is the status of those developers? Have they been coached and/or evaluated? Can you provide stats, e.g. “XX of the YY low skill developers have been exited.”
• In particular, is there a training program in place for new-to-COMPANY developers (low contribution, varied skill)?

Developer Activity and Expertise
Sema recommends setting consistent commit activity goals over time and managing towards them, with respect to code priorities.
Based on current agile methodology, Sema recommends maximizing the number of full stack developers on a team.
Strengths - Multiple developers with significant contributions in each repository; similarly, multiple developers with experience across languages (front-end and back-end)

Commits and File Changes per Developer (Top 20) – 90 Days
[Figure: commits and file changes per developer, Developers A through P]

Developer Expertise (Top 20)
[Figure: expertise by language per developer, Developers A through P]
• Strength - top contributing developers have experience in front-end (JS and HTML) and back-end (Java) languages
• Developer A and Developer E have the most Ruby experience
• Developer D has significant SQL experience

Contribution Pattern – Repository 3
• Investigate – Ryan Q is an “Editor” = has changed others’ code 38 times more than changing his own code – is this in line with responsibilities?
• Aditi N is a “Self Perfecter” relative to peers, though not relative to objective standards (>5) = changed own code 2X more than creating code

Contribution Detail Top Contributors – Repository 1
[Figure: contribution detail for the top 10 contributors, Developers A through J]
• 6 of the 10 top contributors for Repository 1 have contributed to the repository in the last 90 days. This is an indication that there is sufficient subject matter expertise “in the building”

Overview of Developer Coaching Grid: 23 sets of metrics
What we measure
• Contribution: how much work is each developer doing, and how are they spending their time?
• Skill: who writes the most/least clean code? Who is making the most/least positive impact on code architecture?
How we calculate
• Time series review of which developers have written which code
• Delta of the quality of the code at each commit compared to previous code
• Rollup into objective and comparative ranking for each developer

[Figure: Metrics Definitions: 9 for Contribution]
[Figure: Metrics Definitions: Up to 14 for Skill (language-specific)]

Contribution – Product B
• Developer L is the top contributor to Product B: has created the most code, and edited both his own and others’ code
• Developer M is a substantial editor of others’ code

Skill - Line-level warnings – Product B
• Note – Total does not include Stylistic warnings
• Developer I and Developer N have the highest levels of substantive warnings attributed to them in the code

Summary Developer Coaching Grid – Product B
• Developer A ranks first for code contribution and near the top for skill as well
• Developer P is low on both skill and contribution

Summary Developer Coaching Grid – Product C
• Developer A is likely carrying out administrative tasks rather than actual coding
• There is a range of skill scores among low contributors. What coaching/support mechanisms are in place?