T-76.5613 Software Testing and Quality Assurance
Static Code Analysis and Code Review
Mika Mäntylä, SoberIT, Helsinki University of Technology

Different static methods
- Mathematical verification: prove program correctness with mathematical methods
- Code reviews: read the source code in order to find defects and quality issues
- Tool-based static analysis: "automated code inspections"
- Subjective design and code quality indicators and improvement

Introduction – Static versus systematic testing
Systematic software testing
- Requires a large set of tests
- Each test tends to discover only one or a few faults
- Cannot be used in the early stages
Static methods
- Do not require execution
- Each error can be considered in isolation
  - No problem of deciding whether an anomaly comes from a new fault or is a side effect of an existing one
- Can consider other quality attributes such as maintainability, reusability, and security

Contents
Static methods
- Mathematical verification
- Code review defects
- Tool-based analysis
- Subjective design and code quality indicators and improvement

Mathematical verification
Research in this area has been going on for 40 years.
It can offer error-free programs
- Assuming that the program's specification is correct
Mathematical formal verification requires that
- The program itself is formally specified
- The semantics of the programming language are formally defined
  - The formal semantics of a language is given by a mathematical model that represents the possible computations described by the language
  - Syntax is always formally defined
  - The semantics of most programming languages are not formally defined, so we must use only a subset of the language

Mathematical verification cont'd
Hoare triples are an example of mathematical verification
- {P} S {Q}
  - P is a precondition, S is the program statements, Q is a postcondition
  - Example: { x = 0 } x := x + 2 { x ≥ 1 }
Mathematical verification is too time consuming and expensive for many programs
- Can be used in systems, or parts of systems, that are safety critical or require high reliability
  - E.g. telecom systems, space-shuttle software, embedded systems
  - ISDN-protocol bugs that enabled free calls were discovered with mathematical verification
Automated mathematical verification, i.e. model checking
- Used to verify and debug systems specified with temporal logic
- The software needs to be abstracted into the language of the model checker
- State explosion problem
  - Algorithms: symbolic algorithms, partial order reduction

Design by Contract (DBC)
DBC is a practical application of Hoare triples; it uses contracts as program specifications
- Each routine has "two contracts", pre- and postconditions
  - The caller must satisfy the preconditions
  - The routine must satisfy the postconditions
- Inheritance and polymorphism are also supported through "subcontracting"
  - A precondition may be kept or weakened
  - A postcondition may be kept or strengthened
- Classes can have contracts, e.g. invariants that must hold for all instances
- Contract code is not used in production

Design by Contract (DBC) – Example

    put (x: ELEMENT; key: STRING) is
            -- Insert x so that it will be retrievable through key.
        require
            count <= capacity
            not key.empty
        do
            ... Some insertion algorithm ...
        ensure
            has (x)
            item (key) = x
            count = old count + 1
        end

Design by Contract (DBC)
Benefits:
- Errors are immediately detected and located -> better error handling
- Programmers must specify their intentions -> better software design and structure
- Specified intentions serve as documentation -> better documentation
Drawbacks:
- Lack of programming language support
  - Native support: Eiffel, D, Choreme, Nice
  - 3rd-party tools: Java, C/C++, C#, Perl, Python, Ruby, Scheme (Wikipedia has a good list of tools)
  - Assertions can be used in DBC fashion (a sketch follows below) — discipline required
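As a minimal sketch of the assertion-based approach (the Account class, its contract, and all values are invented for this illustration, not taken from the course material):

    // Illustrative DBC-style checks with plain Java assertions.
    // Run with "java -ea" to enable assertions; they are disabled by
    // default, which matches the idea above that contract code is not
    // used in production.
    public class Account {
        private int balance;

        // Class invariant: the balance never goes negative.
        private boolean invariant() { return balance >= 0; }

        public void deposit(int amount) {
            assert amount > 0 : "precondition: amount must be positive"; // caller's obligation
            int oldBalance = balance;                                    // emulates Eiffel's 'old'
            balance += amount;
            assert balance == oldBalance + amount : "postcondition failed"; // routine's obligation
            assert invariant() : "class invariant violated";
        }
    }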
Cleanroom
Cleanroom is a development approach utilizing statistical and mathematical techniques.
- Bugs are not allowed to enter the software
Cleanroom has
- Incremental development
- Stepwise refinement (black boxes -> state boxes -> clear boxes (procedural design))
- Formal specification of the programs
  - Programs are seen as mathematical functions
- Statistical verification of program correctness
- Increment components are not executed or tested in any way
- The integrated system is not tested to find defects but to ensure reliability with an operational profile
  - Operational profile = the probable pattern of usage of the software
  - All possible usage patterns defined
  - Defined based on box-structure specifications of system function, plus information on system usage probabilities obtained from prospective users, actual usage of prior versions, etc.

Cleanroom cases
Ericsson OS-32 (350 KLOC) operating system project
- 70% improvement in development productivity
- 100% improvement in testing productivity
- Testing error rate of 1.0 errors per KLOC (all errors found in all testing)
- The project that contributed the most to the company in 1993
NASA satellite control (40 KLOC)
- 80% improvement in productivity (780 LOC/person-month)
- 50% improvement in quality (4.5 errors per KLOC)
Other Cleanroom project domains
- Flight control software
- LAN software
- Real-time embedded tape drive software
- Medical device software (2004)

Summary of mathematical methods
- Mathematical verification is expensive but can be used when high reliability is required
- Design by Contract is a feasible option: a technique based on pre- and postconditions
- Cleanroom is a development approach utilizing statistical techniques

Contents
Static methods
- Mathematical verification
- Code review defects
- Tool-based analysis
- Subjective design and code quality indicators and improvement

What is a defect?
There is no universally adopted definition. Three approaches in the code review context:
- Anomalies that will cause a system failure
- Anomalies which may (or may not) result in a system failure
- All deviations from quality are defects
Here the last definition is used.

Code review – Defect types
[Bar chart: shares of Evolvability defects, Functional defects, and False Positives in three data sets — Industry case Q, Industry case S, and Students; evolvability defects dominate in all three.]

What are evolvability defects? (1/3)
Not just simple issues dealing with layout. Three types of evolvability defects:
- Visual representation
- Documentation
- Structure
Visual representation has roughly a 10% share of evolvability defects
- This number was consistent across the three data sets (IQ 10%, Stu 11%, IS 12%)
- Examples: indentation, blank line usage, grouping...

What are evolvability defects? (2/3)
Documentation – communicates the intent of the code
- Textual
  - Code comments
  - Naming of code elements
  - Debugging information
- Supported by the language
  - Declaring immutability (keywords: const, final)
  - Visibility of code elements (keywords: static, private)

What are evolvability defects? (3/3)
Structure
- Re-organize
  - Move functionality, Long sub-routine, Dead code, Duplicate code
- Alternative approach
  - Change function, Magic number, Create new functionality, Use exceptions, Semantic dead code, Semantic duplicate code
(The snippet below packs several of these defects into one small example.)
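As an invented illustration (the class and every identifier are hypothetical, not drawn from the study data), the following Java fragment exhibits several evolvability defects at once, each named in a comment:

    public class OrderHandler {
        // Documentation defect: the comment merely restates the method name.
        // compute the price
        public double prc(double a) {       // Documentation defect: cryptic names "prc" and "a"
            double result=a*1.22;           // Visual representation defect: missing spaces;
                                            // Structure defect: 1.22 is a magic number
            if (false) {                    // Structure defect: dead code
                System.out.println(result);
            }
            return result;
        }
    }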
What are functional defects? (1/2)
Data & resource
- Variable initialization
- Memory management
- Data & resource manipulation
Checks
- Check return value
  - Valid value returned
  - No error code given
- Check memory allocation
- Check variable
  - Parameters are valid
  - Checking loop variables
- Check pointer
Interface
- Defects made when interacting with other parts of the software or with a code library
- Incorrect or missing function call
- Incorrect or missing parameter
Logic
- Comparison
- Computation
- Wrong location
- Off by one
- Algorithm / performance

What are functional defects? (2/2)
Timing
- Only possible in applications with multiple threads using shared resources
- Deadlocks, starvation
Support
- Defects in support systems and libraries or their configurations
- Faulty build scripts may result in a failure to deliver all libraries
Larger defects
- Cannot be pinpointed to a small set of code lines
- Functionality is missing or incorrectly implemented
- Require additional code or larger modifications to the existing solution
(A small example of a "Checks" defect follows below.)
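To make the "Checks" category concrete, here is a hedged Java illustration (the example scenario is invented; java.io.File.delete() is a real API whose boolean result is easy to forget to check):

    import java.io.File;

    public class CheckDefectExample {
        // Check defect: the boolean result of delete() is ignored,
        // so a failed deletion goes completely unnoticed.
        static void removeUnchecked(File f) {
            f.delete();
        }

        // Corrected version: the return value is checked.
        static void removeChecked(File f) {
            if (!f.delete()) {
                System.err.println("Could not delete " + f.getPath());
            }
        }
    }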
Functional defects combined
How the categories above map to five published defect taxonomies:
- Resource: Initialization, Data (Basili and Selby 1987); Assignment (Chillarege et al. 1992); Assignment, Data (Humphrey 1995); Data (Beizer 1990); Initial and later states, Handling data, Load conditions (Kaner et al. 1999)
- Check: Checking (Chillarege et al.); Checking (Humphrey); Error handling (Kaner et al.)
- Interface: Interface (Basili and Selby); Interface (Chillarege et al.); Interface (Humphrey); Integration, System and software architecture (Beizer); Hardware (Kaner et al.)
- Logic: Control, Computation (Basili and Selby); Algorithm (Chillarege et al.); Function (Humphrey); Structural bugs (Beizer); Calculation, Control flow, Boundary errors (Kaner et al.)
- Timing: Timing/serialization (Chillarege et al.); System (Humphrey); Timing (Beizer); Race conditions (Kaner et al.)
- Support: Build/package/merge (Chillarege et al.); Packaging, Environment (Humphrey); Source and version control (Kaner et al.)
- Larger defects: Function (Chillarege et al.); Functionality as implemented (Beizer); User interface errors (Kaner et al.)

Defect types, why and what
The presented defect types offer a starting point for a company's own defect taxonomy. Understanding defects can be used to
- Measure the benefits of a chosen quality assurance method
  - E.g. automatic static analysis vs. code review
- Perform root cause analysis to prevent such defects from reappearing

Using a defect database to improve QA
[Diagram: a defect database records, for each defect, its finder (review, manual testing, automatic tests), its symptom, and its root cause — e.g. a crash found in manual testing whose root cause is a faulty interface. The data suggests improvements such as better automatic interface tests, and shows which methods are needed for finding critical bugs.]

Summary of code review data
- What is a defect
- Roughly a 1:3 ratio of functional defects to evolvability defects
- Evolvability defects
  - Visual representation (layout)
  - Documentation (naming & commenting)
  - Structure
- Functional defects
  - Resource, Checks, Interface, Logic, Timing, Support, Larger defects

Contents
Static methods
- Mathematical verification
- Code review defects
- Tool-based analysis
- Subjective design and code quality indicators and improvement

Tool-based static analysis
"Automated code inspections", descended from compiler technology
- A compiler statically analyses the code and knows a lot about it, e.g. variable usage; it finds syntax faults
- Static analysis tools extend this knowledge
- Reverse-engineering tools also extend compiler knowledge, but they have a different target, e.g. source code -> UML
Goals:
- Check violations of code conventions
  - Documentation
  - Visual representation
- Ensure the quality of the system design and structure
- Detect defects

Layout & visual representation
Visual representation has a great impact on program comprehension
- Usage of white space, indentation and other layout issues
- Different layout styles: K&R style, Allman style, GNU style
What can static analysis do to enforce layout / visual representation?
- Layout can be completely enforced
- Pretty printers can even restore the wanted layout

Documenting code (naming)
Descriptive naming greatly increases program comprehension.
Different naming styles:
- Sun's Java Coding Conventions
  - Class names are nouns
  - Methods are verbs
  - Variable names should reflect the variable's intended use
- Hungarian naming style ("Microsoft style")
  - Identifier prefixes state their type
  - lprgst -> long pointer to an array of zero-terminated strings
Figuring out good names right away is difficult
- A name should be improved as a better one is discovered
- Rename is the most used "refactoring" feature in Eclipse

Documenting code (commenting)
Studies have shown that too much commenting is harmful
- Commenting is often abused
  - Comments are used as deodorant: explaining complex code rather than improving the code
  - Comments re-stating the method name
  - Maybe caused by outside pressure to comment all code
Commenting should be used wisely
- Provide extra information that is not obvious from the naming
  - "Check needed because of bug in Java virtual machine 1.4.1"
  - "Used by X"
  - "If parameter is <0 this disables all buttons"

Static analysis – Documenting
What can static analysis do to enforce documenting?
- Naming
  - Regular expression checks (a toy example follows below)
  - E.g. ints are lower case, start with 'i', and are less than 6 letters long
- Commenting
  - Check that classes, variables, and methods have comments
  - The existence of different Javadoc tags can be checked (@return, @param)
  - No tool can say anything about the sanity of the comments
- Tool: CheckStyle, http://checkstyle.sourceforge.net/
Peer pressure and standards are the most effective tools.
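A toy sketch of such a regular-expression naming check (the convention encoded here is just the slide's example, and this is not how CheckStyle actually implements its rules):

    import java.util.regex.Pattern;

    public class NamingCheck {
        // Example convention from above: int variables are lower case,
        // start with 'i', and are less than 6 letters long.
        private static final Pattern INT_NAME = Pattern.compile("i[a-z]{0,4}");

        static boolean isValidIntName(String name) {
            return INT_NAME.matcher(name).matches();
        }

        public static void main(String[] args) {
            System.out.println(isValidIntName("idx"));     // true
            System.out.println(isValidIntName("counter")); // false: wrong prefix and too long
        }
    }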
Structure
Does the following look acceptable?
- A method with over 500 lines of code and a cyclomatic complexity over 100
- Writing, for example, a unit test or a description of the method's behavior is impossible

Structure
The software structure affects many software quality attributes
- Maintainability, portability, reliability, and functionality
Reasons for controlling the internal software structure:
- Future development depends on the current structure
- Agreement on what is an acceptable software structure
  - What is an acceptable size for a method? How big can a class be?
- To stop bad programmers
  - Anecdotal evidence suggests that some programmers actually do more harm than good
- To prevent the side effects of software evolution
  - The laws of software evolution, formulated in the 1970s by Lehman & Belady at IBM based on OS/360, state that:
    - Software which is used in a real-world environment must change or become less and less useful in that environment
    - As an evolving program changes, its structure becomes more complex, unless active efforts are made to avoid this phenomenon

Case: Construction quality measurement at HP
Purpose
- Measure three combined dimensions of maintainability: 1) control structure, 2) information structure, 3) typography, naming, commenting
- Create a construction quality (maintainability) metric for software developers in the trenches
Polynomial metric equation for procedure quality (a computational sketch follows at the end of this subsection):

    171 − 5.2 × ln(aveVol) − 0.23 × aveV(g′) − 16.2 × ln(aveLOC) + 50 × sin(√(2.46 × perCM))

- The equation was adjusted against developers' opinions
The analysis assisted HP in
- Buy-versus-build decisions
- Controlling software entropy over several versions
- Identifying change-prone subcomponents
- Finding targets for reengineering and assessing the effort it requires

Summary: Static analysis – Structure
Code metrics are used to control internal software quality
- Based on change rate data: cyclomatic complexity not >14
- Code metrics can protect software from the harmful side effects of software evolution
- By measuring you send a message about its importance
Problems
- You have to define your own limits
  - Historical data is needed if you want hard evidence on which to base the measures
- Metrics lack qualitative elements
  - "You may replace that entire function with one function call"
  - "Measures don't really tell whether it is well programmed"
Programs:
- CCCC, Metrics (Eclipse plugin)
- To measure duplication: Same and CloneFinder (commercial)
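The polynomial above is the maintainability index of Coleman et al. (1994); as a sketch (the parameter names follow the formula, but the sample values and the reading of perCM as a plain percentage are assumptions):

    public class MaintainabilityIndex {
        // 171 - 5.2*ln(aveVol) - 0.23*aveV(g') - 16.2*ln(aveLOC)
        //     + 50*sin(sqrt(2.46*perCM))
        // aveVol = average Halstead volume per module
        // aveVg  = average extended cyclomatic complexity per module
        // aveLoc = average lines of code per module
        // perCM  = average percentage of comment lines per module
        static double mi(double aveVol, double aveVg, double aveLoc, double perCM) {
            return 171
                    - 5.2 * Math.log(aveVol)
                    - 0.23 * aveVg
                    - 16.2 * Math.log(aveLoc)
                    + 50 * Math.sin(Math.sqrt(2.46 * perCM));
        }

        public static void main(String[] args) {
            // Invented sample values, purely for illustration.
            System.out.printf("MI = %.1f%n", mi(950.0, 7.0, 65.0, 15.0));
        }
    }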
Static analysis – Error detection
Static analysis can detect anomalies in the code:
- Unreachable code
- Undeclared variables
- Parameter type mismatches
- Uncalled functions & procedures
- Array bound violations
- Thread synchronization problems
- Misuse of variables

Example: Errors
Where is the bug?

    void input_reporter() {
        int i;
        while (i == 0) {
            sleep();
            i = doWeHaveInput();
        }
        weHaveInput(i);
    }

Can this bug occur?
- C -> Yes: i is read before it is ever initialized
- Java -> No: the compiler rejects the use of an uninitialized local variable

Data flow analysis
The study of program variables
- A variable is defined where a value is stored into it
- A variable is used where the stored value is accessed
- A variable is undefined before it is defined or when it goes out of scope
Examples:

    x = y + z               -- x is defined; y and z are used
    IF a > b THEN read(S)   -- a and b are used; S is defined

Data flow analysis faults

    n := 0
    read(x)
    n := 1              -- data flow anomaly: n is redefined without being used
    while x > y do      -- data flow fault: y is used before it has been
    begin               -- defined (the first time around the loop)
        read(y)
        write(n * y)
        x := x - n
    end

Static analysis – Examples
Static analysis can find real and difficult-to-spot defects.
Microsoft
- To handle the bugs of the Windows 2000 OS, a tool company (Intrinsa, maker of PREfix) was bought for $60 million
- The OS division has such tools as part of its build process: whenever a warning is found, it is treated as a build breaker
- Their developers use such tools on their desktops
Open source (Homeland Security bug hunt 2006, Coverity)
- An XFree86 bug allowed any user to get root access:

    if (getuid() != 0 && geteuid == 0)

  - geteuid without parentheses is a function pointer, which is never zero, so the comparison is always false
  - It should read "geteuid() == 0"
- "Source code analysis has proven to be an effective step towards furthering the quality and security of Linux" – Andrew Morton, lead kernel maintainer

Problems of static defect detectors
False positives
- A reported defect where there is none
- Developers lose confidence in the tool
- Lightweight static analysis tools (lints) often report a high rate of FPs
Noise = "messages people don't care about" (not just "bogus" messages)
- Reporting a bug that does not really matter
  - Extra white space
  - A possible null pointer exception in a situation that cannot occur, or if it does, everything is already lost
- Too much noise -> people won't use the tool, missing all the defects
Trying to reduce false positives and noise reduces recall
- I.e. leaves possible defects in the system

Some solutions and improvements
Filtering mechanisms to reduce output
- Automatically exclude infeasible defects, based on Boolean satisfiability analysis
- Reduce noise
Extensions to provide extra rules and checks
- Developers write them (a similar idea as in Design by Contract)
- Possibility to find domain-specific defects
Automatically detect rules from source code (Engler et al. 2001)
- Lock() is followed by UnLock() 95% of the time
- A missing UnLock() call is therefore likely to be a bug
(A toy version of such a checker is sketched below.)
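A toy sketch of the rule-mining idea (deliberately naive: it scans a flat list of call names, whereas real tools work on parsed code, and the 60% threshold is lowered from the slide's 95% just so the tiny example triggers):

    import java.util.List;

    public class LockRuleChecker {
        // Infer the rule "Lock is usually followed by UnLock" from usage,
        // then flag the violations as likely bugs.
        static void check(List<String> calls) {
            int paired = 0, unpaired = 0;
            for (int i = 0; i < calls.size(); i++) {
                if (calls.get(i).equals("Lock")) {
                    boolean followed = calls.subList(i + 1, calls.size()).contains("UnLock");
                    if (followed) paired++; else unpaired++;
                }
            }
            double rate = paired / (double) Math.max(1, paired + unpaired);
            if (rate >= 0.6 && unpaired > 0) {
                System.out.printf("Lock is followed by UnLock %.0f%% of the time;%n"
                        + "%d unpaired Lock call(s) are likely bugs.%n", rate * 100, unpaired);
            }
        }

        public static void main(String[] args) {
            check(List.of("Lock", "write", "UnLock",
                          "Lock", "read", "UnLock",
                          "Lock", "write"));        // the last Lock is never released
        }
    }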
Summary of tool-based static analysis
Static analysis is "automated code inspections", descended from compiler technology.
Goals:
- Check problems of the code
  - Documentation (naming & commenting): important but difficult to automate
  - Layout / visual representation: currently not such a big problem
- Ensure the quality of the system design and structure
  - Can good or poor design be measured? Yes and no
  - Prevent the code erosion caused by software evolution
- Detect defects
  - Automatic detection can find many hard-to-spot defects
  - Problems: noise and false positives — people can start ignoring the messages and miss all the defects detected
Tools often focus on one or more of these viewpoints:
- Programming errors (Lint, FindBugs, Coverity, PREfix, PREfast)
- Documentation (CheckStyle)
- Layout (IDEs' auto-reformat)
- Structure (code metrics tools)

Contents
Static methods
- Mathematical verification
- Code review defects
- Tool-based analysis
- Subjective design and code quality indicators and improvement

Anti-patterns and bad code smells
Describe some of the common mistakes in software development.

Pattern history
Design patterns
- Represent reusable designs
  - Strategy, Command, Singleton, etc.
- History
  - Based on pattern languages, which focused on traditional architecture (1977)
  - Rediscovered by software people in 1987 and published by the GoF in 1994
- Motivation
  - Encapsulate years of industry experience in software development
Anti-patterns
- Represent frequently occurring problems
- History
  - Have been around longer than design patterns
  - Fred Brooks and The Mythical Man-Month, 1975: "Adding manpower to a late software project makes it later"
  - The term anti-pattern appeared soon after the GoF book of 1994
- Motivation
  - Learning from mistakes

Anti-patterns & bad code smells
Anti-patterns cover a wide range of topics
- Development (software development)
  - Bad code smells are in fact development-level anti-patterns
- Architectural (systems architecture)
- Managerial (organization and process)

Anti-pattern example – Golden Hammer
Also known as: Old Yeller, Head-in-the-Sand
Causes:
- Several successes with the particular approach
- High knowledge of a particular solution or vendor product
- Large investment in the product
- Isolated group
Description:
- This knowledge is applied everywhere

Anti-pattern example – One Size Fits All
Also known as: Silver Bullet (a software process anti-pattern)
Description:
- The software process does not fit the business environment or lifecycle
- Too bureaucratic a process for a small project
- An own process is created under the official one
Causes:
- IT management wants to have the same process for all projects to avoid overrunning time and budget

When to refactor
Bad smells in the code tell us when to apply refactorings
- Smells are structures that indicate bad software design
- The idea was introduced by Fowler and Beck (2000)
- They are introduced together with the set of refactorings that can remove them
- Of course, the list of bad smells can never be complete
Why are they called smells?
- Fowler & Beck think that when it comes to the refactoring decision, "no set of metrics rivals informed human intuition"

Bad code smells – Taxonomy
Bloaters
- When something is too large
- Examples: Long Method, Large Class, Long Parameter List
- These smells likely grow a little bit at a time; hopefully nobody designs e.g. Long Methods
Object-orientation abusers
- Object-oriented concepts not fully understood and utilized
- Examples: Switch Statements, Temporary Field, Alternative Classes with Different Interfaces
Change preventers
- These smells make changing the system unnecessarily difficult
- They violate the principle that one external change should affect one class only
  - E.g. changing the database from Oracle to SQL Server
- Example: Shotgun Surgery

Bad code smells – Taxonomy cont'd
Dispensables
- All code needs effort to understand and maintain
- If code is unused or redundant, it needs to be removed
- Examples: Duplicate Code & Dead Code, Speculative Generality
Couplers
- Low coupling between objects/classes is one of the desirable goals of OO software
- Examples: Feature Envy, Message Chains
- Too much delegation (= reduction of coupling) can be bad as well
- Example: Middle Man

Refactoring example – Switch statement

    class Engine {
        ...
        int cylinderCount() {
            switch (type) {
                case FAMILY_CAR_ENGINE:
                    return getBaseCylinderNumber();
                case EXECUTIVE_CAR_ENGINE:
                    return getBaseCylinderNumber() * 2;
                case RACE_CAR_ENGINE:
                    return 8;
            }
        }
    }

[Class diagram: an Engine class with cylinderCount() : int, specialized by FamilyCarEngine, ExecutiveCarEngine and RaceCarEngine, each overriding cylinderCount() : int — see the sketch below.]
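A sketch of the refactored code implied by the class diagram (Replace Conditional with Polymorphism; the base cylinder number of 4 is an invented value, since the slides give none):

    abstract class Engine {
        protected int getBaseCylinderNumber() { return 4; } // assumed value
        abstract int cylinderCount();
    }

    class FamilyCarEngine extends Engine {
        int cylinderCount() { return getBaseCylinderNumber(); }
    }

    class ExecutiveCarEngine extends Engine {
        int cylinderCount() { return getBaseCylinderNumber() * 2; }
    }

    class RaceCarEngine extends Engine {
        int cylinderCount() { return 8; }
    }

Each subclass now owns its own rule, so adding a new engine type means adding a class instead of editing a shared switch statement.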
Summary: subjective design and code quality indicators
- Refactoring is improving the code structure without changing its external behavior
- Bad code smells and anti-patterns are subjective quality indicators
  - They cannot be exactly measured, and the human mind provides the final judgment
  - It is possible that people have quite different opinions on them

References
- Anon (CheckStyle), http://checkstyle.sourceforge.net/
- Anon (Eiffel Software Inc.), Building bug-free O-O software: An introduction to Design by Contract, http://archive.eiffel.com/doc/manuals/technology/contract/
- Bansiya, J. & Davis, C.G. 2002, "A Hierarchical Model for Object-Oriented Design Quality Assessment", IEEE Transactions on Software Engineering, vol. 28, no. 1, pp. 4-17.
- Barnard, J. & Price, A. 1994, "Managing code inspection information", IEEE Software, vol. 11, no. 2, pp. 59-69.
- Brown, W.J., Malveau, R.C., McCormick, H.W. & Mowbray, T.J. 1998, AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, Wiley, New York.
- Coleman, D., Ash, D., Lowther, B. & Oman, P.W. 1994, "Using Metrics to Evaluate Software System Maintainability", Computer, vol. 27, no. 8, pp. 44-49.
- Dunsmore, A., Roper, M. & Wood, M. 2003, "The development and evaluation of three diverse techniques for object-oriented code inspection", IEEE Transactions on Software Engineering, vol. 29, no. 8, pp. 677-686.
- Dunsmore, A., Roper, M. & Wood, M. 2003, "Practical code inspection techniques for object-oriented systems: an experimental comparison", IEEE Software, vol. 20, no. 4, pp. 21-29.
- Enseling, O., iContract: Design by Contract in Java, http://www.javaworld.com/javaworld/jw-02-2001/jw-0216-cooltools.html
- Fowler, M. 2000, Refactoring: Improving the Design of Existing Code, Addison-Wesley, Canada.
- Hoare, C.A.R. 1969, "An Axiomatic Basis for Computer Programming", Communications of the ACM, vol. 12, no. 10, pp. 576-580.
- Laitenberger, O. & DeBaud, J.-M. 2000, "An encompassing life cycle centric survey of software inspection", Journal of Systems and Software, vol. 50, no. 1, pp. 5-31.
- Li, W. & Henry, S.M. 1993, "Object-Oriented Metrics that Predict Maintainability", Journal of Systems and Software, vol. 23, no. 2, pp. 111-122.
- Linger, R.C. 1993, "Cleanroom software engineering for zero-defect software", pp. 2-13, IEEE Computer Society Press.
- Lowry, M., Boyd, M. & Kulkarni, D. 1998, "Towards a Theory for Integration of Mathematical Verification and Empirical Testing", p. 322, 13th IEEE International Conference on Automated Software Engineering (ASE'98).
- Martin, J.C. 2004, "Formal methods software engineering for the CARA system", International Journal on Software Tools for Technology Transfer, vol. 5, no. 4, pp. 301-307.
- McCarthy, J. 1962, "Towards a mathematical science of computation", in Proceedings of IFIP Congress 62, Munich.
- McConnell, S. 1993, Code Complete, Microsoft Press, Redmond, WA.
- Pincus, J. 2002, "PREfix, PREfast, and other tools and technologies", presentation at ICSM 2002, Montreal.
- Porter, A.A., Siy, H.P., Toman, C.A. & Votta, L.G. 1997, "An experiment to assess the cost-benefits of code inspections in large scale software development", IEEE Transactions on Software Engineering, vol. 23, no. 6, pp. 329-346.
- Roush, W. 2003, "Writing Software Right", MIT Technology Review, vol. 106, no. 3, pp. 26-28.
- SEI, Cleanroom Software Engineering, http://www.sei.cmu.edu/str/descriptions/cleanroom_body.html
- Sethi, R. 1996, Programming Languages: Concepts and Constructs, Addison-Wesley.
- Shull, F. et al. 2002, "What we have learned about fighting defects", pp. 249-258, Proceedings of the Eighth IEEE Symposium on Software Metrics.
- Sommerville, I. 2001, Software Engineering, Addison-Wesley / Pearson, Reading, MA.