Announcements
• Homework #6 due this Friday 11:59pm
• Please fill out the course evaluations!
• Final project deliverables and peer reviews due next Tues, Dec 16, 11:59pm
  – Please submit the final iteration report as well as code, user guide, and technical doc
• Regular office hours this week except for Weds

Looking for Extra Credit?
• Course Project Fair! Thursday 12-2pm
• An opportunity for students in "project courses" to show off their work!
• No need to create a poster or anything; just show up with your laptop or device
• Please let me know if you'd like to participate

Previously: Security
• Definitions of security, threat, vulnerability
• Rules of thumb
• Protecting against DoS attacks
• Cleaning data: validation, sanitization, normalization, canonicalization
• Protecting against internal attacks: final, clone

Today
• A little bit more about security
• Final grading stuff
• Review for final exam

Is software getting better? Does any of this even matter?

ACM SIGSOFT Impact Project
www.sigsoft.org/impact

Impact of Software Engineering Research
• Middleware
• Effect on programming languages
• Project management methods
• Design methods, models, and notations
• Runtime assertion checking
• Software configuration management
• Walkthroughs and inspections

Observations
• "Impact" can be measured in a variety of ways
  – Qualitative more than quantitative
  – Difficult to know how ideas get transferred
• SE research does affect practice
  – But it can take 10-20 years
  – It is most effective when research and industry work together closely

Final grading

Final Course Grading
• Midterm exam (15%)
• Final exam (25%)
• Homework assignments (35%)
• Project (25%)
• Current weighted average: 87.6%
  – Last year: 89.0% at this time, finished at 92.0%
• Grade cutoffs:
  – 97+: A+
  – 93-97: A
  – 90-93: A-
  – 87-90: B+
  – 83-87: B
  – 80-83: B-

Final Course Grading
• Homework #5 should be graded by this weekend
• Homework #6 should be graded by Dec 16
• The final exam should be graded by Dec 16
  – You will be able to see your graded exam in my office on Dec 17
• Projects should be graded by Dec 18
• Course grades should be posted by Dec 19

Final Exam Overview
• Monday, December 15, 3-5pm, DRL A1
• Closed-book, closed-notes, etc.
• No electronic devices!
• You may bring one sheet of 8.5"x11" or A4 paper
• The exam is comprehensive
  – About 75-80% from the second half of the course
• Practice questions and solutions available in Canvas

Final Exam Topics
• Everything covered in lectures except:
  – Android specifics
  – "So you think you know Java?"
  – Guest speakers
• Everything covered in assigned readings since the midterm exam

Reading Assignments (since midterm)
• Fault-based testing: Andrews et al., "Is mutation an appropriate tool for testing experiments?"
• Property-based testing: Clarke and Rosenblum, "A historical perspective on runtime assertion checking in software development"
• Debugging: McConnell, Code Complete, ch. 23
• Regression testing: Elbaum et al., "Prioritizing test cases for regression testing"
• Reliability: Lyu, Handbook of Software Reliability Engineering, ch. 1
• Fault-tolerant computing: Xie et al., "A survey of software fault tolerance techniques"
• Efficiency: McConnell, Code Complete, ch. 25-26
• Usability: Noyes, "The Human Factors Toolkit"
• Security: Secure Coding Guidelines for Java (online)
• Security testing: Thompson, "Why security testing is hard"
Emphasis on Second-Half Topics
• More testing
  – Fault-based testing and mutation analysis
  – Property-based testing
  – Verification and model checking
  – Fault localization and regression testing
  – Integration testing and mock objects
• External quality
  – Reliability and fault-tolerant computing
  – Efficiency
  – Usability
  – Security

Part 1: Internal Quality and Refactoring

Software Quality: ISO 9126
• External Quality
  – Functionality, Reliability, Efficiency, Usability, Portability
• Internal Quality (Maintainability)
  – Stability, Changeability, Testability, Analyzability (Readability, Understandability)
• Internal quality affects external quality!

Analyzability
• Readability: how easy is it to identify and recognize tokens and know their syntactic meaning?
  – Java coding conventions
• Understandability: how easy is it to know the semantic meaning of a piece of code?
  – Chapin data flow
  – McCabe cyclomatic complexity

Refactoring
• What is refactoring?
• Why should you refactor?
• When should you refactor?
• How should you refactor?
• Possible problems caused by refactoring

Code Smells
• Duplicate code
• Long method
• Large class
• Primitive obsession
• Message chain

Refactoring Patterns
• Extract Method (a small before/after sketch follows this slide)
• Pull Up Method
  – if duplicate code is found in two separate classes
• Extract Class, Extract Superclass
  – composition vs. inheritance
• Hide Delegate
  – to break a message chain
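To make the Extract Method entry concrete, here is a minimal before/after sketch in Java. It loosely follows Fowler's classic printOwing illustration; the class and method names are invented for this review, not taken from the lecture.

```java
// Before: printOwing() mixes banner printing with detail printing.
class OrderBefore {
    private double amount;

    void printOwing() {
        System.out.println("**** Customer Owes ****");
        System.out.println("amount: " + amount);
    }
}

// After Extract Method: each fragment gets its own intention-revealing method,
// so the banner code can be reused elsewhere instead of being duplicated.
class OrderAfter {
    private double amount;

    void printOwing() {
        printBanner();
        printDetails();
    }

    private void printBanner() {
        System.out.println("**** Customer Owes ****");
    }

    private void printDetails() {
        System.out.println("amount: " + amount);
    }
}
```

The same move is the usual first step toward Pull Up Method: once the duplicated fragment lives in its own method in each class, it can be moved to a shared superclass.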
Intra-Component Code Complexity
• For a method
  – Chapin data flow
  – McCabe Cyclomatic Complexity
• For a class
  – Lack of Cohesion of Methods

Inter-Component Code Complexity
• Henry-Kafura structural complexity (fan-in, fan-out)
• Object-oriented complexity metrics
  – Depth of Inheritance Tree
  – Number of Children
  – Instability

Part 2: Testing

Software Testing Basics
• What is the definition of "software testing"?
  – Executing a program in an attempt to reveal defects
• Failure: a difference between the actual output and the expected output (as reported by the test oracle)
• Error: a deviation of the internal state from the correct state that led to the failure
• Fault: the static defect in the code that led to the error

Test Oracles
• False positive: thinking there's a bug when really there isn't
• False negative: thinking there's no bug when really there is
• Accuracy: percentage of the oracle's verdicts that are correct
• Precision: percentage of reported failures (positives) that are actual failures

Test Case Generation
• Exhaustive testing: all possible inputs
  – Generally not feasible
• Random testing: choose inputs randomly
  – Easy to automate
  – No indication of progress (how do you know you're done?)
  – Hard to know expected outputs
• Specification-based: based on the specification (representative inputs, inputs likely to lead to failure)
• Code-based: execute as much code as possible
• Fault-based: show that the program does not exhibit certain types of faults

Testing Coverage
• The amount of testing to be done is stated in terms of measurable criteria
• A test set (a collection of individual test cases) covers all or some part of the criteria
• The percentage of criteria that are covered is the coverage level
• Testing continues until an adequate level is achieved

Black-Box Testing
• Criteria: how much of the specification is covered
• Assumption #1: if a failure is revealed for a given value of input variable v, then it is likely to be revealed for similar values of v
  – As a result of this assumption, we can split up the specification space into equivalence classes
• Assumption #2: if a failure is revealed for a given value of input variable v, then it is likely to be revealed regardless of the values of other variables (single fault assumption)

Black-Box Coverage Criteria
• Weak: what percentage of the separate equivalence classes are covered?
• Pairwise: what percentage of the pairs of equivalence classes are covered?
• Strong: what percentage of the combinations of equivalence classes are covered?
• Normal: only considers nominal values explicitly stated in the specification
• Robust: also considers values not mentioned in the spec but that may lead to failure

White-Box Testing
• "Treat the code as a graph and try to cover it"
• How do you know that this code works if you haven't tested it?
• Coverage metrics: Statement, Branch, Path
• Path coverage subsumes statement and branch coverage
  – If you've covered 100% of the paths, you've necessarily covered 100% of the statements and branches

Symbolic Execution
• Replace all variables with symbols
• For each path, determine its path condition as an expression in terms of the input
• Any input that satisfies that path condition will cause that path to be covered
• If the path condition is not satisfiable, the path is infeasible
• Can also be used to state the output in terms of the input
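As a worked illustration of path conditions, here is a toy method invented for review purposes (it is not from the slides): treat the inputs as symbols, record each branch decision, and conjoin the decisions along a path.

```java
public class PathConditionDemo {
    // Treat x and y as symbolic inputs; the conjunction of the branch
    // decisions taken along a path is that path's path condition.
    static int classify(int x, int y) {
        int result;
        if (x > 0) {            // decision A
            if (y > x) {        // decision B
                result = 1;     // path 1: (x > 0) && (y > x)
            } else {
                result = 2;     // path 2: (x > 0) && (y <= x)
            }
        } else {
            result = 3;         // path 3: (x <= 0); decision B never reached
        }
        return result;
    }

    public static void main(String[] args) {
        // Any concrete input satisfying a path condition covers that path,
        // e.g. (1, 2) satisfies path 1. A contradictory condition such as
        // (x > 0) && (x <= 0) would mark a path infeasible.
        System.out.println(classify(1, 2));   // 1
        System.out.println(classify(5, 1));   // 2
        System.out.println(classify(-3, 7));  // 3
    }
}
```

Note that the result on each path is also expressed purely in terms of the symbolic inputs, which is how symbolic execution can characterize expected outputs as well as coverage.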
White-Box Adequacy Criteria
• Structural Coverage Criteria
  – Path/Statement/Branch coverage
  – Edge-Pair coverage
  – Prime Path coverage
• Data Flow Coverage Criteria
  – Def-Use, All-Uses
• These criteria can be used to generate a test set or to measure a test set (regardless of how it was generated)

Part 3: More Testing!

Fault-Based Testing
• We can't show that the program is free of all faults, but we can try to show that it is free of certain faults
• Given a program P, we create a faulty version P'
• If a test input produces the correct output for P and an incorrect output for P', then it shows that P != P' and thus P does not contain that fault

Assumptions
• Competent Programmer Hypothesis: the program we're testing is nearly correct, and any bugs are likely to be small variations from the correct program
• Coupling Effect Hypothesis: a test case that is able to find small/subtle faults is likely to find more complex/egregious ones

Mutation Analysis
• Systematically insert faults into the code by making a single, localized change (a "mutation")
• For each fault, if a test case passes in the original version and fails in the mutated version, we say that the mutant is killed
• Mutation Analysis: given an existing test set, determine the percentage of mutants that it kills
• Mutation Testing: given a set of mutations, derive a test set that kills all of them

Mutation Testing
• To identify a test case that will kill a given mutant, represent P and P' as expressions written as implications in terms of the input and output
• Then find inputs that satisfy the conjunction of those expressions such that the outputs of P and P' are different
• A mutant may survive if it is not covered or if it is an equivalent mutant

Property-based Testing
• Sufficient property: if always satisfied, the code is correct
• Necessary property: if ever violated, the code is incorrect
• Runtime assertion checking (a small assert sketch appears at the end of this part)
  – Using the Java "assert" keyword
  – Throws AssertionError if violated
  – Assertions need to be enabled

Model Checking
• Testing tries to show the existence of bugs; verification tries to prove the absence of bugs
• Model checking: show correctness by proving that the software conforms to a "model of correctness"
• Uses proof-by-contradiction to demonstrate that a property cannot be violated
• Practical limits: path explosion, property soundness and completeness, path condition satisfiability

Debugging Basics
• Find it, fix it, make sure you didn't break anything
• Levels of fault localization
  – Deduction: look at the code and reason about it
  – Observation: observe a single execution of the code with respect to invariants
  – Induction: observe multiple executions of the code (with passing and failing tests) in order to determine the likely cause of failure
  – Experimentation: systematically modify inputs/code to prove/disprove a hypothesis

Program Slicing & Dicing
• Create dynamic slices by looking at the paths covered by each test case
• Create program dices by looking for differences between the dynamic slices
• If you have more than two, look for statistical correlation between failing test cases and certain statements/paths

Regression Testing
• Test case selection
  – Choose only those tests that cover "dangerous entities"
  – There may be dangerous entities even if the code hasn't changed
• Test case prioritization
  – Coverage (highest first, "additional")
  – Fault-exposing potential: based on mutation analysis
  – Fault index: based on code churn

Integration Testing
• Top-down integration
  – Test stub: a simple implementation of a called method that provides just enough functionality to test the caller
• Bottom-up integration
  – Test driver: a simple implementation of a caller method that allows for testing of integration with the called method

Mock Objects
• An object used to "mock" (substitute for) some dependency only for purposes of testing
  – For instance, if the dependency is slow, non-deterministic, hard to control, etc.
• Use dependency injection to create an anonymous class within the unit test that mocks the behavior of the dependency for this test only
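The dependency-injection bullet above is easiest to remember as code. A minimal sketch, assuming JUnit 4 on the classpath; RateService and PriceConverter are invented names used only for illustration.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical dependency: a real implementation might call a slow,
// non-deterministic remote service.
interface RateService {
    double currentRate(String currencyPair);
}

// Unit under test; the dependency is injected through the constructor.
class PriceConverter {
    private final RateService rates;

    PriceConverter(RateService rates) {
        this.rates = rates;
    }

    double toEuros(double dollars) {
        return dollars * rates.currentRate("USD/EUR");
    }
}

public class PriceConverterTest {
    @Test
    public void convertsUsingInjectedRate() {
        // Anonymous class mocks the dependency for this test only,
        // returning a fixed, predictable rate.
        RateService fixedRate = new RateService() {
            @Override
            public double currentRate(String currencyPair) {
                return 0.5;
            }
        };
        PriceConverter converter = new PriceConverter(fixedRate);
        assertEquals(5.0, converter.toEuros(10.0), 1e-9);
    }
}
```

Because the mock is defined inside the test method, it affects only this test; other tests can inject a different behavior (or the real service) without any changes to PriceConverter.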
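And, referring back to the Property-based Testing slide earlier in this part, here is a minimal sketch of runtime assertion checking with Java's assert keyword. The indexOfMin method is an invented example; the key points are that a violated assertion throws AssertionError, and that assertions are disabled unless the JVM is started with -ea.

```java
public class AssertDemo {
    static int indexOfMin(int[] a) {
        // Precondition (necessary property): a violation means the caller is wrong.
        assert a != null && a.length > 0 : "precondition: non-empty array";
        int best = 0;
        for (int i = 1; i < a.length; i++) {
            if (a[i] < a[best]) {
                best = i;
            }
        }
        // Postcondition: no element is smaller than the one being returned.
        for (int x : a) {
            assert a[best] <= x : "postcondition violated";
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(indexOfMin(new int[] {3, 1, 2}));  // prints 1
        // Run with `java -ea AssertDemo`; with assertions enabled,
        // indexOfMin(new int[0]) would throw AssertionError.
    }
}
```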
Part 4: External Quality

Reliability
• "The probability of failure-free operation"
• Can be expressed as a likelihood/probability
• Can also be expressed as MTBF (or MTTF)

Fault-Tolerant Software
• Single-Version
  – Forward recovery: exception handling
  – Backward recovery: rollback
• Multi-Version ("design diversity")
  – Recovery Block
  – N-version programming
• Multi-Data ("data diversity")
  – Retry Block

Efficiency
• Tradeoffs
• Use the right data structure or algorithm
• Measure, don't guess
• Avoid unnecessary work
  – Lazy evaluation, short-circuit operations
• Avoid unnecessary memory allocation
• Compiler optimizations

Usability
• How is it defined? Why is it important?
• Therac-25 case study
• User-Centered Design
  – Task analysis
• Information Visualization
• Evaluating Usability
  – Heuristics
  – User Studies
  – Metrics

Secure Java Programming
• Freeing resources
• Bounds checking
• Data cleaning: validation, sanitization, canonicalization, normalization
• Visibility and mutability
• Changing private fields
• SecurityManager

Final Exam Overview
• Monday, December 15, 3-5pm, DRL A1
• Closed-book, closed-notes, etc.
• No electronic devices!
• You may bring one sheet of 8.5"x11" or A4 paper
• The exam is comprehensive:
  – all lecture topics
  – reading assignments since the midterm
• Practice questions and solutions available in Canvas

Lessons Learned
#1 Software Quality is Quantifiable
#2 There are Tradeoffs in Software Quality
#3 Internal Quality affects External Quality
Anything else?

The end. (thanks!)