Fault-Based Testing

Announcements
• Homework #6 due this Friday 11:59pm
• Please fill out the course evaluations!
• Final project deliverables and peer reviews due next
Tues Dec 16 11:59pm
• Please submit final iteration report as well as
code, user guide, and technical doc
• Regular office hours this week except for Weds
Looking for Extra Credit?
• Course Project Fair! Thursday 12-2pm
• An opportunity for students in “project courses” to
show off their work!
• No need to create a poster or anything, just show up
with your laptop or device
• Please let me know if you’d like to participate
Previously: Security
• Definitions of security, threat, vulnerability
• Rules of thumb
• Protecting against DoS attacks
• Cleaning data: validation, sanitization, normalization,
canonicalization
• Protecting against internal attacks: final, clone
Today
• A little bit more about security
• Final grading stuff
• Review for final exam
Is software getting better?
Does any of this even matter?
ACM SIGSOFT
Impact Project
www.sigsoft.org/impact
Impact of Software Engineering Research
• Middleware
• Effect on programming languages
• Project management methods
• Design methods, models, and notations
• Runtime assertion checking
• Software configuration management
• Walkthroughs and inspections
Observations
• “Impact” can be measured in a variety of ways
– Qualitative more than quantitative
– Difficult to know how ideas get transferred
• SE research does affect practice
– But can take 10-20 years
– Is most effective when research and industry work
together closely
Final grading
Final Course Grading
• Midterm exam (15%)
• Final exam (25%)
• Homework assignments (35%)
• Project (25%)
• Current weighted average: 87.6%
• Last year: 89.0% at this time, finished at 92.0%
• 97+: A+
• 93-97: A
• 90-93: A-
• 87-90: B+
• 83-87: B
• 80-83: B-
Final Course Grading
• Homework #5 should be graded by this weekend
• Homework #6 should be graded by Dec 16
• Final exam should be graded by Dec 16
• You will be able to see your graded exam in my
office on Dec 17
• Projects should be graded by Dec 18
• Course grades should be posted by Dec 19
Final Exam Overview
• Monday, December 15, 3-5pm, DRL A1
• Closed-book, closed-notes, etc.
• No electronic devices!
• You may bring one sheet of 8.5”x11” or A4
• The exam is comprehensive
• About 75-80% from the second half of the course
• Practice questions and solutions available in Canvas
Final Exam Topics
• Everything covered in lectures except:
• Android specifics
• “So you think you know Java?”
• Guest speakers
• Everything covered in assigned readings since
midterm exam
Reading Assignments (since midterm)
• Fault-based testing: Andrews et al., “Is mutation an appropriate tool for testing
experiments?”
• Property-based testing: Clarke and Rosenblum, “A historical perspective on
runtime assertion checking in software development”
• Debugging: McConnell, Code Complete, ch. 23
• Regression Testing: Elbaum et al., “Prioritizing test cases for regression testing”
• Reliability: Lyu, Handbook of Software Reliability Engineering, ch. 1
• Fault-tolerant computing: Xie et al., “A survey of software fault tolerance
techniques”
• Efficiency: McConnell, Code Complete, ch. 25-26
• Usability: Noyes, “The Human Factors Toolkit”
• Security: Secure Coding Guidelines for Java (online)
• Security Testing: Thompson, “Why security testing is hard”
Emphasis on Second-Half Topics
• More testing
• Fault-based testing and mutation analysis
• Property-based testing
• Verification and model checking
• Fault localization and regression testing
• Integration testing and mock objects
• External quality
• Reliability and fault-tolerant computing
• Efficiency
• Usability
• Security
Part 1: Internal Quality and Refactoring
Software Quality: ISO 9126
• External Quality
• Functionality, Reliability, Efficiency, Usability,
Portability
• Internal Quality (Maintainability)
• Stability, Changeability, Testability, Analyzability
(Readability, Understandability)
• Internal quality affects external quality!
Analyzability
• Readability: how easy is it to identify and recognize
tokens and know their syntactic meaning?
• Java coding conventions
• Understandability: how easy is it to know the
semantic meaning of a piece of code?
• Chapin data flow
• McCabe cyclomatic complexity
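As a rough illustration on a hypothetical method (not from the course materials): for a single-entry, single-exit method, McCabe cyclomatic complexity is the number of decision points plus one.

// Decision points: the loop condition, the if, and the && (3 total),
// so cyclomatic complexity = 3 + 1 = 4
static int countPositiveEvens(int[] values) {
    int count = 0;
    for (int v : values) {
        if (v > 0 && v % 2 == 0) {
            count++;
        }
    }
    return count;
}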
Refactoring
• What is refactoring?
• Why should you refactor?
• When should you refactor?
• How should you refactor?
• Possible problems caused by refactoring
Code Smells
• Duplicate code
• Long method
• Large class
• Primitive obsession
• Message chain
Refactoring Patterns
• Extract Method
• Pull Up Method
• if duplicate code is found in two separate classes
• Extract Class, Extract Superclass
• composition vs. inheritance
• Hide Delegate
• to break a message chain
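A minimal Hide Delegate sketch (hypothetical Person/Department classes), replacing the chain person.getDepartment().getManager() with a single call:

class Department {
    private final Person manager;
    Department(Person manager) { this.manager = manager; }
    Person getManager() { return manager; }
}

class Person {
    private Department department;
    void setDepartment(Department d) { this.department = d; }
    // Hide Delegate: clients ask the Person directly instead of
    // chaining person.getDepartment().getManager()
    Person getManager() { return department.getManager(); }
}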
Intra-Component Code Complexity
• For a method
• Chapin data flow
• McCabe Cyclomatic Complexity
• For a class
• Lack of Cohesion of Methods
Inter-Component Code Complexity
• Henry-Kafura structural complexity (fan-in, fan-out)
• Object-oriented complexity metrics
– Depth of Inheritance Tree
– Number of Children
– Instability
Part 2: Testing
Software Testing Basics
• What is the definition of “software testing”?
• Executing a program in an attempt to reveal defects
• Failure: when there is a difference between the
actual output and the expected output (as reported
by the test oracle)
• Error: deviation in internal state (from correct state)
that led to failure
• Fault: static defect in code that led to error
Test Oracles
• False positive: thinking there’s a bug when really
there isn’t
• False negative: thinking there’s no bug when really
there is
• Accuracy: percentage of all oracle verdicts that are correct (true positives plus true negatives)
• Precision: percentage of reported failures (positives) that are true positives
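A quick worked example with made-up numbers: suppose the oracle flags 10 of 100 test runs as failing, 8 of those really do fail, and 85 of the 90 runs it passes really are correct. Then accuracy = (8 + 85) / 100 = 93% and precision = 8 / 10 = 80%.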
Test Case Generation
• Exhaustive testing: all possible inputs
• Generally not feasible
• Random testing: choose inputs randomly
• Easy to automate
• No indication of progress (how do you know you’re done?)
• Hard to know expected outputs
• Specification-based: based on specification
(representative inputs, inputs likely to lead to failure)
• Code-based: execute as much code as possible
• Fault-based: show that program does not exhibit
certain types of faults
Testing Coverage
• Amount of testing to be done is stated in terms
of measurable criteria
• A test set (collection of individual test cases)
covers all or some part of the criteria
• The percentage of criteria that are covered is
the coverage level
• Testing continues until an adequate level is
achieved
Black-Box Testing
• Criteria: how much of the specification is
covered
• Assumption #1: if a failure is revealed for a
given value of input variable v, then it is likely to
be revealed for similar values of v
• As a result of this assumption, we can split up the
specification space into equivalence classes
• Assumption #2: if a failure is revealed for a
given value of input variable v, then it is likely to
be revealed regardless of the value of other
variables (single fault assumption)
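A small sketch (hypothetical spec and JUnit 5 tests): scores 0-59 are “fail” and 60-100 are “pass”, so weak equivalence-class coverage needs only one representative per class, and robust testing adds out-of-range values.

import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;

class ScoreClassifierTest {
    // Hypothetical method under test
    static String classify(int score) {
        if (score < 0 || score > 100) throw new IllegalArgumentException("out of range");
        return score >= 60 ? "pass" : "fail";
    }

    @Test void failClass() { assertEquals("fail", classify(30)); }   // representative of [0..59]
    @Test void passClass() { assertEquals("pass", classify(75)); }   // representative of [60..100]
    @Test void robustBelowRange() {                                   // value not mentioned in the spec
        assertThrows(IllegalArgumentException.class, () -> classify(-5));
    }
}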
Black-Box Coverage Criteria
• Weak: what percentage of the separate
equivalence classes are covered?
• Pairwise: what percentage of the pairs of
equivalence classes are covered?
• Strong: what percentage of the combinations
of equivalence classes are covered?
• Normal: only considers nominal values explicitly
stated in specification
• Robust: also considers values not mentioned in
spec but that may lead to failure
White-Box Testing
• “Treat the code as a graph and try to cover it”
• How do you know that this code works if you
haven’t tested it?
• Coverage metrics: Statement, Branch, Path
• Path coverage subsumes statement and branch
coverage
• If you’ve covered 100% of the paths, you’ve necessarily
covered 100% of the statements and branches
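A small illustration on a hypothetical method: a single test with a non-empty, all-positive array executes every statement yet misses one branch outcome, so statement coverage alone can overstate how thoroughly the code was exercised.

static int sumOfPositives(int[] values) {
    int sum = 0;
    for (int v : values) {      // branch outcomes: loop body taken / loop exits
        if (v > 0) {            // branch outcomes: true / false
            sum += v;
        }
    }
    return sum;
}
// Test input {1, 2}: 100% statement coverage, but the "v > 0 is false"
// outcome is never taken. Adding input {1, -1} reaches 100% branch
// coverage; path coverage would additionally distinguish the different
// sequences of loop iterations.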
Symbolic Execution
• Replace all variables with symbols
• For each path, determine its path condition as an
expression in terms of the input
• Any input that satisfies that path condition will cause
that path to be covered
• If the path condition is not satisfiable, the path is
infeasible
• Can also be used to state the output in terms of the
input
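A tiny sketch of the idea on a hypothetical method, treating the input x as a symbol:

static int f(int x) {
    int y = x * 2;        // symbolically: y = 2x
    if (y > 10) {         // path condition P1: 2x > 10, i.e. x > 5
        return y - 10;    // output along P1, in terms of the input: 2x - 10
    } else {              // path condition P2: 2x <= 10, i.e. x <= 5
        return 0;         // output along P2: 0
    }
}
// Any input satisfying x > 5 (e.g. x = 6) covers the first path;
// any input satisfying x <= 5 (e.g. x = 0) covers the second.
// If a path condition were unsatisfiable, that path would be infeasible.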
White-Box Adequacy Criteria
• Structural Coverage Criteria
• Path/Statement/Branch coverage
• Edge-Pair coverage
• Prime Path coverage
• Data Flow Coverage Criteria
• Def-Use, All-Uses
• These criteria can be used to generate a test set or to
measure a test set (regardless of how it was generated)
Part 3: More Testing!
Fault-Based Testing
• We can’t show that the program is free of all faults
but we can try to show that it is free of certain faults
• Given a program P, we create a faulty version P’
• If a test input produces the correct output for P and an incorrect
output for P', then P != P', and so P does not contain that fault
Assumptions
• Competent Programmer Hypothesis: the program
we’re testing is nearly correct and any bugs are likely
to be small variations from the correct program
• Coupling Effect Hypothesis: a test case that is
able to find small/subtle faults is likely to find more
complex/egregious ones
Mutation Analysis
• Systematically insert faults into the code by making a
single, localized change (“mutation”)
• For each fault, if a test case passes in the original
version and fails in the mutated version, we say that
the mutant is killed
• Mutation Analysis: given an existing test set,
determine the percentage of mutants that it kills
• Mutation Testing: given a set of mutations, derive a
test set that kills all of them
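A minimal sketch: a hypothetical predicate, a mutant produced by a single relational-operator change, and the boundary input that kills it.

// Original P
static boolean isAdult(int age) { return age >= 18; }

// Mutant P': single localized change (>= becomes >)
static boolean isAdultMutant(int age) { return age > 18; }

// age = 18 kills the mutant: P returns true, P' returns false.
// age = 30 does not: both return true, so with only that test case
// the mutant survives.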
Mutation Testing
• To identify a test case that will kill a given mutant,
represent P and P’ as expressions written as
implications in terms of the input and output
• Then find inputs that satisfy the conjunction of those
expressions such that the outputs of P and P’ are
different
• A mutant may survive if it is not covered or if it is an
equivalent mutant
Property-based Testing
• Sufficient: if always satisfied, code is correct
• Necessary: if ever violated, code is incorrect
• Runtime assertion checking
• Using Java “assert” keyword
• Throws AssertionError if violated
• Assertions need to be enabled
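A small runtime-assertion sketch (hypothetical method); remember that assertions are disabled by default and must be turned on with the -ea flag:

static double safeSqrt(double x) {
    // Necessary property (precondition): if this is ever violated, the caller is wrong
    assert x >= 0 : "precondition violated: x must be non-negative";
    double result = Math.sqrt(x);
    // Postcondition on the result, with a relative tolerance for floating point
    assert Math.abs(result * result - x) <= 1e-9 * Math.max(1.0, x) : "postcondition violated";
    return result;
}
// Run with: java -ea MyClass
// Without -ea the assert statements are skipped and no AssertionError is thrown.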
Model Checking
• Testing tries to show the existence of bugs
• Verification tries to prove the absence of bugs
• Model checking: show correctness by proving that
the software conforms to a “model of correctness”
• Use proof-by-contradiction to demonstrate that a property cannot be violated
• Practical limits: path explosion, property soundness
and completeness, path condition satisfiability
Debugging basics
• Find it, fix it, make sure you didn't break anything
• Levels of fault localization
• Deduction: look at code and reason about it
• Observation: observe single execution of code with respect to invariants
• Induction: observe multiple executions of code (with passing and failing tests) in order to determine likely cause of failure
• Experimentation: systematically modify inputs/code to prove/disprove hypothesis
Program Slicing & Dicing
• Create dynamic slices by looking at the paths
covered by each test case
• Create program dices by looking for differences
between the dynamic slices
• If you have more than two, look for statistical
correlation between failing test cases and certain
statements/paths
Regression testing
• Test case selection
• Choose only those tests that cover “dangerous entities”
• There may be dangerous entities even if the code hasn’t changed
• Test case prioritization
• Coverage (highest first, “additional”)
• Fault-exposing potential: based on mutation analysis
• Fault index: based on code churn
Integration Testing
• Top-down integration
• Test stub: simple implementation of called method
that provides just enough functionality to test caller
• Bottom-up integration
• Test driver: simple implementation of caller method
that allows for testing of integration with called
method
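A top-down sketch with a hypothetical PaymentService that the OrderProcessor under test calls; the stub stands in for the real, not-yet-integrated implementation:

interface PaymentService {
    boolean charge(String account, double amount);
}

// Test stub: just enough behavior to exercise the caller
class PaymentServiceStub implements PaymentService {
    @Override public boolean charge(String account, double amount) {
        return amount <= 100.0;   // canned answer, no real payment logic
    }
}

class OrderProcessor {
    private final PaymentService payments;
    OrderProcessor(PaymentService payments) { this.payments = payments; }
    String placeOrder(String account, double amount) {
        return payments.charge(account, amount) ? "confirmed" : "declined";
    }
}

// In a unit test: new OrderProcessor(new PaymentServiceStub()).placeOrder("a1", 50.0)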
Mock Objects
• Object used to “mock” (or substitute for) some
dependency only for purposes of testing
• For instance if dependency is slow, non-deterministic,
hard to control, etc.
• Use dependency injection to create an anonymous
class within the unit test that mocks the behavior of
the dependency for this test only
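A sketch of the anonymous-class approach, using a hypothetical Clock dependency that would otherwise make the test non-deterministic:

// Dependency: non-deterministic in production
interface Clock { long now(); }

class SessionChecker {
    private final Clock clock;                  // injected via the constructor
    SessionChecker(Clock clock) { this.clock = clock; }
    boolean isExpired(long createdAt, long ttl) {
        return clock.now() - createdAt > ttl;
    }
}

class SessionCheckerTest {
    void expiredSession() {
        // Mock the dependency with an anonymous class, for this test only
        Clock fixedClock = new Clock() {
            @Override public long now() { return 1_000L; }
        };
        assert new SessionChecker(fixedClock).isExpired(0L, 500L);
    }
}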
Part 4: External Quality
Reliability
• “The probability of failure-free operation”
• Can be expressed as a likelihood/probability
• Can also be expressed as MTBF (or MTTF)
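A quick worked example with made-up numbers: a service that runs for 1,000 hours and fails 4 times has an observed MTBF of 1,000 / 4 = 250 hours. Reliability can equally be phrased as a probability, e.g. a 95% chance of 24 hours of failure-free operation.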
Fault-Tolerant Software
• Single-Version
• Forward recovery: exception handling
• Backward recovery: rollback
• Multi-Version (“design diversity”)
• Recovery Block
• N-version programming
• Multi-Data (“data diversity”)
• Retry Block
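A minimal recovery block sketch (hypothetical sort routines and acceptance test): run the primary version, check its result, and if the check fails roll back to the saved input and run an alternate version:

import java.util.Arrays;

class RecoveryBlockSort {
    // Acceptance test: output must be in non-decreasing order (simplified check)
    static boolean acceptable(int[] result) {
        for (int i = 1; i < result.length; i++) {
            if (result[i - 1] > result[i]) return false;
        }
        return true;
    }

    static int[] sort(int[] input) {
        int[] checkpoint = input.clone();            // save state so we can roll back
        int[] primary = primarySort(input.clone());  // primary version
        if (acceptable(primary)) return primary;
        return alternateSort(checkpoint);            // alternate version runs on the saved state
    }

    // Placeholders here; in practice these would be independently
    // developed implementations ("design diversity")
    static int[] primarySort(int[] a) { Arrays.sort(a); return a; }
    static int[] alternateSort(int[] a) { Arrays.sort(a); return a; }
}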
Efficiency
• Tradeoffs
• Use the right data structure or algorithm
• Measure, don’t guess
• Avoid unnecessary work
• Lazy evaluation, short-circuit operations
• Avoid unnecessary memory allocation
• Compiler optimizations
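Two small illustrations (hypothetical code) of the “avoid unnecessary work” and “avoid unnecessary memory allocation” points:

class EfficiencyExamples {
    // Short-circuit evaluation: the expensive check runs only when the cheap null test passes
    static boolean allowed(String user) {
        return user != null && isExpensiveCheck(user);
    }
    static boolean isExpensiveCheck(String user) { return user.startsWith("admin"); }  // stand-in for costly work

    // Avoid unnecessary allocation: one StringBuilder instead of a new
    // String object on every concatenation inside the loop
    static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p).append(',');
        return sb.toString();
    }
}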
Usability
• How is it defined? Why is it important?
• Therac-25 case study
• User-Centered Design
• Task analysis
• Information Visualization
• Evaluating Usability
• Heuristics
• User Studies
• Metrics
Secure Java Programming
• Freeing resources
• Bounds checking
• Data cleaning: validation, sanitization,
canonicalization, normalization
• Visibility and mutability
• Changing private fields
• SecurityManager
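A short sketch pulling a few of these points together (hypothetical class): validate input, check bounds, and avoid exposing mutable internals:

final class Account {                                   // final: cannot be subclassed by an attacker
    private final byte[] key;

    Account(byte[] key) {
        if (key == null || key.length != 32) {          // validation + bounds checking
            throw new IllegalArgumentException("key must be exactly 32 bytes");
        }
        this.key = key.clone();                         // defensive copy of the mutable argument
    }

    byte[] getKey() {
        return key.clone();                             // never hand out the internal array itself
    }
}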
Final Exam Overview
• Monday, December 15, 3-5pm, DRL A1
• Closed-book, closed-notes, etc.
• No electronic devices!
• You may bring one sheet of 8.5”x11” or A4
• The exam is comprehensive:
• all lecture topics
• reading assignments since the midterm
• Practice questions and solutions available in Canvas
Lessons Learned
#1: Software Quality is Quantifiable
#2: There are Tradeoffs in Software Quality
#3: Internal Quality affects External Quality
Anything else?
The end.
(thanks!)