Testing Metrics and Measurement (not in textbook)

Why Metrics in Software Testing?
• How would you answer questions such as:
– Project oriented questions
• How long would it take to test?
• How much will it cost to test?
– Product oriented questions
• How bad/good is the product?
• How many problems still remain in the software?
– Test activities oriented questions
• Will testing be completed on time?
• Was the testing effective?
• How much effort went into testing?
• All these questions require some type of
measurement and record keeping in order to
be answered properly.
Some Basic Concepts on Measurement
• What do we need before we can measure
something?
– A clear understanding and definition of the
attribute/characteristic that we are trying to gauge
– The metric that may be used to gauge that
attribute
– The methodology for performing the
measurement (often forgotten once we get the
first two done - - - including yours truly)
1. Clarifying & Defining the Attribute to be Measured
• Characterizing the attribute of interest
– Size Attribute:
• Physical height is a size sub-attribute of many items.
– Height of a building, person, tree - - - not a problem
– Height of a ball or ocean ? - - - not comfortable? Why?
• Physical weight is a size sub-attribute of many items
• What is the size attribute for software? What does it
address?
– The source statements - - - with screens? with db tables?
– The storage space that the object code occupies in memory ?
– Quality Attribute:
• For a car? - - - How fast can it accelerate? Number of times
the car stalled? Number of times the lights don’t work?
• For software? - - - How many times do we need to “re-boot”?
How good does the screen look? How many times do we
need to call the help-line? Or (# of times it does not meet
customer requirements)?
2. Metric for Gauging the Attribute
• Metric – a unit used for describing or for measuring
an attribute
– Inches is a metric used for measuring the length attribute
(simple metric)
– Miles per hour is a metric for measuring the speed attribute
(complex metric – requires 2 metrics)
– Lines of code is a metric for measuring the size attribute of
software (not a very good one)
– Problems found per thousand lines of source code is a
metric for the defect discovery rate attribute of software (or is
this for the software quality attribute?)
3. Conducting the Measurement
• Once the attribute is defined and the associated
metric is defined, the actual methodology to determine
the extent of an attribute using that metric has to be
spelled out.
– How do you measure the height of a person using inches?
– How do you measure the distance from the earth to the moon
using inches?
– How do you measure the size of a computer program using
bytes?
– How do you measure the defects in a program using problems
found during program testing? (Note: problems found may be
counted in many ways - - - unique ones, accepted ones, etc.)
Some General, Test Measurements
• Time is used to measure the duration of the
testing effort
– Time to setup and conduct (run) a test or a set of tests
• Units of measurement in minutes or hours
– Time to design and document test cases
• Units of measurement in minutes or hours
• Keeping track of time gives us one parameter to help
us plan for future testing; but time must be balanced
with the “size” of the test.
– 2 seconds to run a simple query
– 5 seconds to run a complete purchase transaction with
confirmation
• “Size” of test is needed to make “time of test” more
meaningful. Or, conversely, can the amount of “test time”
be used as a metric for the size-of-test attribute?
Size of Test
• Test size attribute may use different metrics:
– Amount of time to run the test (a bit convoluted?):
• Small size : less than or equal to 3 seconds
• Medium size: between 3 seconds and 1 minute
• Large size: 1 minute or above
– Number of lines of statements to document the
test case:
• Small size: less than or equal to 3 statements
• Medium size: between 4 and 7 statements
• Large size: 8 or more statements
Any suggestions? Number of test cases? Or type of test,
such as unit test versus integration test?
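The two size scales above can be sketched as small classification functions. This is only an illustration: the function names are hypothetical, and the thresholds are the ones listed on this slide (with a run time of exactly 3 seconds treated as small and exactly 1 minute as large).

```python
def classify_by_run_time(seconds: float) -> str:
    """Bucket a test by its run time, using the slide's thresholds."""
    if seconds <= 3:
        return "small"
    if seconds < 60:
        return "medium"
    return "large"


def classify_by_statement_count(statements: int) -> str:
    """Bucket a test case by the number of statements that document it."""
    if statements <= 3:
        return "small"
    if statements <= 7:
        return "medium"
    return "large"


# The 2-second query and the 5-second purchase transaction mentioned earlier.
print(classify_by_run_time(2), classify_by_run_time(5))
```

Note that the same test can land in different buckets depending on which metric is used, which is one reason the choice of metric has to be stated explicitly.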
Quality : # of Problems
• The attribute, Quality, is often measured with
the metric of number of problems found; but
number of problems alone does not tell the
whole story - - - consider
– Severity of problems
• High
• Medium
• Low
– Type of problems
• UI
• Database
• Network outage
• Etc.
Quality (cont.)
• Both Severity and Type are important
– # of problems found by severity
– # of problems found by type
– # of problems found when (during development)
– # of problems found when (months after release)
– # of problems found where (UI, DB, Logic, Network, etc.)
• Quality Information is relevant to both:
– Software providers
– Customers/users
Why important to users? What would they do with it?
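As a minimal sketch of the record keeping this implies, the same list of problem records can be tallied by severity, by type, and by when the problem was found. The records and field layout below are made up for illustration; they are not data from any real project.

```python
from collections import Counter

# Hypothetical problem records: (severity, type, when found)
problems = [
    ("High", "UI", "functional test"),
    ("Low", "Database", "functional test"),
    ("Medium", "UI", "system test"),
    ("High", "Network", "post-release"),
]

# One tally per dimension of interest.
by_severity = Counter(severity for severity, _, _ in problems)
by_type = Counter(kind for _, kind, _ in problems)
by_when = Counter(when for _, _, when in problems)

print(by_severity)
print(by_type)
print(by_when)
```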
Problem Find Rate
[Figure: problem find rate (# of problems found per hour) plotted against time, Day 1 through Day 5]
The Weibull probability density curve:
$f(t) = \frac{m}{t}\left(\frac{t}{c}\right)^{m} e^{-z}$, where $z = \left(\frac{t}{c}\right)^{m}$
- for m = 1, the curve is the dotted line
- for m = 2, the curve is the solid line
and is called the Rayleigh curve
Does the severity of the problem matter here?
(It should, but it is not considered here.)
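To get a feel for the two curve shapes, the Weibull density can be evaluated directly. The sketch below assumes the standard form f(t) = (m/t)(t/c)^m e^(-(t/c)^m) with shape m and scale c; the function name and the sample values of t and c are arbitrary choices for illustration.

```python
import math


def weibull_pdf(t: float, m: float, c: float) -> float:
    """Weibull density for t > 0 with shape m and scale c."""
    if t <= 0 or m <= 0 or c <= 0:
        raise ValueError("t, m and c must all be positive")
    z = (t / c) ** m
    return (m / t) * z * math.exp(-z)


# Compare m = 1 (steadily decaying) with m = 2 (Rayleigh: rises, then falls).
for day in (0.5, 1, 2, 3, 4, 5):
    print(day, round(weibull_pdf(day, 1, 2.0), 4), round(weibull_pdf(day, 2, 2.0), 4))
```

With m = 1 the values fall from the start, while with m = 2 they rise to a peak and then tail off, which is the shape usually expected of a find rate over a test period.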
Problem Fix Rate
[Figure: problem find rate and problem fix rate during functional test (# of problems fixed per hour) plotted against time, Day 1 through Day 5]
Would this fix rate present a problem?
Would you also want to keep a backlog # by day?
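A daily backlog is simply the running total of problems found minus problems fixed. The sketch below uses invented daily counts (the slide's chart gives no numbers), so only the calculation, not the data, should be taken from it.

```python
from itertools import accumulate

# Hypothetical daily counts for Day 1 through Day 5.
found_per_day = [4, 9, 7, 3, 1]
fixed_per_day = [1, 4, 6, 5, 4]

# Backlog at the end of each day = cumulative (found - fixed).
backlog = list(accumulate(f - x for f, x in zip(found_per_day, fixed_per_day)))

for day, open_count in enumerate(backlog, start=1):
    print(f"Day {day}: {open_count} problems open")
```

In this made-up data the backlog keeps growing until fixes start to outpace finds, which is the kind of lag the question above is pointing at.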
Problem Density
[Figure: bar chart of # of problems found per KLOC (scale 1 to 6) by area: Module 1, Module 2, Module 3, Module 4]
Note: Just the # of problems
found by area does not
normalize the measurement;
we need the per KLOC.
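The normalization can be sketched as follows. The problem counts and module sizes are hypothetical and were chosen to show that two modules with the same raw problem count can have very different densities.

```python
# Hypothetical data per module: (problems found, size in KLOC)
modules = {
    "Module 1": (12, 4.0),
    "Module 2": (12, 2.0),
    "Module 3": (5, 5.0),
    "Module 4": (9, 4.5),
}

# Problem density = problems found per thousand lines of code.
density = {name: found / kloc for name, (found, kloc) in modules.items()}

for name, value in density.items():
    print(f"{name}: {value:.1f} problems per KLOC")
```

Module 1 and Module 2 both have 12 problems, but Module 2 is half the size, so its density is twice as high.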
Test Coverage Rate
• Not all the planned test cases are actually run.
– # of test cases executed / # of test cases planned
• By functional areas
• By test phases
– # of source statements executed / total # of source statements
• By functional areas
• By modules
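Both coverage ratios have the same shape, so one small helper covers them. The function name and the per-area counts below are assumptions made for illustration only.

```python
def coverage_rate(executed: int, planned: int) -> float:
    """Fraction of planned items (test cases or source statements) executed."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return executed / planned


# Hypothetical test-case counts by functional area: (executed, planned)
by_area = {"ordering": (45, 60), "billing": (30, 30)}

for area, (executed, planned) in by_area.items():
    print(f"{area}: {coverage_rate(executed, planned):.0%} of planned test cases run")
```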
Test Activity Effectiveness
• Defect discovery and eradication activities
occur at all phases of development. To see
which is more effective one may use:
– # of problems found / total # of problems found
• By development phase (req. rev., design rev., func. test,
system, etc.)
– # of problems found / person-days of effort
• By test activities (e.g. boundary value testing, branch
testing, d-u testing, etc.)
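The two effectiveness measures can be computed side by side, as in this sketch. All counts and effort figures are invented for the example.

```python
# Hypothetical number of problems found in each development phase.
found_by_phase = {
    "requirements review": 10,
    "design review": 15,
    "functional test": 20,
    "system test": 5,
}
total_found = sum(found_by_phase.values())

# Measure 1: share of all problems that each phase found.
share_by_phase = {phase: n / total_found for phase, n in found_by_phase.items()}

# Hypothetical (problems found, person-days of effort) per test activity.
activity = {"boundary value testing": (8, 4.0), "branch testing": (6, 6.0)}

# Measure 2: problems found per person-day of effort.
found_per_person_day = {
    name: found / days for name, (found, days) in activity.items()
}

print(share_by_phase)
print(found_per_person_day)
```

The first measure says where problems are caught; the second says how much effort each catch costs, so an activity can rank well on one and poorly on the other.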
Fix Effectiveness
• Not all problem fixes resolve the problems.
– # of fixes that worked / total # of fixes
• The first time
– # of fixes that had to be reworked (more than 1 fix
attempt) / total # of fixes
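A minimal sketch of both ratios, assuming every fix either worked the first time or needed rework (so the two rates add up to 1). The function name and the sample counts are hypothetical.

```python
def fix_effectiveness(total_fixes: int, worked_first_time: int) -> tuple[float, float]:
    """Return (first-time fix rate, rework rate) for a set of fixes."""
    if total_fixes <= 0 or not 0 <= worked_first_time <= total_fixes:
        raise ValueError("need 0 <= worked_first_time <= total_fixes and total_fixes > 0")
    first_time_rate = worked_first_time / total_fixes
    rework_rate = (total_fixes - worked_first_time) / total_fixes
    return first_time_rate, rework_rate


# Example: 40 fixes delivered, 34 of which resolved the problem the first time.
first_time, rework = fix_effectiveness(40, 34)
print(f"first-time rate: {first_time:.0%}, rework rate: {rework:.0%}")
```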
Fix Cost
• Fix cost is usually measured by amount of
effort expended.
– # of person-hours expended / fix
• By severity
• By areas
• By phase type (including post-release)
If the post-release fix cost is higher than the fix cost in every pre-release phase,
that is one reason to invest in testing and reviews.
Problem Cost Comparison
• Effort expended in discovering a problem and the
effort expended in fixing that problem is the “test”
cost during pre-release.
• Effort expended in fixing a problem and releasing it to
the customer is the “support” (problem resolution)
cost during post-release.
• Compare: (effort in person-hours)
effort expended / problem found and “fixed” (pre-release)
vs.
effort expended / problem “resolved” (post-release)
Post-release resolution usually costs more
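The comparison comes down to two effort-per-problem ratios. The effort figures and problem counts in this sketch are invented; only the form of the calculation follows the slide.

```python
def effort_per_problem(person_hours: float, problems: int) -> float:
    """Average person-hours spent per problem."""
    if problems <= 0:
        raise ValueError("problems must be positive")
    return person_hours / problems


# Pre-release (hypothetical): 120 person-hours finding plus 80 fixing, 50 problems.
pre_release = effort_per_problem(120 + 80, 50)

# Post-release (hypothetical): 90 person-hours resolving 10 customer problems.
post_release = effort_per_problem(90, 10)

print(f"pre-release:  {pre_release:.1f} person-hours per problem")
print(f"post-release: {post_release:.1f} person-hours per problem")
```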
How “Big” is it (testing w/o fix) ?
How would you answer this?
1. Assume - - - # of test cases planned by size (or
complexity):
• Large – 35 test cases
• Medium – 200 test cases
• Small – 40 test cases
2. Assume - - - average effort required to design and test,
per test case:
• Large – 1 person-hour
• Medium – 15 person-minutes
• Small – 5 person-minutes
3. Then - - - “How Big is Testing?” may be answered:
• (35 x 60) + (200 x 15) + (40 x 5) = 5,300 person-minutes, or 88.33
person-hours
So, in this case - - - how big is testing?
- It is 275 test cases.
- It is 88.33 person-hours of effort.
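The arithmetic can be checked with a few lines of code. The counts and per-case efforts are the ones assumed on this slide; the variable names are just for illustration.

```python
# (number of planned test cases, person-minutes per test case) by size
planned = {
    "large": (35, 60),
    "medium": (200, 15),
    "small": (40, 5),
}

total_cases = sum(count for count, _ in planned.values())
total_minutes = sum(count * minutes for count, minutes in planned.values())
total_hours = total_minutes / 60

print(total_cases)            # 275
print(total_minutes)          # 5300
print(round(total_hours, 2))  # 88.33
```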
How Long Would it take?
• Use the same example of 88.33 person-hours
of test planning and execution effort.
• You need to make some assumptions:
– assume 2 testers of about equal ability
– split the work effort evenly
– 88.33 person-hours / 2 people = 44.17 hours per person
– further assume that each person works 6 hours a
day
– 44.17 hours / 6 hours per day = about 7.4 days
• So this will take 2 testers working 6 hours a
day for about 7.4 days
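The same schedule estimate as a short calculation, using the assumptions listed above (2 testers, even split, 6 productive hours a day). Rounding up to whole days is an added step, not something the slide states.

```python
import math

effort_hours = 88.33   # total test effort in person-hours
testers = 2            # assumed to be of about equal ability
hours_per_day = 6      # productive hours per tester per day

hours_each = effort_hours / testers
days = hours_each / hours_per_day
whole_days = math.ceil(days)  # a partial day still occupies a calendar day

print(round(hours_each, 2))  # hours per tester
print(round(days, 2))        # elapsed working days
print(whole_days)
```

Changing any one assumption (number of testers, hours per day, how evenly the work splits) changes the answer, which is why the assumptions need to be written down alongside the estimate.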