Week 3 – Complexity Metrics and Models
• Complexity metrics were developed by computer scientists and software engineers
• Strongly based on empirical (real world) measurement, with little theory
• Primarily broken into internal and external measures
• Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.)
• External measures describe relationships among modules (program or function calls, external file activities, input/output, etc.)
• Size measures
– Input to prediction models
– Normalizing factor for cost, productivity, etc.
– Progress during development
• Typically use lines of code (LOC) or function point counts
– LOC is a better measure for predicting cost and schedule
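– Example (hypothetical figures): a 40 KLOC release with 120 reported defects has a defect density of 120 / 40 = 3 defects per KLOC, letting releases of different sizes be compared on the same footing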
• Simple complexity metric, often based on number of executable statements or instruction statements
– Highest defect rates often occur in small modules
– Larger modules have a smaller defect rate (if defects exist at all), until the module becomes too cumbersome
– Optimum module size ~ 250 lines
• Function points help avoid biases due to the programming language(s) used
• Provide a more “fair” basis for comparing different environments
• Focus on how much work the program accomplishes, not how concisely it is expressed
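A minimal sketch (Python) of an unadjusted function point count; the weights are the commonly cited IFPUG values for average-complexity items, and the counts below are hypothetical:

# Commonly cited IFPUG weights for average-complexity items (assumed, not from these slides)
WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 10,
    "external_interface_files": 7,
}

# Hypothetical counts for a small application
counts = {
    "external_inputs": 12,
    "external_outputs": 8,
    "external_inquiries": 5,
    "internal_logical_files": 3,
    "external_interface_files": 2,
}

unadjusted_fp = sum(WEIGHTS[k] * counts[k] for k in WEIGHTS)
print(unadjusted_fp)  # 48 + 40 + 20 + 30 + 14 = 152 unadjusted function points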
• Halstead’s metrics, also known as Software Science (1977)
• Examine the program as compilable “tokens”
• Tokens are either operators (+, -) or operands (variables)
• Derive metrics such as Vocabulary, Length, Volume, Difficulty, etc.
• Not widely used
• Halstead’s η₂ – the number of distinct operands in a module
– Operands include: number of variables, number of unique constants, and number of labels
• Operand usage (OU)
– OU = η₂ / N₂, where N₂ is the total number of operand references
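A minimal sketch (Python) of the Software Science measures, assuming the operator/operand counts n1, n2, N1, N2 have already been extracted from a module’s tokens; the counts used in the call are hypothetical:

from math import log2

def halstead(n1, n2, N1, N2):
    # n1/n2: distinct operators/operands; N1/N2: total operator/operand occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    operand_usage = n2 / N2          # the OU ratio defined above
    return vocabulary, length, volume, difficulty, effort, operand_usage

print(halstead(n1=10, n2=15, N1=40, N2=60))   # hypothetical counts for a small module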
• Complexity is a characteristic of software that influences the resources needed to build and maintain it
• Many different characteristics of software relate to complexity
• These complexity characteristics revolve around the structure of the software
• Control flow
– Addresses sequence in which instructions are executed
– Iteration and looping
• Data flow
– Follows trail of data as it is created and handled
– Depicts behavior of data as it interacts with the program
• Data structure
– Concerned with organization of data itself
– Provides information about difficulties in handling data and in defining test cases
• Modeled by directed graphs (control flow graphs)
– Each node corresponds to a single program statement
– Arcs (directed edges) indicate flow of control from one statement to another
• Control flow graphs are useful for:
– Analysis (estimating number of defects)
– Expressing complexity by a single value
– Assessing testability and test coverage
[Figure: control flow graphs for the basic constructs (If A then X; If A then X else Y; While A do X; Repeat X until A; Case A of a1..an with branches X1..Xn); edges are labeled t (true) and f (false)]
• McCabe, 1976
• Based on a program’s control flow graph
• Related to the number of regions in the flow graph, or the number of linearly independent paths in the program
• Complexity MC = edges - nodes + 2 × (number of connected components); for a single connected program graph, MC = edges - nodes + 2
• Complexity under 10 generally desired
• Can also find MC as the number of binary (yes/no) decisions plus one
– Multiple-choice decisions with ‘n’ choices count as (n-1) binary decisions
• Ignores differences among specific types of control structures
• Uses of complexity metric:
– Identify complex modules needing detailed inspection or redesign
– Identify simple modules needing minimal inspection and/or testing
– Estimate programming, testing and maintenance effort
– Identify potentially troublesome code
• Software programs can be represented by linear directed segments combined with the basic control flow constructs
• Control flow constructs may be nested, e.g. an IF statement can be inside of a WHILE loop
• Example:
[Figure: example control flow graph with nodes 1–14]
McCabe cyclomatic complexity (MC) counts the number of linearly independent paths through a program
MC = # of edges - # of nodes + 2
Linearly independent paths for this example:
<2, 11>
<2, 10, 12, 14>
<2, 10, 12, 13, 12, 14>
<1, 3, 5, 6, 9>
<1, 4, 6, 9>
<1, 4, 6, 7, 8, 9>
[Figure: example control flow graph with nodes a–g and edges numbered 1–10]
MC = edges - nodes + 2 = 10 - 7 + 2 = 5
Set of linearly independent paths: b1: abcg, b2: abcbcg, b3: abefg, b4: adefg, b5: adfg
Any arbitrary path is equal to a linear combination of the linearly independent paths listed above
For example, path abcbefg is equal to: b2 + b3 - b1
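A minimal sketch (Python) of the MC calculation for the graph above; the edge list is reconstructed from the independent paths b1 through b5, reading each path as a sequence of nodes a–g:

# Edges reconstructed from paths b1..b5 (node pairs); c -> b is the loop's back edge
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "g"),
         ("b", "e"), ("e", "f"), ("f", "g"),
         ("a", "d"), ("d", "e"), ("d", "f")]
nodes = {n for edge in edges for n in edge}

mc = len(edges) - len(nodes) + 2
print(len(edges), len(nodes), mc)   # 10 edges, 7 nodes, MC = 5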
• Knot measure – total number of points at which control flow lines cross
IF (TIME) 30,30,10
10 CALL TEMP1
IF (X1) 20,20,40
20 Y1=Y+1
Y2=0
CALL TEMP2
GO TO 50
30 Z1=1
40 CALL TEMP3
Z2=Z2+1
50 CALL TEMP4
How many knots are in this example?
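A minimal sketch (Python) of a knot count: each jump is a (from_line, to_line) pair, and two jumps form a knot when their spans cross without one being nested inside the other; the line numbers in the call are hypothetical, not taken from the FORTRAN example above:

def knots(jumps):
    spans = [tuple(sorted(j)) for j in jumps]     # normalize forward/backward jumps
    count = 0
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (a1, b1), (a2, b2) = spans[i], spans[j]
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:   # spans cross, not nested
                count += 1
    return count

print(knots([(1, 6), (2, 3), (4, 9), (5, 8)]))   # 2 knots for these hypothetical jumps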
• Examine effect of using specific control structures on defect rate
• Is, by definition, language-specific
• Can result in statistically significant relationships
– e.g., Lo used it to show that DO WHILE should be avoided in COBOL
• Examines algorithmic efficiency and use of machine resources (memory, I/O, storage)
• Studies quantitative aspects of solutions to computational problems
• Examples may include sorting efficiency for a database, managing I/O constraints across a large scale network, etc.
• Concerned with characteristics of software that affect human performance
- Injection of defects (when and why does a programmer make errors?)
- Ease of building the software (effort required)
- Ease of maintenance (effort required)
• Database size per program size (DBSPPS)
– DBSPPS = DBS / PS
• where DBS is database size in bytes or characters
• and PS is program size in source instructions
– Used in the COCOMO model as a cost driver
• An ordinal scale measure is derived from DBSPPS
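– Example (hypothetical figures): a 2,000,000-character database supporting a 20,000-instruction program gives DBSPPS = 2,000,000 / 20,000 = 100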
• Focus is the interaction among code modules
– Fan-in = # of modules which call a given module
– Fan-out = # of modules which are called by a given module
• Or, more formally...
• Fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which info is retrieved by the module
• Fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module
• Do fan-in and fan-out affect software quality?
– Large fan-in modules may be interpolation or look-up routines – no correlation with defect rate
– Large fan-out often relates to a high defect rate – a strong defect correlation
• Are large fan-in and fan-out bad?
• Information flow complexity
– Henry and Kafura: Size × (fan-in × fan-out)²
– Shepperd: (fan-in × fan-out)²
• Henry and Kafura measure helps predict the number of software maintenance problems
Henry, S. and Kafura, D., IEEE Transactions on Software Engineering, 1981, SE-7(5), pp. 510-518.
Shepperd, M., Software Engineering Journal, 1990, 5(1), pp. 3-10.
• The Shepperd measure correlates with software development time
• Information flow metric (Henry & Selig)
– HC = C × (fan-in × fan-out)²
– where C is the cyclomatic complexity
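A minimal sketch (Python) of the Henry & Kafura and Shepperd measures for a hypothetical call graph; fan-in and fan-out here count only local call flows (flows through global data structures are ignored), and the module names and sizes are made up:

calls = {                       # caller -> modules it calls (hypothetical)
    "Main":    ["Read", "Compute", "Print"],
    "Compute": ["Read"],
    "Read":    [],
    "Print":   [],
}
size = {"Main": 40, "Read": 25, "Compute": 60, "Print": 15}   # e.g. LOC per module

def fan_out(m):
    return len(calls[m])

def fan_in(m):
    return sum(1 for callees in calls.values() if m in callees)

for m in calls:
    flow = fan_in(m) * fan_out(m)
    print(m, fan_in(m), fan_out(m),
          size[m] * flow ** 2,   # Henry & Kafura
          flow ** 2)             # Shepperd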
• System complexity (Card & Glass)
– Based on structural complexity (average fan-out squared) and data complexity (based on the number of I/O variables and fan-out)
– Quantified effect of complexity on error rate
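– A commonly cited formulation: structural complexity S = Σ f(i)² / n, data complexity D = Σ [V(i) / (f(i) + 1)] / n, and system complexity C = S + D, where f(i) is the fan-out of module i, V(i) is its number of I/O variables, and n is the number of modules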
• Module - a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier
– Or, a distinct, named group of LOC
• The module call graph shows which modules call each other, and what key information is passed among them
[Figure: example module call graph with modules Main, Read_Scores, Find_Ave, and Print_Ave; the data items scores, eof, and average are passed among them]
• Average number of calls per module (ANCPM)
– ANCPM = Number of Interconnections / Number of Modules
• Fraction of modules that make calls (FMC)
– FMC = Number of Modules that Make Calls / Number of Modules
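– Example (hypothetical figures): 10 modules with 18 module-to-module calls gives ANCPM = 18 / 10 = 1.8; if 6 of those modules make at least one call, FMC = 6 / 10 = 0.6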
• Types of information flows
– Local direct flow
• Module invokes a 2nd module & passes info to it
• Invoked module returns result to the caller
– Local indirect flow
• Invoked module returns info that is subsequently passed to a second invoked module
– Global flow
• Info flows from one module to another via a global data structure
• Number of Entries and Exits per Module, ‘m’
– Like fan-in and fan-out: m = entries + exits
• Software Science measures
• Graph-Theoretic Complexity
– Static complexity: C = Edges - Nodes + 1
– Generalized static complexity: based on summing the resources needed for each module (e.g. storage, access time, etc.)
– Dynamic complexity: complexity as it changes over time across a network
• Cyclomatic complexity
• Minimal Unit Test Case Determination
– Determine number of independent paths through a module, to get minimum number of test cases for unit testing
• Data or information flow complexity
– Fan-in and fan-out of variables
• Design Structure adds a weighted (%) average of six parameters:
1. Whether designed top down (Y/N)
2. Module inter-dependence
3. Module dependence on prior processing
4. Database size (# of elements)
5. Database compartmentalization
6. Module single entrance and exit (Y/N)
– Weighting is chosen to meet project needs
• Compiler measures
– Size (bytes of compiled code)
– Number of symbols and variables
– Cross-reference of all labels
– Statement count
• Configuration Management Library measures
– Number of code modules
– Number of versions of each module
– History of change dates of each module
– Module size
– Number of related documents for each module
• Most information systems are critical to day-to-day operations
– Witness that Google or BlackBerry being offline for mere minutes makes the news
• Availability depends on 1) how often the system goes down, and 2) how long it takes to restore it after a crash
• Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive
• Often measure availability by the number of 9’s in the desired level of availability
– Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc.
– How many nines can you afford?
No. of 9’s   Availability   Downtime per year
2            99%            87.6 hours
3            99.9%          8.8 hours
4            99.99%         53 minutes
5            99.999%        5.3 minutes
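A minimal sketch (Python) of the downtime arithmetic behind the table, using 8,760 hours per year:

def downtime_hours_per_year(nines):
    availability = 1 - 10 ** (-nines)    # e.g. 3 nines -> 0.999
    return (1 - availability) * 8760     # hours of outage per year

for n in range(2, 6):
    print(n, "nines:", round(downtime_hours_per_year(n), 2), "hours")
# 2 -> 87.6 h, 3 -> 8.76 h, 4 -> 0.88 h (about 53 min), 5 -> 0.09 h (about 5.3 min)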
• Many techniques are used to help ensure that high levels of availability are possible
– Duplicate systems (clustering)
– RAID data duplication
– Duplicate power supplies
– Independent power supplies
– Uninterruptible power supplies (UPSs)
• Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability
– Consistent methods for measurement and definitions of terms are needed for further refinement
• In order to determine availability, the actual customer-visible system outage time needs to be collected
– In order to get this data, the customer must place a very high priority on availability
– This data could be used to identify software components which most reduce availability
• We also expect that availability for a new system should increase over the first couple of years of its use
• Defect causal analysis can help identify and remove the root causes of defects, thereby improving availability