Group R

advertisement

INFO 631

Prof. Glenn Booker

Week 3 – Complexity Metrics and Models

INFO631 Week 3 1 www.ischool.drexel.edu

Origin

• Complexity metrics were developed by computer scientists and software engineers

• Strongly based on empirical (real world) measurement, with little theory

• Primarily broken into internal and external measures

INFO631 Week 3 2 www.ischool.drexel.edu

Internal versus External

• Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.)

• External measures describe relationships among modules (program or function calls, external file activities, input/output, etc.)

INFO631 Week 3 3 www.ischool.drexel.edu

Internal Measures

INFO631 Week 3

4 www.ischool.drexel.edu

Internal Product Attributes

• Size measures

– Input to prediction models

– Normalizing factor for cost, productivity, etc.

– Progress during development

• Typically use lines of code (LOC) or function point counts;

– LOC is a better measure for predicting cost and schedule

INFO631 Week 3 5 www.ischool.drexel.edu

Lines of Code

• Simple complexity metric, often based on number of executable statements or instruction statements

– Highest defect rates often occurs in small modules

– Larger modules have a smaller defect rate

(if they exist at all) - until too cumbersome

– Optimum module size ~ 250 lines

INFO631 Week 3 6 www.ischool.drexel.edu

Function Points

• Function points help avoid biases due to the programming language(s) used

• Provide a more “fair” basis for comparing different environments

• Focuses on how much work the program accomplishes, not how concisely it is expressed

INFO631 Week 3 7 www.ischool.drexel.edu

Halstead Metrics

• Also known as Software Science, 1977

• Examine program as compilable “tokens”

• Tokens are either operators (+, -) or operands

(variables)

• Derive metrics such as Vocabulary, Length, Volume,

Difficulty, etc.

• Not widely used

INFO631 Week 3 8 www.ischool.drexel.edu

Data Structure (Halstead)

• Halstead’s 

2

- number of distinct operands in a module

– Operands include: number of variables, number unique constants, and number of labels

• Operand usage (OU)

– OU = 

2

/N

2 where N

2 operand references is the total number of

INFO631 Week 3 9 www.ischool.drexel.edu

Software Complexity

• Is a characteristic that influences the resources needed to build and maintain it

• Many different characteristics of software relate to complexity

• These complexity characteristics revolve around the structure of the software

INFO631 Week 3 10 www.ischool.drexel.edu

Types of Structural Measures

• Control flow

– Addresses sequence in which instructions are executed

– Iteration and looping

• Data flow

– Follows trail of data as it is created and handled

– Depicts behavior of data as it interacts with the program

INFO631 Week 3 11 www.ischool.drexel.edu

Types of Structural Measures

• Data structure

– Concerned with organization of data itself

– Provides information about difficulties in handling data and in defining test cases

INFO631 Week 3 12 www.ischool.drexel.edu

Control Flow

• Modeled by directed graphs (control flow graphs)

– Each node corresponds to a single program statement

– Arcs (directed edges) indicate flow of control from one statement to another

INFO631 Week 3 13 www.ischool.drexel.edu

Control Flow

• Control flow graphs are useful for:

– Analysis (estimating number of defects)

– Expressing complexity by a single value

– Assessing testability and test coverage

INFO631 Week 3 14 www.ischool.drexel.edu

Basic Control Constructs

If A then X

A t

X f

While A do X

A f t

X

X

If A then X else Y t

A f

Y

Note: t=true f=false

Repeat X until A

X f t

A

INFO631 Week 3

X

1 a

1

Case A of a

1

.

: X

1 a n

.

: X n

A a

2

...

X

2 a n

X n

15 www.ischool.drexel.edu

Cyclomatic Complexity

• McCabe, 1976

• Based on a program’s control flow chart

• Related to number of separate graphable areas, or number of linearly independent paths in the program

• Complexity MC = edges - nodes +

2*(# of unconnected paths)

INFO631 Week 3 16 www.ischool.drexel.edu

Cyclomatic Complexity

• Complexity under 10 generally desired

• Can also find M as number of binary decisions (yes/no) minus one

– Multiple choice decisions with ‘n’ choices count as (n-1) binary decisions

• Ignores differences among specific types of control structures

INFO631 Week 3 17 www.ischool.drexel.edu

Cyclomatic Complexity

• Uses of complexity metric:

– Identify complex modules needing detailed inspection or redesign

– Identify simple modules needing minimal inspection and/or testing

– Estimate programming, testing and maintenance effort

– Identify potentially troublesome code

INFO631 Week 3 18 www.ischool.drexel.edu

Control Flow Representation of Programs

• Software programs can be represented by linear directed segments combined with the basic control flow constructs

• Control flow constructs may be nested, e.g. an IF statement can be inside of a

WHILE loop

INFO631 Week 3 19 www.ischool.drexel.edu

Control Flow Representation of Programs

• Example:

8

1 2

3

4

5

6

13

7

10

12 11

14

9

McCabe cyclomatic complexity (MC) - counts the number of linearly independent paths through a program

MC = # of edges - # of nodes +2

Linearly independent paths for example

<2, 11>

<2, 10, 12, 14>

<2, 10, 12, 13, 12, 14>

<1, 3, 5, 6, 9>

<1, 4, 6,9>

<1, 4, 6, 7, 8, 9>

INFO631 Week 3 20 www.ischool.drexel.edu

3 b c

4

9

Control Flow--Linearly

Independent Paths

1

5 e g a

8

6

MC = edges - nodes + 2

= 10 - 7 + 2 = 5

2

10 d f

7

Set of linearly independent paths: b1: abcg b2: abcbcg b3: abefg b4: adefg b5: adfg

Any arbitrary path is equal to a linear combination of the linearly independent paths listed above

For example, path abcbefg is equal to: b2 + b3 - b1

INFO631 Week 3 21 www.ischool.drexel.edu

Knots - Control Flow Crossovers

• Knot measure -- total number of points at which control flow lines cross

IF (TIME) 30,30,10

10 CALL TEMP1

IF (X1) 20,20,40

20 Y1=Y+1

Y2=0

CALL TEMP2

GO TO 50

30 Z1=1

40 CALL TEMP3

Z2=Z2+1

50 CALL TEMP4

How many are here?

INFO631 Week 3 22 www.ischool.drexel.edu

Syntactic Constructs

• Examine effect of using specific control structures on defect rate

• Is, by definition, language-specific

• Can result in statistically significant relationships

– e.g. Lo used to show that DO WHILE should be avoided in COBOL

INFO631 Week 3 23 www.ischool.drexel.edu

External Measures

INFO631 Week 3 24 www.ischool.drexel.edu

Computational Complexity

• Examines algorithmic efficiency and use of machine resources (memory, I/O, storage)

• Studies quantitative aspects of solutions to computational problems

• Examples may include sorting efficiency for a database, managing I/O constraints across a large scale network, etc.

INFO631 Week 3 25 www.ischool.drexel.edu

Psychological Complexity

• Concerned with characteristics of software that affect human performance

- Injection of defects (when and why does a programmer make errors?)

- Ease of building the software (effort required)

- Ease of maintenance (effort required)

INFO631 Week 3 26 www.ischool.drexel.edu

Data Structure (Database)

• Database size per program size

(DBSPPS)

– DBSPPS = DBS/PS

• Where DBS is database size in bytes or characters

• PS is program size in source instructions

– Used in COCOMO model as a cost driver

• Ordinal scale measure derived from DBSPPS

INFO631 Week 3 27 www.ischool.drexel.edu

Fan-in and Fan-out

• Focus is the interaction among code modules

– Fan-in = # of modules which call a given module

– Fan-out = # of modules which are called by a given module

• Or, more formally...

INFO631 Week 3 28 www.ischool.drexel.edu

Fan-in and Fan-out

• Fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which info is retrieved by the module

• Fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module

INFO631 Week 3 29 www.ischool.drexel.edu

Fan-in and Fan-out

• Do fan-in and fan-out affect software quality?

– Large fan-in modules may be interpolation or look-up routines no defect correlation

– Large fan-out often relates to high defect rate

- has a high defect correlation

• Is large fan-in and fan-out bad?

INFO631 Week 3 30 www.ischool.drexel.edu

Fan-in and Fan-out

• Information flow complexity

– Henry and Kafura: Size*(fan-in * fan-out) 2

– Shepperd: (fan-in * fan-out) 2

• Henry and Kafura measure helps predict the number of software maintenance problems

Henry, S. and D. Kafura, IEEE Transactions on Software Engineering, 1981. SE-7(5): p. 510-518

Shepperd, M. 1990. Software Engineering Journal 5, 1 (January), pp. 3-10.

INFO631 Week 3 31 www.ischool.drexel.edu

Structure Metrics

• Shepperd measure correlates with software development time

• Information flow metric (Henry & Selig)

HC = C * (fan-in * fan-out)^2

– where C is the cyclometric complexity

INFO631 Week 3 32 www.ischool.drexel.edu

Structure Metrics

• System complexity (Card & Glass)

– Based on structural complexity (average fanout squared) and data complexity (based on number of I/O variables and fan-out)

– Quantified effect of complexity on error rate

INFO631 Week 3 33 www.ischool.drexel.edu

Module Call Graph

• Module - a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier

– Or, a distinct, named group of LOC

• The module call graph shows which modules call each other, and what key information is passed among them

INFO631 Week 3 34 www.ischool.drexel.edu

Module Call Graph example

scores

Read_Scores eof

Main scores scores

Average average

Find_Ave average

Print_Ave

INFO631 Week 3 35 www.ischool.drexel.edu

Module Coupling Measures

• Average number of calls per module

(ANCPM)

• Fraction of modules that make calls

(FMC)

ANCPM =

Number of Interconnections

Number of Modules

FMC =

Number of Modules that call

Number of Modules

INFO631 Week 3 36 www.ischool.drexel.edu

Information Flow Measures

• Types of information flows

– Local direct flow

• Module invokes a 2nd module & passes info to it

• Invoked module returns result to the caller

– Local indirect flow

• Invoked module returns info that is subsequently passed to a second invoked module

– Global flow

• Info flows from one module to another via a global data structure

INFO631 Week 3 37 www.ischool.drexel.edu

IEEE-STD-982

• Number of Entries and Exits per Module,

‘m’

– Like fan-in and fan-out m = entries + exits

• Software Science measures

INFO631 Week 3 38 www.ischool.drexel.edu

IEEE-STD-982

• Graph-Theoretic Complexity

– Static Complexity

C = Edges - Nodes + 1

– Generalized Static Complexity

Based on summing resources needed for each module (e.g. storage, access time, etc.)

– Dynamic complexity

Complexity as it changes over time across a network

INFO631 Week 3 39 www.ischool.drexel.edu

IEEE-STD-982

• Cyclomatic complexity

• Minimal Unit Test Case Determination

– Determine number of independent paths through a module, to get minimum number of test cases for unit testing

• Data or information flow complexity

– Fan-in and fan-out of variables

INFO631 Week 3 40 www.ischool.drexel.edu

IEEE-STD-982

• Design Structure adds weighted (%) average of six parameters:

1. Whether designed top down (Y/N)

2. Module inter-dependence

3. Module dependence on prior processing

4. Database size (# of elements)

5. Database compartmentalization

6. Module single entrance and exit (Y/N)

– Weighting is chosen to meet project needs

INFO631 Week 3 41 www.ischool.drexel.edu

Other Measures

• Compiler measures

– Size (bytes of compiled code)

– Number of symbols and variables

– Cross-reference of all labels

– Statement count

INFO631 Week 3 42 www.ischool.drexel.edu

Other Measures

• Configuration Management Library

Measures

– Number of code modules

– Number of versions of each module

– History of change dates of each module

– Module size

– Number of related documents for each module

INFO631 Week 3 43 www.ischool.drexel.edu

Availability Metrics

• Most information systems are critical to day-to-day operations

– Witness Google or Blackberry being offline for mere minutes is news

• Availability depends on 1) how often the system goes down, and 2) how long it takes to restore it after a crash

INFO631 Week 3 44 www.ischool.drexel.edu

Availability Metrics

• Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive

• Often measure availability by the number of 9’s in the desired level of availability

– Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc.

– How many nines can you afford?

INFO631 Week 3 45 www.ischool.drexel.edu

Availability Metrics

No. of 9’s Availability

Down time per year

2 99% 87.6 hours

3

4

5

99.9%

99.99%

99.999%

8.8 hours

53 minutes

5.3 minutes

INFO631 Week 3 46 www.ischool.drexel.edu

Achieving High Availability

• Many techniques are used to help ensure that high levels of availability are possible

– Duplicate systems (clustering)

– RAID data duplication

– Duplicate power supplies

– Independent power supplies

– Uninterruptible power supplies (UPS’)

INFO631 Week 3 47 www.ischool.drexel.edu

Availability and Code Quality

• Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability

– Consistent methods for measurement and definitions of terms are needed for further refinement

INFO631 Week 3 48 www.ischool.drexel.edu

Customer Outage Data

• In order to determine availability, the actual customer-visible system outage time needs to be collected

– In order to get this data, the customer must place a very high priority on availability

– This data could be used to identify software components which most reduce availability

INFO631 Week 3 49 www.ischool.drexel.edu

Availability

• We also expect that availability for a new system should increase over the first couple years of its use

• Defect causal analysis can help reduce the root cause of defects, thereby improving availability

INFO631 Week 3 50 www.ischool.drexel.edu

Download