Institute of Information and Communication Technology (IICT)
Bangladesh University of Engineering and Technology (BUET)
Course: ICT-6010, Title: Software Engineering and Application Development
Class handouts, Lecture-6
Software metrics and complexity measures
6.0 SOFTWARE MEASUREMENT
Measurements in the physical world can be categorized in two ways: direct measures (e.g., the
length of a bolt) and indirect measures (e.g., the 'quality' of bolts produced, measured by counting
rejects). Software metrics can be categorized similarly.
Direct measures of the software engineering process include cost and effort applied. Direct
measures of the product include lines of code (LOC) produced, execution speed, memory size, and
defects reported over some set period of time. Indirect measures of the product include
functionality, quality, complexity, efficiency, reliability, maintainability, and many other 'attributes'.
The cost and effort required to build software, the number of lines of code produced, and other
direct measures are relatively easy to collect, as long as specific conventions for measurement are
established in advance. However, the quality and functionality of software or its efficiency or
maintainability are more difficult to assess and can be measured only indirectly.
The software metrics domain can be partitioned into process, project, and product metrics. It is also
noted that product metrics that are private to an individual are often combined to develop project
metrics that are public to a software team. Project metrics are then consolidated to create process
metrics that are public to the software organization as a whole. But how does an organization combine metrics that come from different individuals or projects?
To illustrate, we consider a simple example. Individuals on two different project teams record and
categorize all errors that they find during the software process. Individual measures are then
combined to develop team measures. Team A found 342 errors during the software process prior to
release. Team B found 184 errors. All other things being equal, which team is more effective in
uncovering errors throughout the process? Because we do not know the size or complexity of the
projects, we cannot answer this question. However, if the measures are normalized, it is possible to
create software metrics that enable comparison to broader organizational averages.
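To make the comparison concrete, the minimal sketch below (in Python) normalizes each team's error count by project size. The KLOC figures used here are illustrative assumptions; the handout gives only the raw error counts of 342 and 184.

# Illustrative only: normalize raw error counts by project size (KLOC) so
# that teams working on projects of different sizes can be compared. The
# KLOC values are assumed for illustration.
def errors_per_kloc(errors_found: int, kloc: float) -> float:
    """Error density: errors found per thousand lines of code."""
    return errors_found / kloc

team_a = errors_per_kloc(342, kloc=25.0)    # assumed 25 KLOC project
team_b = errors_per_kloc(184, kloc=10.0)    # assumed 10 KLOC project
print(f"Team A: {team_a:.1f} errors/KLOC")  # 13.7
print(f"Team B: {team_b:.1f} errors/KLOC")  # 18.4

On these assumed sizes Team B is actually the more effective error finder, which is exactly the kind of reversal that normalization is meant to expose.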
6.1 PRODUCT AND PROCESS METRICS
Internal and external characteristics may be of two different natures: product or process. A product measure is a "snapshot" at a particular point in time; it can be a snapshot of the early design or of today's code, from which we can then derive some code metrics. A single snapshot cannot, therefore, tell us anything about the movement from one snapshot to the next, which occurs over a period of time. Trying to understand the process by taking only one (or at best a small number) of instantaneous assessments is like trying to understand the life history of a frog by looking only at the state of the system each winter. Variables such as staffing levels or effort, both as functions of time, may be used directly or may be included in prognostic models to forecast other characteristics. A process involves a time rate of change and requires a process model to fully understand what is happening.
Typically, as with quality assurance (QA) in manufacturing, instantaneous inspection can give only
a limited insight. Not until process-based management techniques, such as total quality management
(TQM), replace QA can a system be said to be fully controlled. Manufacturing insights have some
applicability to software development, although the major difference between a highly repeatable and repeated mass-manufacturing industry churning out identical widgets over many years and a software industry developing software that is never identical (even though there may be strong similarities between, say, each accounting system) leads some authors to question the extent to which manufacturing statistical control (which underlies TQM, for instance) can be of use to software development.
Other process metrics include recording fault metrics (totals and fault densities) and change metrics
(to record the number and nature of design and code changes). It is often argued that the complexity
of a piece of code may be used to give a forecast of likely maintenance costs (e.g., Harrison et al.,
1982), although such a connection is conceptually tenuous. Often extreme values of several metrics
can more reasonably be used (Kitchenham and Linkman, 1990) to draw the developer's attention to
a potential problem area.
The software project manager has a variety of uses for software metrics, both for product and
process (e.g., Weller, 1994). Responsibilities of a project manager range from applying (largely product) metrics to monitoring, controlling, innovating, planning, and predicting (Ince, 1990). Metrics are aimed at providing some type of quantitative information to aid in the decision-making process (e.g., Stark and Durst, 1994). They are thus "circumstantial evidence" that could be used in a decision support system (DSS), for example; the nature of software development precludes the discovery of "global verities." We should also note that the decision-making process is, ideally, undertaken early in the project time frame. This creates a tension, since most metrics are most easily collected toward the end of the project, and it leads to attempts either to use measures of the product (detailed design or code) as surrogates for process estimates, or even to use estimates of these product measures in early life-cycle time and cost estimates.
To estimate effort (process metric), a connection has traditionally been made to the overall "size"
(product metric) of the system being developed, by means of an organization-specific value of team
productivity. The use of size from the many product metrics available as input to the process model
equation should not suggest that size is the "best" product measure but, rather, that it is one of the
most obvious and easiest to measure, although unfortunately only available toward the end of the
life cycle. Consequently, this has fed back into a focus within product metrics on estimating or measuring size, viewing size as in some way influenced by program modularity and structure (see the figure on the next page).
However, a more thorough analysis of the product metric domain suggests that "complexity," and not size, may be more relevant to modern software systems. Here complexity can be loosely defined as that characteristic of software that requires effort (resources) to design, understand, or code. Such a loose definition is probably the best that can be offered for so generic a term; Fenton stresses the uselessness of the term "complexity," which has no agreed empirical or intuitive meaning and can therefore never be used as the basis for a measure.
6.1.1 PROCESS METRICS AND SOFTWARE PROCESS IMPROVEMENT
The only rational way to improve any process is to measure specific attributes of the process,
develop a set of meaningful metrics based on these attributes and then use the metrics to provide
indicators that will lead to a strategy for improvement. But before we discuss software metrics and
their impact on software process improvement, it is important to note that process is only one of a number of "controllable factors in improving software quality and organizational performance" [PAU94].
Referring to the figure on the next page, process sits at the center of a triangle connecting three factors that have a profound influence on software quality and organizational performance. The skill and motivation of people has been shown [BOE81] to be the single most influential factor in quality and
performance. The complexity of the product can have a substantial impact on quality and team
performance. The technology (i.e., the software engineering methods) that populates the process
also has an impact. In addition, the process triangle exists within a circle of environmental
conditions that include the development environment (e.g., CASE tools), business conditions (e.g.,
deadlines, business rules), and customer characteristics (e.g., ease of communication).
We measure the efficacy of a software process indirectly. That is, we derive a set of metrics based on the outcomes that can be derived from the process. Outcomes include measures of errors uncovered before release of the software, defects delivered to and reported by end-users, work products delivered (productivity), human effort expended, calendar time expended, schedule conformance, and other measures. We also derive process metrics by measuring the characteristics of specific software engineering tasks. For example, we might measure the effort and time spent performing the umbrella activities and the generic software engineering activities.
Grady [GRA92] argues that there are "private and public" uses for different types of process data. Because it is natural that individual software engineers might be sensitive to the use of metrics collected on an individual basis, these data should be private to the individual and serve as an indicator for the individual only. Examples of private metrics include defect rates (by individual), defect rates (by module), and errors found during development.
Some process metrics are private to the software project team but public to all team members.
Examples include defects reported for major software functions (that have been developed by a
number of practitioners), errors found during formal technical reviews, and lines of code or function
points per module and function. These data are reviewed by the team to uncover indicators that can
improve team performance.
Public metrics generally assimilate information that originally was private to individuals and teams.
Project-level defect rates (absolutely not attributed to an individual), effort, calendar times, and
related data are collected and evaluated in an attempt to uncover indicators that can improve
organizational process performance.
Software process metrics can provide significant benefit as an organization works to improve its
overall level of process maturity. However, like any metrics, these can be misused, creating more problems than they solve. Grady [GRA92] suggests a "software metrics etiquette" that is
appropriate for both managers and practitioners as they institute a process metrics program:
 Use common sense and organizational sensitivity when interpreting metrics data.
 Provide regular feedback to the individuals and teams who collect measures and metrics.
 Don't use metrics to appraise individuals.
 Work with practitioners and teams to set clear goals and metrics that will be used to achieve
them.
 Never use metrics to threaten individuals or teams.
 Metrics data that indicate a problem area should not be considered "negative." These data
are merely an indicator for process improvement.
 Don't obsess on a single metric to the exclusion of other important metrics. As an organization becomes more comfortable with the collection and use of process metrics, simple indicators can give way to a more rigorous approach to software process improvement.
6.2 COMPLEXITY
The Macquarie Dictionary defines the common meaning of complexity as "the state or quality of being complex", that is, "composed of interconnected parts" or "characterized by an involved combination of parts ...". This definition is intuitively appealing. It recognizes that complexity is a feature of an object being studied, but that the level of perceived complexity is dependent upon the person studying it. For example, a person unfamiliar with an object may find it far more difficult to understand than someone who has studied the object many times before.
However, the definition has two deficiencies as a technical definition upon which to base research.
The first is that it ignores the dependency of complexity on the task being performed. As Ross
Ashby (quoted in Klir, 1985) noted,
...although all would agree that the brain is complex and a bicycle simple, one has also to remember that to a butcher the brain of a sheep is simple while a bicycle, if studied exhaustively, may present a very great quantity of significant detail.
The second problem with the definition is that it does not allow complexity to be operationalized as
a measurable variable. The factors described as determining complexity are very general and
therefore lack the specificity required for measurement.
It was probably with these problems in mind that Basili (1980) formulated his definition of
"software complexity" as
a measure of resources expended by a system [human or other] while interacting with a piece of software to
perform a given task.
Although this definition addresses the two deficiencies in the dictionary definition, it may be too
broad. As the dictionary definition indicates, complexity is usually regarded as arising out of some characteristic of the code on which the task or function is being performed, and it therefore does not include characteristics of the programmer that affect resource use independently of the code. A
suitable compromise may be the definition proposed by Curtis (1979) and used throughout this
book as an operational definition. He claimed that software complexity is
a characteristic of the software interface which influences the resources another system will expend while
interacting with the software.
This definition is broad enough to include characteristics of the software that interact with
characteristics of the system carrying out the task, in such a way as to increase the level of resource
use. However, it is not so broad as to include arbitrary characteristics of the programmer that affect
his or her ability generally.
Software complexity research began in the 1970s around the same time as the push for structured
programming was beginning. Both were the result of an increasingly large number of poor experiences among practitioners with large systems, and as a consequence the need for quality control for software became very strong. This quality control took two forms. First, it involved the development of new standards for programming, and, second, it required the development of measures to monitor the "complexity" of code being produced, so as to provide benchmarks and to permit poor-quality sections to be modified or rewritten. The best known of the early approaches to complexity measurement were those focusing on structural complexity, such as McCabe (1976) and Halstead (1977). Despite criticisms of these metrics (e.g., Coulter, 1983; Shen et al., 1983; Evangelist, 1984; Nejmeh, 1988; Shepperd, 1988), they are probably still the best known and most widely used, sometimes, as Shepperd (1988) notes, beyond their limits of validity.
At present there is no unified approach to software complexity research. Curtis (1979), examining the field, lamented that "Rather than unite behind a common banner, we have attacked in all directions." He made a number of observations:
 There was an overabundance of metrics.
 Metricians defined complexity only after developing their metric.
 More time was spent developing predictors of complexity, when in fact more needed to be spent on developing criteria, such as maintainability and understandability, and on operationalizing those criteria.
 For every positive validation there was a negative one.
Zuse (1990) sums up the situation as one in which "software complexity measurement is confusing and not satisfying to the user."
More recently a number of authors have been reinforcing the notion that complexity is a higher-level concept, involving people. Zuse (1990) states that "the term software complexity is a misnomer. The true meaning of the term software complexity is the difficulty to maintain, change and understand software. It deals with the psychological complexity of programs." Sullivan (1975) similarly defines it: "Complexity denotes the degree of mental effort required for comprehension."
6.2.1 CLASSIFICATION OF COMPLEXITY
When we consider complexity as an external attribute or characteristic of software, it has three 'flavors': computational, psychological, and representational. The most important of these is psychological complexity, which encompasses programmer characteristics, structural complexity, and problem complexity. Programmer characteristics are hard to measure objectively, and little work has been done to date on measures of problem complexity.
Problem complexity, also called functional complexity, reflects the difficulties implicit in the problem space and is thus sometimes referred to as 'difficulty'; it relates strongly to design and coding. The only available measures of it are ordinal and subjective, and it is often argued that problem complexity cannot be controlled. Design and code complexity measures therefore do not provide an absolute rating; rather, they should be evaluated relative to the problem complexity.
For example, a problem with a complexity of 1 unit which gives rise to a design of complexity 100 units leads us to believe that this is a bad design. However, if a design of 100 units' complexity models a problem of complexity 100 units, then we would perceive this as a good design. In addition, there are likely to be confounding factors derivable from the influence of programmer characteristics. The appropriate measures are therefore ratios: the design complexity relative to the problem complexity, and the code complexity relative to the design complexity. The aim would be to minimize these ratios, both of which have a lower bound of unity. While this is an important goal, operationalizing these measures awaits an objective measure for problem complexity and agreed definitions of composite measures of complexity for the total system at each stage of the software development life cycle.
Despite these arguments, the software engineering literature is focused primarily on developing and validating structural complexity metrics (i.e., measures of internal characteristics).
Recently, many have realized that the influence of the human programmer and analyst is of crucial
importance (Curtis, 1979; Davis, 1984; Kearney et al., 1986; Cant et al., 1992). A combination of
this, the static (product) complexity, and its relation to the problem complexity contributes to the overall psychological complexity.
Mathematically, we thus identify three components of complexity:
 human cognitive factors, H;
 problem complexity, P; and
 objective measures, which include size, S; data structure, D; logical structure, L; internal cohesion, Ch; semantic cohesion, Cs; and coupling, C.
Complexity, X, is some (complicated) function of these objective measures, taking human cognitive factors into consideration, namely,
X = X(H, P, S, D, L, Ch, Cs, C)
Furthermore, the functional form of the above equation is likely to be different at different life-cycle stages. Thus, we should write
Xi = Xi(H, P, S, D, L, Ch, Cs, C),   i = 1 to j
where there are j life-cycle phases identified in the chosen model.
6.2.2 COGNITIVE COMPLEXITY
Cant et al. (1992, 1995) have proposed a theoretically based approach to complexity metrics by
analyzing cognitive models for programmer comprehension of code and the coding process
(including debugging and maintenance). This cognitive complexity model (CCM) can be described
qualitatively in terms of a "landscape" model and is encapsulated quantitatively by a set of
equations. The underlying rationale for the CCM is the recognition that programmers, in problem
solving, use the two techniques of chunking and tracing concurrently and synergistically (e.g.,
Miller, 1956). In chunking, the software engineer mentally divides the code into logically related chunks, e.g., an iterative loop or decision block; in tracing, other related chunks are located.
Having found that code, programmers will once again chunk in order to comprehend it. Conversely,
when programmers are primarily tracing, they will need to chunk in order to understand the effect
of the identified code fragments. For example, they may need to analyze an assignment statement in
order to determine its effect, or they may need to analyze a control structure to understand how it
interacts with the assignment statements contained in it, to create certain effects.
If one includes the qualification that complexity is dependent on the task being performed, then the
definition covers all the suggested ingredients of complexity. Therefore, the following definition
will be adopted.
The cognitive complexity of software refers to those characteristics of software that affect the level of
resources used by a person performing a given task on it.
We examine the complexity of software only in respect of tasks performed by programmers that
involve analyzing code. The tasks for which this is appropriate are manifold, including maintaining,
extending, and understanding code.
Characteristics relevant to cognitive complexity are shown in the figure below:
6.2.3 COGNITIVE ARCHITECTURE OF THE MIND
To understand the cognitive complexity of software fully, the comprehension process and the limits
on the capacity of the human mind must first be understood. The mind is usually modeled by
dividing it into activated (short-term) and long-term memory. The characteristics of short-term
memory are that distraction causes forgetting of recently learned material after about 20-30 seconds (Tracz, 1979; Sutcliffe, 1989). Other simultaneous or similar inputs impair recall, while
recall is improved by presenting both a word and a picture together (Sutcliffe, 1989). The limits on
the language-based subcomponent of short- term memory are a relatively fast access time, a rapid
loss of information unless refreshed (Sutcliffe, 1989), and a capacity of 7±2 chunks (Miller, 1956),
although this capacity is reduced while performing difficult tasks (Kintsch, 1977). The rate of
memory decay can be slowed down by mental rehearsal of the information, as we all know
personally. Furthermore, the effective capacity of short-term memory is expanded by chunking,
which involves abstracting recognized qualities from the information and storing the abstraction (in
terms of a single symbol) instead of the complete information. Long-term memory has "virtually
unlimited" capacity, and memory loss appears to be largely a problem of retrieval (Tracz, 1979, p.
26). However, the factors that determine recall ability appear to be similar to those for short-term
memory. Thus, pictures presented together with associated text tend to be remembered better than
text alone. Furthermore, memory recall is retarded by extraneous "noise" but is enhanced by
structure, particularly when the structure or its components already exist in long-term memory
(Tracz, 1979; Sutcliffe 1989, pp. 28-33). The structure of knowledge in long-term memory and the
process of chunking involved in comprehension (a subtask of debugging and extending) are
intimately related.
6.2.4 THE COGNITIVE PROCESSES IN COMPREHENSION
Debate continues about the exact mental processes involved in debugging, extending, or simply
understanding a section of code. The generally held view is that programmers chunk (Miller, 1956;
Shneiderman and Mayer, 1979; Badre, 1982; Davis, 1984; Ehrlich and Soloway, 1984), which
involves recognizing sections of code and recording the presence of that section in short-term
memory by means of a single reference or memory symbol. Chunking involves recognizing groups
of statements (not necessarily sequential) and extracting from them information that is remembered
as a single mental abstraction. These chunks are then further chunked together into larger chunks
and so on, forming a multileveled, aggregated structure over several layers of abstraction
(somewhat akin to a layered object model). However, there is also evidence that programmers
perform a type of tracing. This involves scanning quickly through a program, either forward or
backward, in order to identify relevant chunks. (For example, Weiser, 1981, and, later, Bieman and
Ott, 1994, examined "slicing," which includes tracing backward to ascertain the determinants of a
set of variables at a point in a program.) Tracing may be required to allow knowledge to be acquired
about certain variables or functions that are scattered throughout a program. These two processes, of
chunking and tracing, both have implications for software complexity. Furthermore, both processes
are involved in the three programmer tasks of understanding, modifying, and debugging.
Shneiderman and Mayer (1979) suggest that the chunking process involves utilizing two types of
knowledge: semantic and syntactic. Semantic knowledge includes generic programming concepts
from structures as basic as loops to more complex concepts such as sorting algorithms. This
semantic knowledge is stored in a way which is relatively independent of specific programming
languages. Furthermore, the information is systematically organized in a multileveled structure so
that related concepts are aggregated into a single concept at a higher level of abstraction. Syntactic
knowledge is specific to particular languages and allows semantic structures, implemented in those
languages, to be recognized. By chunking, syntactic knowledge is used to convert the code into
semantic knowledge, which is then stored as a multileveled representation of the program in
memory. In a critique of this model, Gilmore and Green (1984) suggest that recognition of semantic
constructs may, to some extent, be dependent upon syntactic aspects of its presentation.
Ehrlich and Soloway (1984) propose a more comprehensive model of chunking based on "control flow plans" and "variable plans," a model adopted here. Control plans describe control
flow structures, while variable plans rely on a range of features common to variables with particular
roles. Examples of such roles include counter variables and variables in which user input is stored.
6.2.5 STRUCTURAL COMPLEXITY
Structural complexity is one of three major components of psychological complexity (the others
being programmer characteristics and problem complexity). Structural complexity is that portion of
overall (psychological) complexity that can be represented and thus assessed objectively. It thus
relates to the structure of the document, whether that document is the design document or the code
itself.
Structural complexity is thus measured by structural complexity metrics that offer assessment of a
variety of internal attributes (also called internal characteristics). Structural metrics can be divided
into intramodule (or just module) metrics and intermodule metrics (Kitchenham and Walker, 1986;
Kitchenham and Linkman, 1990).
Modularity metrics are focused at the individual module level (subprogram, class) and may be measures of:
 The internals of the module (procedural complexity), such as size, data structure, or logic structure (control flow complexity), or
 The external specification of the module (semantic complexity), typically module cohesion as viewed externally to the module. Semantic complexity metrics evaluate whether an
individual class really is an abstract data type in the sense of being complete and also
coherent; that is, to be semantically cohesive, a class should contain everything that one
would expect a class with those responsibilities to possess but no more.
Intermodule metrics measure the structure of the interconnections between the modules that comprise the system and thus characterize system-level complexity. In an object-oriented (OO) system this gives some assessment of how loosely or how strongly individual classes are coupled together in the
final system. Since such coupling is "designed into" the system, coupling metrics can be collected at
the "design" stage.
Procedural complexity metrics are perhaps best understood, although often assumed (inaccurately
as we saw earlier) to give immediate information about external characteristics-for instance, Kolewe
(1993) suggests that the inevitable consequence of unnecessary complexity is defective software.
Maintaining a clear focus on what such procedural complexity metrics really measure (basically
procedural complexity: an internal characteristic) is vital. That is not to say that they cannot be used to forecast other characteristics (internal or external), but forecasting is not the same scientific
technique as measuring, and we must retain that difference clearly in mind as we use OO (and
other) metrics.
6.2.6 TRADITIONAL PRODUCT METRICS FOR STRUCTURAL COMPLEXITY
We stress that complexity, including structural complexity, is a concept that can be regarded as an
external characteristic. This is thus part of the empirical relation system for which we then require a
numerical relation system to form the substantial basis for a 'metric' or measure. Here, we shall discuss the different aspects of structural complexity. The terms complexity (a concept) and
complexity metric (the measure) are not interchangeable, although we are sorry to say that this
equivalence is used in much of the software metrics literature.
We discuss, at a low level, the various possible measures without regard to whether they are
prognostic of any external characteristics. Card and Glass (1990) argue that the overall system complexity is a combination of intramodule complexity and intermodule complexity. They suggest that the system complexity, Cs, is given by

Cs = CI + CM    .................... (i)

where CI is the intermodule complexity and CM the intramodule complexity. Traditionally,
intermodule complexity is equated to coupling in the form of subroutine CALLs or message passing
and intramodule complexity to data complexity, which represents the number of I/O variables
entering or leaving a module. Card and Glass (1990) suggest that the complexity components
represented in equation (i) approximate Stevens et al.'s (1974) original concepts of "module
complexity and cohesion.”
Particularly for object-oriented systems, as can be seen in the figure below, there is a third element, included as part of CM by Card and Glass (1990): semantic cohesion, which is another form of module complexity.
If we specifically exclude cohesion from the measure of intramodule complexity, CM, then we can modify equation (i) explicitly to take semantic cohesion, Ch, into account. Noting that increasing cohesion leads to a lower system complexity, we propose that

Cs = CI + CM + (Chmax - Ch)    .................... (ii)

although this depends strongly on the scale used for Ch. (Chmax is the maximum cohesion possible for the module.) Thus we discriminate between:
 The objective (countable) measures such as number of I/O variables and number of control
flow connections and
 The semantic measures needed to assess the cohesiveness of a module via the interface it
presents to the rest of the system.
Card and Glass's (1990) calculation of the metric for intramodule complexity (their data
complexity) suggests that a module with excessive I/O is one with low cohesion. While this is broadly acceptable, it does not translate well into the object paradigm, where cohesion is high yet interconnections, via messages and services, can also be high. Indeed, it could be argued that in object-oriented systems there is minimal data connectivity, such that this notion of data complexity at the intermodule level is virtually redundant.
Finally, Card and Glass (1990) propose that a better comparative measure is the relative (per-module) system complexity (Cs / n, where n is the number of modules in the system). This normalization allows us to compare systems of contrasting sizes. Since it gives a mean module value, it offers potential applicability to object-oriented systems as an average class metric.
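As a concrete illustration of equations (i) and (ii) and of the relative system complexity, the following sketch computes them from simple numeric scores. The scores and the 0-to-1 cohesion scale are illustrative assumptions and are not prescribed by Card and Glass.

# Sketch of the system complexity measures discussed above. The numeric
# scores and the 0..1 cohesion scale are illustrative assumptions.
def system_complexity(ci: float, cm: float) -> float:
    """Equation (i): Cs = CI + CM (intermodule + intramodule complexity)."""
    return ci + cm

def system_complexity_with_cohesion(ci: float, cm: float,
                                    ch: float, ch_max: float = 1.0) -> float:
    """Equation (ii): Cs = CI + CM + (Chmax - Ch); higher cohesion lowers Cs."""
    return ci + cm + (ch_max - ch)

def relative_system_complexity(cs: float, n_modules: int) -> float:
    """Per-module (relative) system complexity, Cs / n."""
    return cs / n_modules

cs = system_complexity(ci=40.0, cm=25.0)                      # 65.0
cs_adj = system_complexity_with_cohesion(40.0, 25.0, ch=0.8)  # 65.2
print(relative_system_complexity(cs, n_modules=13))           # 5.0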
6.2.7 INTERMODULE METRICS (FOR SYSTEM DESIGN COMPLEXITY)
An easy measure of intermodule coupling is the fan-in/fan-out metric (Henry and Kafura, 1981).
Fan-in (strictly "informational fan-in") refers to the number of locations from which control is
passed into the module (e.g., CALLs to the module being studied) plus the number of (global) data,
and conversely, fan-out measures the number of other modules required plus the number of data
structures that are updated by the module being studied. Since these couplings are determined in the
analysis and design phases, such intermodule metrics are generally applicable across the life cycle.
While there are many measures incorporating both fan-in and fan-out, Card and Glass (1990) present evidence that only fan-out, and its distribution between modules, contributes significantly to system design complexity. Thus, they propose a design complexity measure based on fan-out alone.
In contrast, Henry and Kafura (1981) propose a family of information flow metrics:
1. fan-in x fan-out
2. (fan-in x fan-out)^2
3. Ss x (fan-in x fan-out)^2
where Ss is the module size in lines of code. These metrics of Henry and Kafura's have been widely utilized and were thoroughly tested by their developers with data from the UNIX operating system.
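The sketch below shows how the three information-flow variants listed above could be computed per module, together with a simple fan-out-only indicator in the spirit of Card and Glass's observation. The module data (fan-in, fan-out, size) are invented for illustration.

# Sketch of the Henry and Kafura information-flow metrics listed above.
# The module data are invented for illustration.
modules = {
    # name: (fan_in, fan_out, size_in_loc)
    "parse_input": (3, 2, 120),
    "update_db":   (5, 4, 300),
    "report":      (2, 6, 180),
}

for name, (fan_in, fan_out, size) in modules.items():
    flow = fan_in * fan_out            # metric 1: fan-in x fan-out
    flow_sq = flow ** 2                # metric 2: (fan-in x fan-out)^2
    weighted = size * flow_sq          # metric 3: Ss x (fan-in x fan-out)^2
    print(f"{name:12s} flow={flow:3d}  flow^2={flow_sq:5d}  Ss*flow^2={weighted}")

# A fan-out-only indicator (mean squared fan-out across modules), reflecting
# the observation that fan-out dominates design complexity:
mean_sq_fan_out = sum(f_out ** 2 for _, f_out, _ in modules.values()) / len(modules)
print(f"mean squared fan-out: {mean_sq_fan_out:.1f}")   # 18.7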
Page-Jones (1992) recommends "eliminating any unnecessary connascence and then minimizing connascence across encapsulation boundaries by maximizing connascence within encapsulation boundaries." This is good advice for minimizing module complexity since
 it acknowledges that some complexity is necessary;
 it focuses on self-contained, semantically cohesive modules; and
 it reflects the need for low fan-out.
6.3 MODULE METRICS (FOR PROCEDURAL COMPLEXITY)
Of the five types of module metrics shown in the previous figure, size and logic structure account for the majority of the literature on metrics: perhaps because of their perceived relationship to cost estimation and to the difficulties of maintenance, respectively, and perhaps because of their ease of collection. However, it should be reiterated that these metrics can, in general, be obtained only from code and are inappropriate for the analysis and design (i.e., specification) phases.
Size metrics, notably lines of code (LOC) or source lines of code (SLOC) and function points (FPs), together with their variants, will be discussed here.
6.3.1 SIZE METRICS: LINES OF CODE
The use of source lines of code as a measure of software size is one of the oldest methods of
measuring software. It was first used to measure size of programs written in fixed-format assembly
languages. The use of SLOC as a metric began to be problematical when free-format assembly
languages were introduced (Jones, 1986), since more than one logical statement could be written per line. The problems became greater with the advent of high-level languages that describe key
abstractions in the problem domain (Booch, 1991) removing some accidental complexity (Brooks,
1975). Furthermore, lines of code can be interpreted differently for different programming
languages. Jones (1986) lists 11 different counting methods, 6 at the program level and 5 at the
project level. With the introduction of fourth-generation languages, still more difficulties were
experienced in counting source lines (Verner and Tate, 1988; Verner, 1991).
Size-oriented software metrics are derived by normalizing quality and/or productivity measures by the size of the software that has been produced. If a software organization maintains simple records, a table of size-oriented measures, such as the one shown in the figure on the next page, can be created. The table lists each software development project completed over the past few years and the corresponding measures for that project. Referring to the table entry for project alpha: 12,100 lines of code were developed with 24 person-months of effort at a cost of $168,000. It should be noted that the effort and cost recorded in the table represent all software engineering activities (analysis, design, code, and test), not just coding. Further information for project alpha indicates that 365 pages of documentation were developed, 134 errors were recorded
before the software was released, and 29 defects were encountered after release to the customer
within the first year of operation. Three people worked on the development of software for project
alpha.
In order to develop metrics that can be assimilated with similar metrics from other projects, we choose lines of code as our normalization value. From the rudimentary data contained in the table, a set of simple size-oriented metrics can be developed for each project:
 Errors per KLOC (thousand lines of code).
 Defects per KLOC.
 $ per LOC.
 Pages of documentation per KLOC.
In addition, other interesting metrics can be computed:
 Errors per person-month.
 LOC per person-month.
 $ per page of documentation.
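The sketch below derives these size-oriented metrics from the project alpha figures quoted above (12,100 LOC, 24 person-months, $168,000, 365 pages of documentation, 134 errors before release, 29 defects after release).

# Size-oriented metrics for project alpha, using the figures given above.
loc = 12_100
effort_pm = 24       # person-months
cost = 168_000       # dollars
pages = 365          # pages of documentation
errors = 134         # errors recorded before release
defects = 29         # defects reported in the first year after release

kloc = loc / 1000
print(f"errors per KLOC:         {errors / kloc:.1f}")       # ~11.1
print(f"defects per KLOC:        {defects / kloc:.1f}")      # ~2.4
print(f"$ per LOC:               {cost / loc:.2f}")          # ~13.88
print(f"pages per KLOC:          {pages / kloc:.1f}")        # ~30.2
print(f"errors per person-month: {errors / effort_pm:.1f}")  # ~5.6
print(f"LOC per person-month:    {loc / effort_pm:.0f}")     # ~504
print(f"$ per page:              {cost / pages:.0f}")        # ~460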
Size-oriented metrics are not universally accepted as the best way to measure the software development process. Most of the controversy swirls around the use of lines of code as a key measure. Proponents of the LOC measure claim that LOC is an "artifact" of all software development projects that can be easily counted, that many existing software estimation models use LOC or KLOC as a key input, and that a large body of literature and data predicated on LOC already exists. On the other hand, opponents argue that LOC measures are programming language
dependent, that they penalize well-designed but shorter programs, that they cannot easily
accommodate nonprocedural languages, and that their use in estimation requires a level of detail
that may be difficult to achieve (i.e., the planner must estimate the LOC to be produced long before
analysis and design have been completed).
6.3.2 FUNCTION-ORIENTED METRICS: FUNCTION POINT
Function-oriented software metrics use a measure of the functionality delivered by the application
as a normalization value. Since 'functionality' cannot be measured directly, it must be derived
indirectly using other direct measures. Function-oriented metrics were first proposed by Albrecht [ALB79], who suggested a measure called the function point. Function points are derived using an
empirical relationship based on countable (direct) measures of software's information domain and
assessments of software complexity.
Function point counting (Albrecht, 1979; Albrecht and Gaffney, 1983) is a technology-independent
method of estimating system size without the use of lines of code. Function points have been found
useful in estimating system size early in the development life cycle and are used for measuring
productivity trends (Jones, 1986) from a description of users' specifications and requirements. They
are probably the sole representative of size metrics not restricted to code. However, Keyes (1992),
quoting Rubin, argues that they are unlikely to be useful to modern-day graphical user interface
(GUI) developments since they cannot include concerns of color and motion.
Albrecht argued that function points could be easily understood and evaluated even by nontechnical users. The concept has been extended more recently by Symons (1988) as the Mark II FP
counting approach, a metric that the British government uses as the standard software productivity
metric (Dreger, 1989).
In the FP approach, system size is based on the total amount of information that is processed, together with a complexity factor that influences the size of the final product. It is based on the following weighted items:
 Number of external inputs
 Number of external outputs
 Number of external inquiries
 Number of internal master files
 Number of external interfaces
Function points are computed [IFP94] by completing the table shown in the figure below. Five information domain characteristics are determined and counts are provided in the appropriate table location.
Once these data have been collected, a complexity value is associated with each count.
Organizations that use function point methods develop criteria for determining whether a particular
entry is simple, average, or complex. Nonetheless, the determination of complexity is somewhat
subjective.
To compute function points (FP), the following relationship is used:

FP = count total x [0.65 + 0.01 x Σ(Fj)]    .................... (ii)

where count total is the sum of all FP entries obtained from the figure above, and the Fj (j = 1 to 14) are complexity adjustment values based on responses to the following questions [ART85]:
1. Does the system require reliable backup and recovery?
2. Are data communications required?
3. Are there distributed processing functions?
4. Is performance critical?
5. Will the system run in an existing, heavily utilized operational environment?
6. Does the system require on-line data entry?
7. Does the on-line data entry require the input transaction to be built over multiple screens or operations?
8. Are the master files updated on-line?
9. Are the inputs, outputs, files, or inquiries complex?
10. Is the internal processing complex?
11. Is the code designed to be reusable?
12. Are conversion and installation included in the design?
13. Is the system designed for multiple installations in different organizations?
14. Is the application designed to facilitate change and ease of use by the user?
Each of these questions is answered using a scale that ranges from 0 (not important or applicable) to
5 (absolutely essential). The constant values in Equation (ii) and the weighting factors that are
applied to information domain counts are determined empirically.
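A minimal sketch of the calculation follows. The weighting factors for simple, average, and complex entries are the values commonly associated with this method; they are assumptions here, since the handout's weighting figure is not reproduced.

# Sketch of the function point calculation described above. The
# simple/average/complex weights are commonly used values and are
# assumptions here.
WEIGHTS = {
    #                       simple  average  complex
    "external inputs":      (3, 4, 6),
    "external outputs":     (4, 5, 7),
    "external inquiries":   (3, 4, 6),
    "internal files":       (7, 10, 15),
    "external interfaces":  (5, 7, 10),
}
LEVEL = {"simple": 0, "average": 1, "complex": 2}

def count_total(counts):
    """counts: iterable of (domain, number, level), e.g. ("external inputs", 3, "simple")."""
    return sum(n * WEIGHTS[domain][LEVEL[level]] for domain, n, level in counts)

def function_points(total: float, f_values) -> float:
    """FP = count total x [0.65 + 0.01 x sum(Fj)], with fourteen Fj each in 0..5."""
    assert len(f_values) == 14, "one Fj per complexity adjustment question"
    return total * (0.65 + 0.01 * sum(f_values))

Since each Fj lies between 0 and 5, the adjustment factor ranges from 0.65 (all answers 0) to 1.35 (all answers 5).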
Once function points have been calculated, they are used in a manner analogous to LOC as a way to
normalize measures for software productivity, quality, and other attributes:
 Errors per FP.
 Defects per FP
 $ per FP.
 Pages of documentation per FP
 FP per person-month.
Example
The function point metric can be used effectively as a means for predicting the size of a system that will be derived from the analysis model. To illustrate the use of the FP metric in this context, we consider a simple analysis model representation, illustrated in the figure on the next page. Referring to the figure, a data flow diagram for a function within the SafeHome software is represented. The function manages user interaction, accepting a user password to activate or deactivate the system, and allows inquiries on the status of security zones and various security sensors. The function
displays a series of prompting messages and sends appropriate control signals to various
components of the security system.
The data flow diagram is evaluated to determine the key measures required for computation of the
function point metric:
 number of user inputs
 number of user outputs
 number of user inquiries
 number of files
 number of external interfaces
Three user inputs (password, panic button, and activate/deactivate) are shown in the figure along with two inquiries (zone inquiry and sensor inquiry). One file (system configuration file) is shown. Two user outputs (messages and sensor status) and four external interfaces (test sensor, zone setting, activate/deactivate, and alarm alert) are also present. These data, along with the appropriate complexity, are shown in the figure below.
The count total shown in the figure above must be adjusted using Equation (ii):

FP = count total x [0.65 + 0.01 x Σ(Fj)]

where count total is the sum of all FP entries obtained from the SafeHome DFD figure above and the Fj (j = 1 to 14) are complexity adjustment values. For the purposes of this example, we assume that Σ(Fj) is 46 (a moderately complex product). Therefore,

FP = 50 x [0.65 + (0.01 x 46)] = 55.5 ≈ 56
Based on the projected FP value derived from the analysis model, the project team can estimate the
overall implemented size of the SafeHome user interaction function. Assume that past data indicates
that one FP translates into 60 lines of code (an object-oriented language is to be used) and that 12
FPs are produced for each person-month of effort. These historical data provide the project manager
with important planning information that is based on the analysis model rather than preliminary
estimates. Assume further that past projects have found an average of three errors per function point
during analysis and design reviews and four errors per function point during unit and integration
testing. These data can help software engineers assess the completeness of their review and test
activities.
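Putting the SafeHome numbers through the relationship above reproduces the figures used in this example. The 'simple' weights applied to the information-domain counts are an assumption, since the counting figure itself is not reproduced here.

# SafeHome example: 3 inputs, 2 outputs, 2 inquiries, 1 file, 4 interfaces,
# weighted with the commonly used 'simple' weights (3, 4, 3, 7, 5) -- an
# assumption, since the handout's counting table is not reproduced.
unadjusted_count = 3*3 + 2*4 + 2*3 + 1*7 + 4*5   # = 50, the count total used above
fp = unadjusted_count * (0.65 + 0.01 * 46)       # sum(Fj) assumed to be 46
print(unadjusted_count, round(fp, 1))            # 50 55.5 (reported as 56)

# Planning estimates from the historical figures quoted above
fp_estimate = 56
print(fp_estimate * 60)                  # ~3,360 LOC expected at 60 LOC per FP
print(round(fp_estimate / 12, 1))        # ~4.7 person-months at 12 FP per person-month
print(3 * fp_estimate, 4 * fp_estimate)  # ~168 review errors and ~224 test errors expected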
Advantages of function points
Advantages of function points as seen by the users of this method are:
 Function point counts are independent of the technology (operating system, language and
database), developer productivity and methodology used.
 Function points are simple to understand and can be used as a basis for discussion of the
application system scope and requirements with the users.
 Function points are easy to count and require very little effort and practice.
 Function points can be used to compare productivity of different groups and technologies.
 Function point counting method uses information gathered during requirements definition.
6.4 RECONCILING DIFFERENT METRICS APPROACHES
The relationship between lines of code and function points depends upon the programming language that is used to implement the software and the quality of the design. A number of studies have attempted to relate FP and LOC measures, and rough estimates of the average number of lines of code required to build one function point in various programming languages have been published.
A review of these data indicates that one LOC of C++ provides approximately 1.6 times the "functionality" (on average) of one LOC of FORTRAN. Furthermore, one LOC of Visual Basic provides more than three times the functionality of a LOC of a conventional programming language. More detailed data on the relationship between FP and LOC can be used to "backfire" (i.e., to compute the number of function points when the number of delivered LOC is known) existing programs to determine the FP measure for each.
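A sketch of the backfiring idea follows. The LOC-per-FP ratio used is an illustrative assumption (the published language tables are not reproduced in this handout), so the result should be read as a rough estimate only.

# Backfiring: estimate the function point count of an existing program from
# its delivered LOC and an average LOC-per-FP ratio for its language.
# The ratio below is an illustrative assumption.
def backfire(delivered_loc: int, loc_per_fp: float) -> float:
    return delivered_loc / loc_per_fp

print(round(backfire(45_000, loc_per_fp=60)))   # ~750 FP for a 45,000-LOC system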
LOC and FP measures are often used to derive productivity metrics. This invariably leads to a
debate about the use of such data. Should the LOC/person-month (or FP/person-month) of one
group be compared to similar data from another? Should managers appraise the performance of individuals by using these metrics? The answer to these questions is an emphatic "No!" The reason for this response is that many factors influence productivity, making for "apples and oranges" comparisons that can be easily misinterpreted.
Function points and LOC based metrics have been found to be relatively accurate predictors of
software development effort and cost. However, in order to use LOC and FP for estimation, a
historical baseline of information must be established.
6.5 METRICS FOR SOFTWARE QUALITY
The overriding goal of software engineering is to produce a high-quality system, application, or product. To achieve this goal, software engineers must apply effective methods coupled with modern tools within the context of a mature software process. In addition, good software engineers (and good software engineering managers) must measure if high quality is to be realized.
The quality of a system, application, or product is only as good as the requirements that describe the problem, the design that models the solution, the code that leads to an executable program, and the tests that exercise the software to uncover errors. A good software engineer uses measurement to assess the quality of the analysis and design models, the source code, and the test cases that have been created as the software is engineered. To accomplish this real-time quality assessment, the engineer must use technical measures to evaluate quality in objective, rather than subjective, ways.
The project manager must also evaluate quality as the project progresses. Private metrics collected
by individual software engineers are assimilated to provide project-level results. Although many
quality measures can be collected, the primary thrust at the project level is to measure errors and
defects. Metrics derived from these measures provide an indication of the effectiveness of
individual and group software quality assurance and control activities.
Metrics such as work product (e.g., requirements or design) errors per function point, errors uncovered per review hour, and errors uncovered per testing hour provide insight into the efficacy of each of the activities implied by the metric. Error data can also be used to compute the defect removal efficiency (DRE) for each process framework activity.
6.5.1 MEASURING QUALITY
Although there are many measures of software quality, correctness, maintainability, integrity, and
usability provide useful indicators for the project team. Gilb [GIL88] suggests definitions and
measures for each.
Correctness: A program must operate correctly or it provides little value to its users. Correctness is
the degree to which the software performs its required function; the most common measure for
correctness is defects per KLOC, where a defect is defined as a verified lack of conformance to
requirements. When considering the overall quality of a software product, defects are those
problems reported by a user of the program after the program has been released for general use. For
quality assessment purposes, defects are counted over a standard period of time, typically one year.
Maintainability: Software maintenance accounts for more effort than any other software
engineering activity. Maintainability is the ease with which a program can be corrected if an error is
encountered, adapted if its environment changes, or enhanced if the customer desires a change in
requirements. There is no way to measure maintainability directly; therefore, we must use indirect measures. A simple time-oriented metric is mean-time-to-change (MTTC), the time it takes to analyze the change request, design an appropriate modification, implement the change, test it, and
distribute the change to all users. On average, programs that are maintainable will have a lower
MTTC (for equivalent types of changes) than programs that are not maintainable.
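As a small sketch, MTTC can be computed as the mean elapsed time over a set of completed change requests; the sample durations below are invented for illustration.

# Mean-time-to-change (MTTC): average elapsed time (here in days) from
# receiving a change request to distributing the tested change.
# The sample durations are invented for illustration.
change_durations_days = [4.0, 7.5, 3.0, 10.0, 5.5]
mttc = sum(change_durations_days) / len(change_durations_days)
print(f"MTTC = {mttc:.1f} days")   # 6.0 days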
Integrity: Software integrity has become increasingly important in the age of hackers and
firewalls. This attribute measures a system's ability to withstand attacks (both accidental and
intentional) to its security. Attacks can be made on all three components of software: programs,
data, and documents.
To measure integrity, two additional attributes must be defined: threat and security. Threat is the probability (which can be estimated or derived from empirical evidence) that an attack of a specific type will occur within a given time. Security is the probability (which can be estimated or derived from empirical evidence) that an attack of a specific type will be repelled. The integrity of a system can then be defined as

integrity = Σ [1 - (threat x (1 - security))]

where threat and security are summed over each type of attack. For example, if the threat of a particular attack type is 0.25 and the security against that attack type is 0.95, its contribution to the sum is 1 - 0.25 x (1 - 0.95) = 0.9875.
Usability: The catch phrase 'user-friendliness' has become ubiquitous in discussions of software products. If a program is not user-friendly, it is often doomed to failure, even if the functions that it performs are valuable. Usability is an attempt to quantify user-friendliness and can be measured in terms of four characteristics:
 The physical and/or intellectual skill required to learn the system,
 The time required to become moderately efficient in the use of the system,
 The net increase in productivity (over the approach that the system replaces) measured when the system is used by someone who is moderately efficient, and
 A subjective assessment (sometimes obtained through a questionnaire) of users' attitudes toward the system.
The four factors just described are only a sampling of those that have been proposed as measures of software quality.
6.5.2 DEFECT REMOVAL EFFICIENCY
A quality metric that provides benefit at both the project and process level is defect removal
efficiency (DRE). In essence, DRE is a measure of the filtering ability of quality assurance and control activities as they are applied throughout all process framework activities.
When considered for a project as a whole, DRE is defined in the following manner:
DRE = E/(E + D)
Where, E is the number of errors found before delivery of the software to the end-user and D is the
number of defects found after delivery.
The ideal value for DRE is 1; that is, no defects are found in the software. Realistically, D will be greater than 0, but the value of DRE can still approach 1. As E increases (for a given value of D),
the overall value of DRE begins to approach 1. In fact, as E increases, it is likely that the final value
of D will decrease (errors are filtered out before they become defects). If used as a metric that
provides an indicator of the filtering ability of quality control and assurance activities, DRE
encourages a software project team to institute techniques for finding as many errors as possible
before delivery.
DRE can also be used within the project to assess a team's ability to find errors before they are
passed to the next framework activity or software engineering task. For example, the requirements
analysis task produces an analysis model that can be reviewed to find and correct errors. Those
errors that are not found during the review of the analysis model are passed on to the design task
(where they may or may not be found). When used in this context we redefine DRE as
DREi = Ei/(Ei+Ei+1)
Where, Ei is the number of errors found during software engineering activity i and Ei+1 is the
number of errors found during software engineering activity i+1 that are traceable to errors that
were not discovered in software engineering activity i.
A quality objective for a software team (or an individual software engineer) is to achieve a DRE that approaches 1; that is, errors should be filtered out before they are passed on to the next activity.
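The sketch below computes both forms of DRE defined above. The project-level example reuses the project alpha counts quoted earlier (134 errors before release, 29 defects after); the activity-level counts are invented for illustration.

# Defect removal efficiency in both forms defined above.
def dre_overall(errors_before_delivery: int, defects_after_delivery: int) -> float:
    """DRE = E / (E + D) for the project as a whole."""
    return errors_before_delivery / (errors_before_delivery + defects_after_delivery)

def dre_activity(errors_in_activity_i: int, escaped_to_next: int) -> float:
    """DREi = Ei / (Ei + Ei+1): Ei+1 counts errors found in the next activity
    that are traceable back to activity i."""
    return errors_in_activity_i / (errors_in_activity_i + escaped_to_next)

print(f"{dre_overall(134, 29):.2f}")   # 0.82, using the project alpha counts
print(f"{dre_activity(50, 10):.2f}")   # 0.83 for a single framework activity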
6.6 SUMMARY
Measurement enables managers and practitioners to improve the software process; assist in the
planning, tracking, and control of a software project; and assess the quality of the product (software)
that is produced. Measures of specific attributes of the process, project, and product are used to
compute software metrics. These metrics can be analyzed to provide indicators that guide
management and technical actions. Process metrics enable an organization to take a strategic view
by providing insight into the effectiveness of a software process. Project metrics are tactical. They
enable a project manager to adapt project workflow and technical approach in a real-time manner.
Both size- and function-oriented metrics are used throughout the industry. Size-oriented metrics use
the line of code as a normalizing factor for other measures such as person-months or defects. The
function point is derived from measures of the information domain and a subjective assessment of
problem complexity.
Software quality metrics, like productivity metrics, focus on the process, the project, and the
product. By developing and analyzing a metrics baseline for quality, an organization can correct
those areas of the software process that are the cause of software defects.
Metrics are meaningful only if they have been examined for statistical validity. The control chart is
a simple method for accomplishing this and at the same time examining the variation and location
of metrics results.
Measurement results in cultural change. Data collection, metrics computation, and metrics analysis are the three steps that must be implemented to begin a metrics program. In general, a goal-driven approach helps an organization focus on the right metrics for its business. By creating a metrics baseline (a database containing process and product measurements), software engineers and their managers can gain better insight into the work that they do and the products that they produce.