Change Rate and Complexity in Software Evolution

Andrea Capiluppi
Dipartimento di Automatica e Informatica
Politecnico di Torino, Italy
Andrea.Capiluppi@polito.it

Juan F. Ramil
Computing Department
Faculty of Maths and Computing
The Open University, Milton Keynes, UK
J.F.Ramil@open.ac.uk

Alvaro E. Faria
Computing Department
Faculty of Maths and Computing
The Open University, Milton Keynes, UK
a.e.faria@open.ac.uk

[1] A. Capiluppi is currently visiting the Computing Department at The Open University, UK.

Abstract
Abstract
This paper deals with the problem of identifying
which parts of the source code will benefit most from
refactoring, that is, complexity reduction, and how this
work should be prioritised for maximum impact on the
evolvability of a system. Our approach ranks the files
based on the value of a metric called release-touches,
which simply counts the number of releases at which a
given file has been changed. The metric enables us to
split the code into two parts: the more stable and the less
stable. The paper presents empirical data derived from
the distributed file system Arla, an evolving Open Source
system with 62 public releases since 1998. The results
indicate that the source files which are subject to the
highest change rates include a large portion of highly
complex functions. It is argued that refactoring these
less stable, more complex functions is likely to have a
positive impact on evolvability.
Keywords: Cyclomatic Complexity, Maintenance,
Metrics, Open Source, Refactoring, Software Evolution
1. Introduction and rationale
The results from empirical studies of software maintenance
and evolution can provide the basis for improved processes,
methods and tools. In our recent work we have been studying the
long-term evolution of Open Source Software (OSS). In several
recent studies we have been looking at the relationship between
functional growth, software structure and change rate in OSS
systems [Capiluppi et al 2004a,b]. The present paper explores the
relationship between change rate and complexity during the
evolution of an OSS software system. The results presented in this
paper complement the previous studies by identifying which parts
of the software system should be refactored first, so that the
maximum impact on evolvability is achieved.
Anti-regressive work [Lehman 1974], that is, complexity
control, is an essential task in order to sustain software evolution
(hypothesis 4a). Peaks of anti-regressive work are likely to be
associated with discontinuities in the trends and with restructuring
releases, that is, releases in which the architectural complexity is
reduced by means, for example, of re-organising modules into
sub-systems (hypothesis 4b).
In the early seventies, Lehman observed that as change upon
change is implemented in a software system, its complexity is
likely to increase unless effort is allocated to complexity control
[Lehman 1974]. Lehman used the term anti-regressive to
refer to such complexity control and to distinguish it from
progressive work. The former does not add immediate value to the
system; the latter does. Re-writing code so that it implements the
same function but in a less complex way is an example of anti-regressive
work. It adds long-term value to the system by making
the implementation of future changes to the code easier. Parnas
recognised the presence of software aging effects as a
consequence of the accumulation of change [Parnas 1994].
Lehman's and Parnas' views justify the development of metrics-based
methods and tools related to anti-regressive work. In
recent years, the term refactoring, which is closely related to
Lehman's anti-regressive work, has become popular [refactoring].
Empirical evidence suggests that anti-regressive work is an
integral part of the work undertaken during software
evolution [Capiluppi & Ramil 2004]. The question is whether such
work can be directed to the parts of the system in which it would
provide the highest benefit in terms of improving productivity. In
order to develop a metrics-based model or a tool to guide
refactoring efforts, one can start, for example, by identifying the
relationship between change rate and complexity using empirical
data from actively evolved systems. One then can identify what
parts of the system should be refactored first in order to achieve a
maximum benefit. One possible approach consists of refactoring
the most complex functions in the less stable [Bevan & Whitehead
2003] part of the system, the part that historically has been shown
to be more likely to undergo change.
In this paper we explore this idea through a metrics-based
approach which identifies the less stable part of a system, that is,
the part of the system that has been subject to the highest evolution
rate. Then, we observe the tails of the complexity distribution for
the files or functions belonging to this part of the system. The tails
contain the subset of the most complex functions in the system that
are most likely to change in the future. One can argue that
refactoring these functions is likely to provide the highest gains.
This paper presents initial results based on empirical data from
Arla, an Open Source software system which implements a distributed
file system, and briefly indicates plans for our further work.
2. Approach and application to a case study
In this section we introduce the sequential steps used to obtain
the results presented so that other researchers may be able to
replicate this study. The steps are as follows:
1. Project selection: in principle, any system, either proprietary or
OSS, can be studied with this approach. Access to the source code is
necessary to perform the analysis. Ideally, one wishes to have
access to the source code at each one of the past releases of
the software, but it can be sufficient to have access to the
code at the most recent release [2] if there is also access to
sufficiently complete and well-formatted change log records
which make it possible to extract the metrics of
interest.
2. Metrics extraction: The attributes which are being examined in
this study are: the number of releases for which a particular file
has been touched, called release-touches, for each file in the
source code, on the one hand, and the complexity of each
function measured by the McCabe cyclomatic complexity index
[McCabe 1976], on the other. The number of release-touches is
simply the number of different releases during which a file has
been manipulated, via addition of code to it, removal of existing
code or modification. When the number of release-touches is
one, it indicates that the file has been touched (created or
modified), during one single release interval, and never touched
again during the evolution of the system. The maximum
possible number of release-touches for a file is the total number
of releases. A number of release-touches equal to the number of
releases indicates that a file has been created and then touched at
every release. In the majority of the cases the investigators can
only access a subset of all the releases of a software system, or
the software system has inherited code from an existing
software system. If this is the case, a number of release-touches
equal to zero for a particular file indicates that the file was
already present at the first available release and never modified
again. The release-touches metric provides a way to track the
number of releases during which a file has been modified, and it
can also be expressed as a percentage of the total number of
releases available for study in the evolution history of the
system. One way of calculating the metric is to compare each
distinct source file with the same file at the subsequent (or
previous) release by using UNIX tools (e.g. diff).

[2] In general, a software release is an executable snapshot of the evolving
software which is delivered to users. We use the term release to refer to
the source code as it was at the moment in time at which it was delivered.
2.1. System studied: Arla
Arla is an Open Source implementation of the AFS file system:
AFS was first created by IBM, and Arla is an OSS clone
emulating AFS on various configurations (Windows NT, Linux,
Solaris, and so on). The system made its first public
release available in February 1998 (labeled 0.0), and its most recent release
is labeled 0.35.12 (considered a minor release, February 2003).
35 major releases and 27 minor releases have been generated so
far; that is, a total of 62 public releases are downloadable from the
Internet for study.
2.2. Source files identification
The Arla system is written mainly in C (*.c files), which
together with header files (*.h) and macros (*.m4), form the basic
core of the system. Figure 1 displays Arla's growth trend in the total
number of files: from around 330 files at its first available release,
the system evolved to 760 source files at its latest available
release.
By examining all the releases of the system, we identified a
total of 1055 distinct source files. About 300 of them have been
deleted during the evolution of the system, which explains why the
latest available release consists of 760 source files only.
All the Arla results in this and the next sections are obtained
through PERL scripts that we developed, which can be made
available to other researchers who may wish to replicate the study.
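The census of source files sketched above can be illustrated in Python (the authors' own scripts are in Perl; this sketch is ours and assumes, for convenience, that each release has been unpacked into its own directory):

```python
from pathlib import Path

SOURCE_SUFFIXES = {".c", ".h", ".m4"}  # the file types forming Arla's core

def source_files(release_dir):
    """Return the release-relative paths of the source files in one release."""
    root = Path(release_dir)
    return {p.relative_to(root).as_posix()
            for p in root.rglob("*") if p.suffix in SOURCE_SUFFIXES}

def census(release_dirs):
    """Per-release source-file counts plus the set of distinct files
    observed over the whole sequence of releases."""
    per_release, distinct = [], set()
    for d in release_dirs:  # chronological order
        files = source_files(d)
        per_release.append(len(files))
        distinct |= files
    return per_release, distinct
```

Applied to the 62 Arla release directories, the per-release counts correspond to the growth trend of Figure 1, and the union corresponds to the set of distinct source files.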
[Figure: number of source files (vertical axis, 300 to 800) against release number (horizontal axis).]
Figure 1 - Arla system: growth of the number of source files
2.3. Metrics extraction
Release-touches per file: for each pair of subsequent releases, we
evaluated each distinct source file in order to capture any delta
(i.e., addition, deletion or modification). The results were
summarised for each file using the release-touches metric, with
each of the files being assigned a fsn according to its
release-touches value and then ranked. Figure 2 presents the value
of the release-touches metric expressed as a percentage of the
total number of releases. A small subset of the files was changed
frequently (left part of Figure 2), while most of them were never
changed, or changed only once (right part and tail of Figure 2).
[Figure 2: distribution of release-touches per file, expressed as a percentage of the total number of releases (vertical axis, 0% to 80%), against file sequence number (horizontal axis).]
One release-touch is counted for each release interval at which at
least one addition, deletion or modification of source lines is
reported by the tool. This indicator provides a measure of the
evolution change rate that can be applied to software repositories.
3. Identification of the less stable part of the software: our
approach ranks all the files according to their number of
release-touches. One possible way of ranking the files is by assigning to
each file a file sequence number (fsn): say, fsn 1 to the file that
has the highest number of release-touches, increasing the
assigned fsn as the number of release-touches decreases. Files
with the same release-touches number can be ranked
alphabetically according to their name, or following some other rule.
This way, the highest fsn is assigned to one of the files with
the lowest number of release-touches. Then, we can use the fsn
in order to split the set of files into two groups of equal size in
terms of number of files. These two sets are as follows: one
contains the 50 percent or so of the files with the higher number
of release-touches, which we call the unstable part of the system;
the remaining 50 percent of the files, which were subject to
the lower change rate, are called the stable part of the system.
4. Complexity measurement for the stable and unstable parts of the
system: we wish to check whether the files which are subject to
more frequent change (the unstable part) represent the more
complex part of the system, as implied independently by
Lehman and Parnas. In order to do this, we use a tool [metrics],
based on UNIX utilities, that evaluates the McCabe index for the
functions belonging to the files in each of the two sets identified
during step 3 above. One issue that emerges here is that the
McCabe index is defined for functions (or procedures), which are
contained within files, but there is no measure of complexity at
file level equivalent to the McCabe index. The issue is then how
to compare measures at two different levels of granularity: the
change rate at the file level (release-touches) with the complexity
measure at the function level (McCabe). Possible alternatives
include: calculating the median or average McCabe index of the
functions in a file; evaluating the sum of the McCabe indexes
per file; calculating a weighted average McCabe index for each
file, which takes into account the portion of the file size which is
allocated to a particular function; or, alternatively, if possible,
performing the analysis of release-touches at the function level
instead of at the file level. Further work will need to consider the
merits of each of those. For the moment we have applied the first
two: the sum and the average of the McCabe index, which are the
simplest to obtain.
5. Identifying candidates for refactoring: this is done by analysing
the files in the unstable part of the system and identifying those
which have the highest complexity values. In general, the
McCabe index is considered as an indicator of overly complex
functions when its value is larger than 20 [McCabe & Butler
1989]. All the functions above that limit can become candidates
for refactoring. As a final step, in order to assess whether the
most complex functions have already been subject to anti-regressive
work in the past, we display the evolution of their McCabe
index over release sequence numbers (rsn).
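The steps above can be sketched end-to-end as follows. This is a minimal Python illustration of the procedure, not the authors' Perl tooling: release contents are modelled as in-memory path-to-text dictionaries instead of diffed tarballs, and per-function McCabe indexes are assumed to have been extracted already by a separate tool.

```python
from collections import defaultdict

MCCABE_THRESHOLD = 20  # values above this flag overly complex functions

def release_touches(releases):
    """releases: list of {path: file_contents} dicts, one per release,
    in chronological order. Counts, per file, the release intervals at
    which the file was created or modified."""
    touches = defaultdict(int)
    previous = releases[0]
    for current in releases[1:]:
        for path, text in current.items():
            if path not in previous or previous[path] != text:
                touches[path] += 1   # one release-touch for this interval
        previous = current
    # files present at the first release and never changed keep 0 touches
    for path in releases[0]:
        touches.setdefault(path, 0)
    return dict(touches)

def split_halves(touches):
    """Assign fsn 1 to the most-touched file and split the ranking into
    the unstable (first) and stable (second) halves."""
    ranked = sorted(touches, key=lambda p: (-touches[p], p))  # ties: by name
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]

def refactoring_candidates(unstable, mccabe_per_function):
    """mccabe_per_function: {path: {function_name: index}}. Return the
    overly complex functions located in the unstable half."""
    return [(path, fn, idx)
            for path in unstable
            for fn, idx in mccabe_per_function.get(path, {}).items()
            if idx > MCCABE_THRESHOLD]
```

split_halves implements the fsn ranking of step 3 (ties broken alphabetically), and refactoring_candidates applies the threshold of 20 from step 5 to the unstable half.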
Figure 2 - Distribution of release-touches per file, as a
percentage of total number of releases. The line and circle
indicate the point that splits files into two sets of similar size.
The presence of deleted files does not affect the overall qualitative
results of the study.
Next, since we are interested in understanding whether larger
files correspond to heavily changed ones, we plotted the size of the
corresponding files at the latest available release. Figure 3 shows
the count of lines of code when the files are ordered by
fsn (the same rank order as in Figure 2). The horizontal axis
shows the same files as above, while the vertical axis displays their
individual size in LOC: from this figure it is clearly visible that
heavily changed files are also the larger ones. One concludes that
the parts of the system subject to more frequent change are also the
parts that grow larger.
[Figure: lines of code (LOC, vertical axis, 0 to 5000) against file sequence number (horizontal axis), at the latest available release.]
Figure 3 - Size in lines of code for source files ranked in the same order as in Figure 2
We performed a visualisation similar to that of Figure 3, but now for
complexity. We obtained the McCabe index for complexity at the
function level. Then, we obtained an indicator for complexity at
the file level in two different ways: by using the sum of all
complexity indexes of the functions in the same file, and by using
the average of complexity indexes of all the functions in the same
file.
[Figure: sum of the McCabe indexes of the functions in each file (vertical axis, 0 to 200) against file sequence number (horizontal axis).]
Figure 4 - Sum of the complexity indexes of all functions in a file
The first visualisation (Figure 4) shows on the horizontal axis
the files in the same rank order, imposed by the number of
release-touches, as in Figure 2, while the vertical axis displays the sum of
the complexities of the functions inside each file. This visualisation
displays the data at the latest available release of the system
studied. A pattern similar to that found when investigating LOCs is
found here, but with respect to the complexity of files: the more
complex files are the ones that underwent the higher number of
release-touches.

[Figure: average of the McCabe indexes of the functions in each file (vertical axis) against file sequence number (horizontal axis).]
Figure 5 - Average complexity index of all functions per file (latest available release)

[Figure: number of functions per file (vertical axis) against file sequence number (horizontal axis).]
Figure 6 - Number of functions per file, at the latest available release
The second visualization (Figure 5) displays the complexity of
files when calculating the average of the complexities of their
functions: the results are different from both Figure 3 and Figure
4, since the complexity of files appears to be more uniformly
distributed over files which have been subject to different amounts
of change. This result could be a consequence of the averaging.
In order to understand the issue better, we calculated
the number of functions per file and present it as a function of
the fsn in Figure 6. This shows that, when considering the latest
available release, the files which have been subject to more
changes are also the ones that contain a larger number of
functions. The higher the number of functions, the more likely it is that
the file will contain a mixture of small and large functions with
dissimilar complexity. The average complexity index then
becomes less meaningful as an indicator for the purpose of
identifying candidates for refactoring. Ideally the analysis should
be done at the function level.
2.4. Identification of more stable and less stable parts
Observing Figure 2, there is a clear distinction between files
which were heavily changed and files which were subject to less
evolution work. In order to compare the complexity of the
unstable and stable parts of the system, we split the sequence of
files in Figure 2 into two halves, each containing 50% of the distinct
source files. By splitting the horizontal axis of Figure 2 in two, we are
interested in isolating the files that were never changed, or changed
only once, from the others; we refer to these two subsets as the more
stable part of the system and the less stable part, respectively. As
presented in Figure 7, we selected the functions pertaining to the
highly modified files and displayed the distribution of their complexity
measures, as a McCabe number (horizontal axis). The vertical axis
presents the number of functions having a particular McCabe
index. A large number of small and minimally complex functions
is represented by the first two bars at the left of the plot.
However, a small number of functions display the highest
complexity values in the tails of the distribution, with a maximum
of 136 as the McCabe index of the most complex
function in the Arla system. This histogram, again, reflects the files
and functions of Arla at its latest available release.

We used the same approach to draw the histogram for the most
stable subset of files: the results are displayed in Figure 8. At least
two observations can be made here. The first relates to
the length of the tails: the tail in Figure 7 is longer than in Figure 8,
which indicates that the more frequently touched files contain the
most complex functions. The second relates to the
total number of functions composing each subset of files: the most
stable part sums up to around 800 functions (Figure 8), while the
less stable part consists of around 2700 functions (Figure 7).
This indicates that the less frequently touched files are also less
complex with respect to the number of functions they contain.
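The histograms of Figures 7 and 8 summarise per-function McCabe indexes for each half; the binning can be sketched as follows (our illustration, with an arbitrary bin width of 5):

```python
from collections import Counter

def mccabe_histogram(functions, bin_width=5):
    """functions: iterable of McCabe indexes for one half of the system.
    Returns {bin_lower_bound: count}, e.g. bin 0 holds indexes 0-4."""
    hist = Counter((idx // bin_width) * bin_width for idx in functions)
    return dict(sorted(hist.items()))

def tail_length(functions):
    """Maximum McCabe index, i.e. the reach of the distribution's tail."""
    return max(functions)
```

Comparing tail_length for the two halves reproduces the first observation above; the total count of functions per half reproduces the second.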
[Figure: histograms of the number of functions (vertical axis, 0 to 1000) per McCabe index value (horizontal axis), for the less stable and the more stable halves at file level.]
Figure 8 - More stable half of Arla: McCabe index for the functions in the files belonging to the more stable half of the file set
However, we have found a transformation which, applied to the
data, makes those associations approximately linear. Figure 2
shows the plots of the associations after natural logarithmic
transformations have been applied to the data; the resulting
relationships are approximately linear.
Figure 7 - Less stable half of Arla: McCabe index for the functions
in the files belonging to the less stable half of the file set
3. Comparison between the McCabe score function for the more and the less stable parts of an Open Source code development
To compare the functional association of the McCabe
indexes for less stable functions with the McCabe indexes
for more stable functions, we make use of the statistical
theory of linear models. We will be fitting the best linear
regression model (in the maximum likelihood sense) to
the data relative to each of the two associations and
perform the appropriate hypothesis tests for the
differences between them. A reason for using linear
models is that the theory behind them is (apart from being
coherent and robust) easy to communicate to a non-statistical audience. Certainly, there is scope for adopting
more complex (but also more refined) non-linear models
in a future development such as a generalised linear
model (GLM) for Poisson distributed data with
logarithmic link function and possibly learning (from the
data) on the Poisson’s mean.
3.1. Testing for differences between two linear regression models

To fit a linear regression model representing the
association between two variables, we first must make
sure the association is at least approximately linear. This
is not the case in our problem, where the two associations
are both exponential, as can be seen in the scatter plots of Figure 1.
[Figure: scatter plots of the McCabe index against the less stable and the more stable file data, on the original scale.]
Figure 1 – The scatter plots of the original data
[Figure: scatter plots of ln(McCabe) against ln(LessStable) and ln(MoreStable); both relationships are approximately linear.]
Figure 2 – The scatter plots of the transformed data
In order to verify whether there is a significant difference
between the two functional relationships of the
transformed data, we have adopted an approach
consisting of fitting (i.e. estimating from the data) a
linear regression model for each association. Note that
here we are using the regression lines estimated from the
data as descriptors of the associations between the
variables. In this sense, the term ‘significant’ means that
when comparing the two regression lines the variability of
the data in each case is being taken into account. That is,
the statistics used for the comparisons are normalized by
measures of the data variability such that they are made
appropriate to compare with the tabled value of a
standard probability distribution function (that represents
the ideal or normally expected value).
Now, we can represent each linear regression model in
the following way. Let y = ln(McCabe), x1 =
ln(LessStable), x2 = ln(MoreStable); let the α’s be the
intercepts and the β’s the slopes of the respective curves,
and let the ε’s be the error terms (independent, normally
distributed, zero-mean errors). Then we can write

(1)    y = α1 + β1 x1 + ε1    and    y = α2 + β2 x2 + ε2

In what follows, α̂d = α̂1 − α̂2 is the estimate of αd = α1 − α2,
σ̂1² and σ̂2² are the variances of the two intercept estimates,
n1 and n2 are the sizes of the respective data sets, and Zp/2
is the value of the cumulative standard normal distribution
corresponding to the significance level p. Without loss of
generality, the test for the difference between the slopes is
obtained from the test for the intercepts given below by
substituting the α’s by the β’s accordingly.

We now apply this result to each of the estimated curves.
The intercept and the slope of the regression of
ln(McCabe) on ln(LessStable), estimated by the MLE
(maximum likelihood estimation) method, produced the
following results.
3.2. Regressing McCabe on Less Stable Functions

Coefficients (a. Dependent Variable: lnCompl)
              Unstandardized B   Std. Error   Standardized Beta
(Constant)    3.550              .086
lnLess        -.458              .036         -.895

Note that not only the estimates of the intercept and the
slope are provided in the ‘B’ column of the unstandardized
coefficients, but also the corresponding standard
deviations of each estimator under the ‘Std. Error’
column. They are all significant, as the ‘Sig.’ values are
0.000 for both estimates. That is, relating to our model (1),
we have that
(2)    α̂1 = 3.55,  β̂1 = −0.458,  with standard errors
       σ̂(α̂1) = 0.086  and  σ̂(β̂1) = 0.036.

Comparing the two relationships represented by the
equations above corresponds to comparing the intercepts
and the slopes of the two linear regression models. This in
turn corresponds to testing the hypotheses:

(4)    H0: αd = βd = 0    vs.    Ha: αd ≠ 0 or βd ≠ 0
where αd = α1 − α2 and βd = β1 − β2 are the differences
between the intercepts and the slopes, respectively. It is a
standard result in statistics that an appropriate test for the
hypotheses above is to reject H0 if |z| > Zp/2, where

(3)    z = α̂d / √(σ̂1²/n1 + σ̂2²/n2)

Model Summary (a. Predictors: (Constant), lnLess)
              R        R Square   Adjusted R Square   Std. Error of the Estimate
Model 1       .895     .801       .796                .46635

The R Square is a measure of the variability in the data
explained by the estimated regression. In this case, the
estimated model explains about 80% of the data
variability, which means that it produces a very good
fit to the data. The estimated line is

ln(McCabe) = 3.55 − 0.458 ln(LessStable).
[Figures: fitted line of the regression of ln(McCabe) on ln(LessStable); histogram and Normal P-P plot of the standardised residuals; scatterplots of the standardised residuals against the dependent and the predicted values.]
Before we proceed to estimating the other regression line,
we visually verify whether the errors appear to be
independent and normally distributed with zero mean, by
examining the histogram and the Normal P-P plot of the
errors, as well as the plots of the standardised errors against both
the dependent variable ln(McCabe) and the predicted
values. The histogram should have the approximate bell
shape of a normal distribution, the P-P plot should follow
the diagonal line, and the standardised errors against the
dependent variable and the predicted values should look
scattered along the zero axis.
The regression above seems to have produced
approximately zero-mean, normally distributed errors. The
standardised errors show some scatter around the
zero axis, although there is scope for improvement on
that. However, in general this model seems a good one.
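For replication, an equivalent ln-ln least-squares fit can be sketched with NumPy (illustrative only: the function name is ours and the Arla data set itself is not reproduced here; least squares coincides with MLE under the normal-error assumption):

```python
import numpy as np

def ln_ln_fit(x, y):
    """Fit ln(y) = alpha + beta * ln(x) by least squares and return
    (alpha, beta, r_squared)."""
    lx, ly = np.log(x), np.log(y)
    beta, alpha = np.polyfit(lx, ly, 1)        # slope, then intercept
    resid = ly - (alpha + beta * lx)           # residuals of the fit
    r2 = 1 - resid.var() / ly.var()            # share of variability explained
    return alpha, beta, r2
```

The returned r2 corresponds to the R Square reported in the model summaries.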
3.3. Regressing McCabe on More Stable Functions
Now, the second regression in (1) is estimated as follows.
Coefficients (a. Dependent Variable: lnCompl)
              Unstandardized B   Std. Error   Standardized Beta
(Constant)    3.165              .091
lnMore        -.418              .032         -.940
(5)    α̂2 = 3.165,  β̂2 = −0.418,  with standard errors
       σ̂(α̂2) = 0.091  and  σ̂(β̂2) = 0.032.
Model Summary (a. Predictors: (Constant), lnMore)
              R        R Square   Adjusted R Square   Std. Error of the Estimate
Model 1       .940     .884       .878                .32760

[Figures: residual diagnostics for the regression of ln(McCabe) on ln(MoreStable): histogram of the standardised residuals, Normal P-P plot, and standardised residuals against the predicted values.]
This model is also a good one. It explains 88.4% of the
data variability and does not present any serious
departures from the normality and independence
assumptions of the linear regression models.
3.4. Testing for differences
Now, substituting the estimated values shown in (2) and
(5) into the test statistic (3), we can test for the difference
between the intercepts:
z = (α̂1 − α̂2) / √(σ̂(α̂1)²/n1 + σ̂(α̂2)²/n2)
  = (3.55 − 3.165) / √((0.086)²/42 + (0.091)²/24)
  = 0.385 / 0.02283
  ≈ 16.9

and, analogously, for the difference between the slopes:

z = (β̂1 − β̂2) / √(σ̂(β̂1)²/n1 + σ̂(β̂2)²/n2)
  = (0.458 − 0.418) / √((0.036)²/42 + (0.032)²/24)
  = 0.04 / (8.5746 × 10⁻³)
  ≈ 4.7
Now, at the 1% significance level (p = 0.01) we have from
the table of values of the standard normal distribution
that Zp/2 = Z0.005 = 2.576. Therefore, |z| > 2.576 in both
cases, and we can conclude that there is evidence in the
data to support the rejection of H0, i.e. the two regression
lines are different.
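The two test statistics can be recomputed directly from the published estimates; the following Python sketch (ours, not the statistics package used by the authors) reproduces the computation:

```python
from math import sqrt

def z_statistic(est1, est2, se1, se2, n1, n2):
    """z for the difference between two estimates, with the standard
    errors scaled by the respective sample sizes as in the text."""
    return (est1 - est2) / sqrt(se1**2 / n1 + se2**2 / n2)

# Intercepts: alpha1 = 3.55 (SE 0.086, n1 = 42), alpha2 = 3.165 (SE 0.091, n2 = 24)
z_alpha = z_statistic(3.55, 3.165, 0.086, 0.091, 42, 24)
# Slopes: beta1 = -0.458 (SE 0.036), beta2 = -0.418 (SE 0.032)
z_beta = z_statistic(-0.458, -0.418, 0.036, 0.032, 42, 24)
# Both |z| exceed Z_{0.005} = 2.576, so H0 is rejected at the 1% level.
```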
4. Identification of most complex functions
A histogram of the McCabe index for the most complex
functions, at the latest available release, is drawn in Figure 9.
We observe a subset of 25 functions that are part of the
tail of the distribution in Figure 7, with a peak around a value of
135 for the McCabe index, which denotes a highly complex and
critical function, together with other functions which should
also be considered highly complex.

Based on these observations, one may wish to investigate how
the complexity of these functions evolved, whether they were
touched frequently and, if so, how many times. Figure 10 and
Figure 11 show two subsets of these functions, and in only one
case (evidenced in Figure 10) is the final complexity of the function
lower than in its initial state. All the other functions depicted
[Figure: McCabe index (vertical axis, 0 to 140) of each of the most complex functions (horizontal axis), at the latest available release.]
Figure 9 - Most complex functions at latest available release
show an increased value of the McCabe index. This suggests that
within this system a small portion of the functions is highly complex,
and that over the several times these functions were handled (the steps
depicted in both Figure 10 and Figure 11) the outcome was an
increasing complexity.
Some cases of anti-regressive maintenance are visible as
descending steps but, in general, when comparing the
values for the first and the last available release, the overall
complexity trend of each plotted function is increasing.
Given that the distribution of the number of functions per file is not uniform, with the highest number of functions present in the most frequently touched files, we decided to split the system into two halves based on the number of release-touches as in Figure 2, but selecting the two halves such that both contain the same number of functions. We found that the splitting point in Figure 2 now becomes fsn 210. In other words, the first 210 files, ranked by the number of release-touches, hold 50 percent of the functions. We also found that 20 of the 25 most complex functions are present in the less stable part of Arla and only 5 of 25 in the more stable part, providing further evidence of the link between change and complexity. In a different paper we presented evidence that there has been complexity control work in the Arla system, with between 1 and 2 percent of the files being subject to some refactoring [Capiluppi & Ramil 2004]. The results presented here suggest that focusing this relatively small refactoring effort on the less stable, highly complex part of the system will benefit the long-term evolvability of this system.
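The equal-function split described above can be sketched as follows. The per-file function counts are hypothetical, and the files are assumed to be already ranked by descending release-touches:

```python
def split_by_function_count(funcs_per_file):
    """Return the smallest fsn such that the first fsn files
    (ranked by release-touches) hold at least half of all functions."""
    total = sum(funcs_per_file)
    running = 0
    for fsn, count in enumerate(funcs_per_file, start=1):
        running += count
        if running >= total / 2:
            return fsn
    return len(funcs_per_file)

# Hypothetical counts; applied to Arla's data, this procedure yields fsn 210.
split_point = split_by_function_count([40, 30, 20, 5, 3, 2])
```

Because the most frequently touched files also hold the most functions, the splitting point moves compared with the file-count split of Figure 2.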
Note: Figure 2 displays not only the files present in the latest release but also those files that have been deleted from the Arla code repository. It was not possible to obtain the size in LOCs and the McCabe indexes for deleted files. For simplicity of data manipulation, in Figures 3, 4, 5 and 6 the deleted files have been ranked, plotted and given a zero value for their LOC and McCabe indexes. The presence of the deleted files with zero McCabe value has biased the histograms in Figures 3, 4, 5 and 6 but, we believe, not to the point of changing the overall conclusions of this analysis, which are mainly qualitative. In fact, the authors have performed statistical analysis of the histograms with the deleted files removed from the dataset, and this analysis confirms the conclusions stated here. The authors plan to submit the results of the statistical analysis for publication in the near future.
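The filtering described in this note can be sketched as follows. The file records below are hypothetical (name, LOC, summed McCabe index), with deleted files carrying the zero values mentioned above:

```python
# Hypothetical (name, LOC, summed McCabe index) records; deleted files
# were assigned zeros, so they can be excluded before computing statistics.
files = [("afs.c", 1200, 140), ("deleted.c", 0, 0), ("cache.c", 300, 35)]

surviving = [rec for rec in files if rec[1] > 0]  # drop zero-LOC (deleted) files
mean_mccabe = sum(mcc for _, _, mcc in surviving) / len(surviving)
```

Removing the zero-valued records before computing summary statistics avoids the bias that the deleted files introduce into the histograms.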
5. Conclusions and further work
The study of the relationship between change rate and complexity is important given the lack of approaches that can provide objective guidance to developers as to which parts of the system should be refactored before others. Requirements evolve, and it is generally difficult to predict the detailed future requirements of an evolving system. Given this limitation, we use historical data reflecting the past evolution of the system: the patterns we observe in its historical behaviour can guide its future evolution.
In this paper we have analyzed Arla, a system released as Open Source Software, from the point of view of the number of changes to its components and their corresponding complexity. We have used a metric, release-touches, which represents the number of releases in which a change to a particular file was detected. We used this metric to rank the files according to their stability over releases. The size of each file at the latest available release was then evaluated, and we found that heavily changed files correspond, in general, to larger files. In addition, heavily changed files contain more functions than less changed files; this is also reflected when depicting the sum of the McCabe indexes per file, which tends to follow a similar pattern to the distribution of sizes.
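A minimal sketch of computing release-touches, assuming each release is available as a mapping from file path to a content hash (the release snapshots and hashes below are hypothetical):

```python
from collections import defaultdict

def release_touches(releases):
    """Count, per file, the number of releases in which its content
    changed (first appearance counts as a change)."""
    touches = defaultdict(int)
    previous = {}
    for rel_id in sorted(releases):
        snapshot = releases[rel_id]
        for path, digest in snapshot.items():
            if previous.get(path) != digest:
                touches[path] += 1  # file is new or modified in this release
        previous = snapshot
    return dict(touches)

counts = release_touches({
    1: {"a.c": "h1", "b.c": "h1"},
    2: {"a.c": "h2", "b.c": "h1"},
    3: {"a.c": "h3", "b.c": "h1"},
})
```

Sorting the resulting counts in descending order yields the stability ranking used throughout the analysis.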
Based on the distribution of release changes, we divided the corresponding area into two halves, each containing 50 percent of the files, with two different profiles of changed files. We then changed the granularity level of our analysis, observing functions rather than files. The half subject to the most frequent changes (the less stable part) proved to be both the more complex, in the sense of the highest McCabe indexes, and the bigger in terms of the total number of functions.
We finally focused the analysis on the highly complex functions and observed the evolution of their complexity over the application life-cycle: several steps of increasing complexity are recognizable, while the efforts to reduce it (descending steps) are not enough to avoid a final state which is more complex than the initial one. The correlation observed between high complexity and low stability suggests that focusing refactoring efforts on the less stable parts of the system can be of benefit in order to avoid system decay and stagnation, and for maximum impact on productivity.
This paper presents results which we plan to investigate further as part of our research programme into Open Source Software evolution. The analysis presented in this paper is visual and mainly qualitative. We plan, however, to refine our approach. We have already started to apply statistical modelling techniques in order to encapsulate the patterns observed. Further work could involve the adaptation of our approach to object-oriented systems and the exploration of different criteria for splitting a system into more stable and less stable parts. We intend to perform dependency analysis between each of the most complex functions and the rest of the system in order to assess whether refactoring can be performed on a localised, function-by-function basis, or whether it will require the modification of large parts of the system. This will enable us to estimate the cost of the refactoring and to prioritize the possible refactorings in an objective manner. We also plan to replicate this initial study on the 20 or so other Open Source systems for which we have collected an empirical dataset. Using this dataset we plan to explore approaches to determine which parts of an evolving system will benefit most from complexity control work such as refactoring.
6. References
[Bevan & Whitehead 2003] Bevan J. & Whitehead E.J.,
Identification of Software Instabilities, WCRE 2003, 13 - 16
November 2003, Victoria, Canada
[Capiluppi & Ramil 2004] Capiluppi A., Ramil J.F.; Studying
the Evolution of Open Source Systems at Different Levels of
Granularity: Two Case Studies, Proc. Intl. Workshop on the
Principles of Software Evolution, IWPSE 2004, 5 – 6 Sept.
2004, Kyoto, Japan
[Capiluppi et al 2004a] Capiluppi A., Morisio M., Ramil J.F.;
Structural evolution of an Open Source system: a case study,
Proc. of the 12th International Workshop on Program
Comprehension, 24 - 26 June 2004, Bari, Italy,
[Capiluppi et al 2004b] Capiluppi A., Morisio M., Ramil J.F.;
2004, “The Evolution of Source Folder Structure in actively
evolved Open Source Systems”, Proc. Metrics 2004, 14 - 16
September, 2004, Chicago, Illinois, USA
[Lehman 1974] Lehman M.M.; Programs, Cities, Students,
Limits to Growth?, Inaugural Lecture, in Imperial College of
Science and Technology Inaugural Lecture Series, v. 9, 1970,
1974: 211 - 229. Also in Programming Methodology, Gries D
(ed.), Springer Verlag, 1978: 42 - 62
[metrics] Metrics, a tool for extracting various software
metrics, at http://barba.dat.escet.urjc.es/index.php?menu=Tools
<as of May 2004>
[McCabe 1976] McCabe T.J., “A Complexity Measure”, IEEE
Trans. on Softw. Eng. 2(4), 1976, pp 308 – 320
[McCabe & Butler 1989] McCabe T.J. and Butler C.W.; "Design Complexity Measurement and Testing", Communications of the ACM 32, 12, 1989, pp 1415 – 1425
[Parnas 1994] Parnas D.L.; “Software Aging”, Proc. ICSE 16,
May 16-21, 1994, Sorrento, Italy, pp 279 – 287
[refactoring] Refactoring, a web-site with links to sources in
the topic, at http://www.refactoring.com/ <as of Sept 2004>
Figure 10 - Evolution of the McCabe index for the subset of the first 12 most complex functions in the Arla system. Note that only in one case (Funct-7) is the McCabe index at the latest available stage lower than at the first available release. (y-axis: McCabe index, 0 to 140; x-axis: releases 1 to 62)
100
Figure 11 - Evolution of the McCabe index for the functions in the Arla system ranked 13th to 24th from higher to lower complexity. (y-axis: McCabe index, 0 to 40; x-axis: releases)