Software Maintenance

Visualization and Analysis of Open Source Software Evolution Using an Evolution Curve Method
Dr. Robertas Damaševičius
Software Engineering Department,
Kaunas University of Technology
Studentų 50-415, Kaunas, Lithuania
Email: robertas.damasevicius@ktu.lt
http://soften.ktu.lt/~damarobe
Context and Problem

Software systems are:
- designed, constructed and used by people
- components in larger socio-technical systems
Software design is:
- a social process embedded within organizational and cultural structures
- influenced by social processes such as programmer collaboration in teams
Open source software systems:
- Free to use
- Free availability of source code
- Developed by many programmers
- Continuously evolve
Aim: analysis of open source software evolution using metrics
Eighth International Baltic Conference on Databases and Information Systems
June 2-5, 2008, Tallinn, Estonia
What is software evolution?

Definition:
- a continuing process in time during which some essential software properties are changed
Activities:
- modification, adaptation, maintenance, and other activities which occur after the delivery of the first operational release to the users
Importance:
- costs devoted to system maintenance and evolution account for more than 90% of total software costs (Erlikh, 1990)
Forces and factors of open source software evolution

Evolution of open source systems:
- less strict control and management model
- usually started by a single developer (seed)
- attracted users become co-developers
- governed by the needs of users and spontaneous collaboration of co-developers
Evolution mechanisms:
- natural selection, competition
- variation-increasing & variation-decreasing
- influenced by psychological, intellectual, social and cultural, economic and business factors
Software metrics

Common
- Source lines of code
- Cyclomatic complexity
- Halstead metrics
- Number of classes and interfaces
- R.C. Martin’s software package metrics
- Cohesion, Coupling, …
Specific software evolution metrics
- SDI metric
- L-metric
- AICC metric
- G-metric
Software development models
- Statistical models
- Rayleigh model
- Halstead’s Software Science model
- COCOMO model
Lehman’s “Laws of Software Evolution”

Formulated by M.M. Lehman in the 1980s:
- Law of Continuing Change
- Law of Increasing Complexity
- Law of Statistically Smooth Growth
- Law of Organisational Stability
- Law of Conservation of Familiarity
- Law of Continuing Growth
- Law of Declining Quality
- Law of Feedback System
Evolution forces:
- Growth
- Maintenance
Transition-based model of evolution
[Figure: a software characteristic plotted over time, with gradual change, sudden change and transitions marked]
- Stages: many, often overlapping
- Transitions: breakpoints between stages, which represent significant changes. Transitions occur because, as a system evolves, its structure must be regularly adapted to the changing requirements and environment
- Gradual change: a slow process of incremental change caused by accumulating maintenance steps or gradual decay
- Sudden change: significant changes in the evolving system or in the process by which it is evolved
Information-theoretic methods

Shannon entropy
- A measure of the uncertainty associated with a random variable.
- If an information source generates a series of symbols $x_i$ belonging to an alphabet of size $n$ according to a known probability distribution $p(x_i)$, the entropy $H$ of a sequence $X$ is defined as:
  $$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$
- High entropy: higher complexity of the system’s code
- Low entropy: there are some repeated patterns of source code; code maintenance is required
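As an illustration of the entropy definition above, here is a minimal Python sketch. It assumes that each byte of the source text is one symbol and that the probabilities are estimated from byte frequencies; the slides do not state which alphabet was used, and the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per symbol.

    Assumption: one byte is one symbol, and p(x_i) is estimated
    as the relative frequency of that byte in `data`.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # -sum p * log2(p) is written as sum p * log2(1/p) to avoid a negative zero.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Under this assumption, a file made of a single repeated byte has zero entropy, while a file in which all 256 byte values are equally frequent reaches the maximum of 8 bits per symbol.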
Kolmogorov Complexity
- Measures the ‘complexity’ (i.e., information content) of an object by the length of the smallest program that generates it.
- The Kolmogorov Complexity $K_\varphi(x)$ of an object $x$ in the description system $\varphi$ is the length of the shortest program capable of producing $x$:
  $$K_\varphi(x) = \min_{w} \{ |w| : \varphi(w) = x \}$$
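Kolmogorov Complexity cannot be computed exactly, so in practice it has to be estimated. One common proxy is the length of a compressed representation, which gives an upper bound. The sketch below uses that proxy with Python's `zlib`; this choice is an assumption for illustration, since the slides do not say how K was estimated, and the name `compressed_size` is hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed form of `data`.

    Assumption: compressed length serves as an upper-bound proxy for
    Kolmogorov Complexity. It is not the true K(x).
    """
    return len(zlib.compress(data, 9))  # 9 = strongest compression level
```

Highly repetitive input compresses to far fewer bytes than input with little repetition, which matches the intuition that repeated patterns carry less information.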
Evolution curve method (1)



- Motivation: the addition of new features to a software system leads to the change of basic software characteristics (complexity/entropy) in the system.
- Idea: use the change of software size and complexity as a means to determine different stages of evolution of a software system
- Inspiration: Z-curve [1] and DNA walk [2] methods used in analyzing complex genetic sequences
[1] R. Zhang, C.T. Zhang. Z Curves, an Intuitive Tool for Visualizing and Analyzing DNA Sequences. J. Biomol. Struc. Dynamics 11, 767–782, 1994.
[2] S. Paxia, A. Rudra, Y. Zhou, B. Mishra. A Random Walk down the Genomes: DNA Evolution in VALIS. IEEE Computer 35(7):73–79, 2002.
Evolution curve method (2)



- The E-curve is composed of a series of nodes $E_i = (x_i, y_i)$, whose coordinates are $x_i$ and $y_i$ ($i = 1, 2, \ldots, N$), where $N$ is the number of versions of the analyzed software system.
- The nodes $E_i$ are connected sequentially with straight segments.
- The coordinates $x_i$ and $y_i$ are calculated iteratively:
  $$x_i = \begin{cases} x_{i-1} + 1, & \text{if } K_i > K_{i-1} \\ x_{i-1}, & \text{if } K_i = K_{i-1} \\ x_{i-1} - 1, & \text{if } K_i < K_{i-1} \end{cases} \qquad y_i = \begin{cases} y_{i-1} + 1, & \text{if } H_i > H_{i-1} \\ y_{i-1}, & \text{if } H_i = H_{i-1} \\ y_{i-1} - 1, & \text{if } H_i < H_{i-1} \end{cases}$$
- $K_i$ is the Kolmogorov Complexity of the i-th version of a software system;
- $H_i$ is the Shannon entropy of the i-th version of a system
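The iterative rule for the E-curve nodes can be sketched in Python as follows. The sketch assumes the per-version values of K and H are already available as two lists, and that the first node is placed at the origin (0, 0); the slides do not give the starting point. The names `step` and `e_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple


def step(current: float, previous: float) -> int:
    """Return +1 if the value rose, 0 if it stayed equal, -1 if it fell."""
    return (current > previous) - (current < previous)


def e_curve(K: Sequence[float], H: Sequence[float]) -> List[Tuple[int, int]]:
    """Compute E-curve nodes (x_i, y_i) from per-version K and H values.

    Assumption: the first version is placed at the origin (0, 0).
    """
    if len(K) != len(H):
        raise ValueError("K and H must have one value per version")
    if not K:
        return []
    nodes = [(0, 0)]
    for i in range(1, len(K)):
        x, y = nodes[-1]
        # x follows changes in K (size), y follows changes in H (complexity).
        nodes.append((x + step(K[i], K[i - 1]), y + step(H[i], H[i - 1])))
    return nodes
```

For example, `e_curve([10, 12, 12, 11], [4.0, 4.5, 4.2, 4.2])` moves up and to the right, then straight down, then to the left, and ends back at the origin.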
Evolution curve method (3)

Two dimensions of the Evolution curve:
- x (relative information content) and
- y (relative complexity)
represent two independent (orthogonal) characteristics of a software system:
- x-dimension: the amount of information contained in a software system; an estimate of software size
- y-dimension: the information entropy of a software system; an estimate of software complexity
Software evolution stages




- Software Growth: system is actively developed
- Software Maintenance: system becomes simpler, often at a cost of its size
- Software Improvement: system becomes more complex and generic
- Software Shrink: functionality of a system is reduced
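The four stages can be read as the four diagonal directions of a single E-curve step. The Python sketch below maps a step (dx, dy) to a stage. The mapping is inferred from the stage descriptions (growth: size and complexity both rise; maintenance: complexity falls while size rises; improvement: complexity rises while size falls; shrink: both fall) and is an assumption rather than a definition taken from the slides. The label "UNDETERMINED" for steps where either coordinate is unchanged is also an addition made for this sketch.

```python
def classify_stage(dx: int, dy: int) -> str:
    """Map one E-curve step to an evolution stage.

    dx: change in size (x), dy: change in complexity (y).
    Assumption: the quadrant-to-stage mapping is inferred from the
    stage descriptions; "UNDETERMINED" is a label added for this sketch.
    """
    if dx > 0 and dy > 0:
        return "GROWTH"
    if dx > 0 and dy < 0:
        return "MAINTENANCE"
    if dx < 0 and dy > 0:
        return "IMPROVEMENT"
    if dx < 0 and dy < 0:
        return "SHRINK"
    return "UNDETERMINED"
```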
[Figure: EVOLUTION diagram with axes Size (horizontal) and Complexity (vertical); regions labeled GROWTH, MAINTENANCE, IMPROVEMENT, SHRINK]
Trends of Evolution curve


Actively developed systems: long upward trends of growth
Mature, stable systems: long downward trends of maintenance
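One simple way to make "long trends" measurable is to count the longest consecutive run of each stage along the curve. The Python sketch below does this for a list of stage labels. The function name `longest_runs` and the use of run length as a trend measure are assumptions for illustration, not something the slides specify.

```python
from itertools import groupby
from typing import Dict, Iterable


def longest_runs(stages: Iterable[str]) -> Dict[str, int]:
    """Map each stage label to the length of its longest consecutive run."""
    best: Dict[str, int] = {}
    for label, group in groupby(stages):
        length = sum(1 for _ in group)  # size of this consecutive run
        best[label] = max(best.get(label, 0), length)
    return best
```

Under this reading, an actively developed system would show a large value for the growth label, and a mature system a large value for the maintenance label.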
[Figure: two Evolution curve sketches with axes Size and Complexity; panels: Actively Developed Systems, Mature Systems]
Case studies

Source: SourceForge
7-zip
- Archiver
- 82 versions, 5 years, 160K LOC
Grip
- CD player/ripper
- 36 versions, 14K LOC
eMule
- P2P file sharing client
Case study: eMule

eMule:
- one of the biggest P2P file sharing clients
- coded in Microsoft Visual C++ using MFC
- Free software, released under the GNU GPL
- Source code first released at version 0.02 on July 6, 2002
- Latest release contains 222,680 lines of code
- Actively developed by 5 developers
- Current development status is “Production/Stable”
- For analysis, 68 versions of eMule source code were used
eMule: Entropy
[Figure: entropy of eMule source code; marked versions: 015a, 030a, 018a]
eMule: Size
Fitted curve: $y = A + B \cdot x + C \cdot x^2$, with A = 7676.17, B = 4324.67, C = 177.488; r = 0.9935
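A quadratic of this form is typically obtained by a least-squares fit. The Python sketch below shows one way to compute A, B and C and a correlation coefficient between observed and fitted values. It assumes an ordinary least-squares fit and that r denotes the Pearson correlation; the slides do not state the fitting method or what x and y measure. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = A + B*x + C*x**2; returns (A, B, C)."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    A, B, C = solve3(normal, t)
    return A, B, C


def correlation(ys: Sequence[float], fitted: Sequence[float]) -> float:
    """Pearson correlation between observed and fitted values."""
    n = len(ys)
    my, mf = sum(ys) / n, sum(fitted) / n
    cov = sum((y - my) * (f - mf) for y, f in zip(ys, fitted))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    sf = math.sqrt(sum((f - mf) ** 2 for f in fitted))
    return cov / (sy * sf)
```

The fit needs at least three distinct x values; with fewer, the normal equations are singular and the division in `solve3` fails.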
eMule’s Evolution curve
[Figure: Evolution curve of eMule; labeled versions: 30e, 47c, 23b, 44b, 25b]
What does the changelog say?
Conclusions

The software evolution process can be divided into 4 stages:
- software growth: the size and complexity of the developed software are increasing
- software maintenance: the aim is to contain complexity and fix software bugs
- software improvement: the aim is to contain software system size at a cost of increasing complexity
- software shrink: both software size and complexity are trimmed
The Evolution curve method:
- can identify software evolution stages
- can identify the initial development status of the analyzed software system:
  - actively developed systems show long growth trends
  - mature systems show maintenance and improvement trends
- is independent of the software implementation language
Ongoing Research and Further Work

- Analysis of other entropy measures such as block entropy and Rényi entropies
- Dynamic models of software evolution
  - paper submitted to Journal of Software Maintenance and Evolution
  - Differential equations, etc.
- More case studies
  - paper submitted to Computing and Information Systems Journal
Thank You.
Any Questions?
7-zip: Evolution curve
Grip: Evolution curve