IEEE Software
a) Title:
ESTIMATING SOFTWARE DEVELOPMENT PROJECTS: A NEW APPROACH
b) Category: Papers
c) Authors and affiliations:
Isabel Ramos
Researcher
Dpto. de Lenguajes y Sistemas Informáticos.
Universidad de Sevilla.
Facultad de Informática y Estadística.
Avda. Reina Mercedes, s/n.
41012 - Sevilla (Spain).
Phone number: +34 954552776
Fax number: +34 954557139
e-mail: isabel.ramos@lsi.us.es

José Cristóbal Riquelme
Researcher
Dpto. de Lenguajes y Sistemas Informáticos.
Universidad de Sevilla.
Facultad de Informática y Estadística.
Avda. Reina Mercedes, s/n.
41012 - Sevilla (Spain).
Phone number: +34 954552775
Fax number: +34 954557139
e-mail: riquelme@lsi.us.es
ABSTRACT
Current simulation environments and dynamic models for software development projects
(henceforth SDPs) make it feasible to build so-called SDP simulators. The main
advantage of these simulation tools is that they can answer, at no cost, questions such as "How
will the project evolve if ...?" before the project is executed, or "How would the project have
evolved if ...?" once the project has finished. In this paper we present part of the results
obtained by combining a tool that learns by producing rules with a dynamic model of SDPs.
This combination allows us to obtain automatically management rules applicable to an SDP, for
estimating good results for the variables that the project manager is interested in.
1. INTRODUCTION
Among the tasks that an SDP manager must perform are planning, monitoring and
development control. For these, managers basically rely on their own mental models,
based on the experience accumulated in similar projects, and they lack formal models and tools
that would help improve the accuracy of the decisions to be taken.
Recently, a new kind of tool, the SDP simulator, has become available. These simulators
allow project managers to experiment with several management policies at no cost,
and in this way to arrive at the most suitable decision. A dynamic model constitutes the
essential core of these simulators. This model is obtained from the observation of the variables
that define the state of the real project and of the relations that govern its evolution over time.
Current simulation environments (Stella, Vensim, iThink, Powersim, etc.) extend the usefulness
of these models and facilitate the construction of simulators.
The simulation of a dynamic model for an SDP makes it possible, before development
begins (or once the project has finished), to know the impact: a) that a change of
technology would have on the project [Chichakly, 93], b) of applying different
management policies and c) of a change in the maturity level of the development organization.
Until now, to use a project simulator the manager had to know the initial estimates,
the constraints of the project and of the development environment, and the management policies
to be applied. From these data the simulator provides the final results of the project (time, cost,
etc.). If these results are not the expected ones, the process is repeated with modified
management policies. The process continues until the project manager obtains the desired
results.
In this paper, we propose a new approach to estimating the final results of a project:
obtaining management rules (see footnote 1) for SDPs automatically. These management rules
can be obtained before the project's execution begins (or once the project has finished),
and they allow us to obtain good results for the variables that the project manager is interested
in. That is, the manager now has to know the initial estimates, the constraints of the project and
of the development environment, and the results that he/she wishes to achieve. Our tool suggests
the management policies that he/she must apply (or should have applied, in the case of a
post-mortem analysis).
In order to obtain the management rules automatically, we have combined the advantages
of a rule-based learning system with the information provided by a dynamic model for
SDPs. In the following sections, we first present a brief introduction to the concept of
machine learning and the tool used for this purpose; then we present the information that is
supplied to the dynamic model for SDPs and how we have combined the two techniques. Finally,
we apply these methods to a specific SDP in order to carry out a post-mortem analysis.

Footnote 1: We call a management rule a set of management policies (decisions) that the manager
must take in order to achieve the project's final goals.
2. MACHINE LEARNING
The computational techniques and tools designed to support the extraction of useful knowledge
from databases are traditionally called machine learning. More recently, the names data
mining and Knowledge Discovery in Databases (KDD) have come into use. In general, these
techniques try to extract, automatically, information that is useful for decision support or
for exploring and understanding the phenomena that are the source of the data.
A standard KDD process consists of several steps [Fayyad, 96], such as data
preparation, data selection, data cleaning, data mining and proper interpretation of the results.
Data mining can therefore be considered one particular step, consisting in the application of
specific algorithms for extracting patterns from data. A wide variety of data mining
algorithms is described in the literature of statistics, pattern recognition,
machine learning and databases. Most data mining algorithms can be viewed as compositions of
three basic components: the model (classification, regression, clustering, linear
function, etc.), the preference criterion (usually some form of goodness-of-fit function of the
model to the data) and the search algorithm (genetic, greedy, gradient descent, etc.).
Thus, the choice of a data mining method depends on the model representation that
we need. Given that our goal is to find rules that describe the behavior of an SDP, we have chosen
to work with decision trees. A decision tree is a classifier with the structure of a tree, where
each node is either a leaf indicating a class, or an internal decision node that specifies a test
to be carried out on a single attribute value, with one branch and subtree for each possible
outcome of the test. The main advantages of decision trees are their usefulness for finding
structure in high-dimensional spaces and their immediate conversion into rules that are easily
understood by humans. However, classification trees use univariate threshold decision boundaries,
which may not be suitable for problems where the true decision boundaries are non-linear
multivariate functions.
The most widespread decision tree algorithm is C4.5 [Quinlan, 93]. Basically, C4.5 is
a recursive divide-and-conquer algorithm that builds the tree according to an
information gain criterion. The program output is a representation of the tree found
(Figure 1), a confusion matrix of the classification results and an estimated error rate.
If Condition 1 is V1: Class C1
If Condition 1 is V2:
    If Condition 2 is V3: Class C2
       Condition 2 is V4: Class C2
       Condition 2 is V5: Class C1
Figure 1: Example of a non-binary tree and the equivalent set of rules.
The rules of Figure 1 can also be expressed as:
If Condition 1 is V1, or Condition 1 is V2 and Condition 2 is V5, then Class C1.
If Condition 1 is V2 and Condition 2 is V3 or V4, then Class C2.
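The conversion from a tree to rules can be illustrated with a minimal Python sketch. The nested-tuple encoding of the Figure 1 tree and the function names are assumptions made for this illustration; they are not part of C4.5. Each path from the root to a leaf yields one rule.

```python
# Hypothetical encoding of the Figure 1 tree: an internal node is an
# (attribute, {value: subtree}) pair and a leaf is a class label string.
FIGURE_1_TREE = (
    "Condition 1",
    {
        "V1": "C1",
        "V2": (
            "Condition 2",
            {"V3": "C2", "V4": "C2", "V5": "C1"},
        ),
    },
)


def tree_to_rules(node, path=()):
    """Return one (conditions, class_label) pair per leaf of the tree."""
    if isinstance(node, str):  # leaf: emit the tests accumulated so far
        return [(path, node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules


def format_rule(conditions, label):
    """Render a rule in the 'If ... then Class ...' style used in the text."""
    tests = " and ".join(f"{attr} is {val}" for attr, val in conditions)
    return f"If {tests} then Class {label}"


RULES = tree_to_rules(FIGURE_1_TREE)
for conditions, label in RULES:
    print(format_rule(conditions, label))
```

The sketch produces one rule per leaf (four here); merging the rules that share a class gives the two compound rules stated above.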
C4.5 is very easy to set up and run: it only needs a declaration of the types and ranges of
the attributes, in a file separate from the data, and it is executed with UNIX commands with
very few parameters. The main disadvantage is that the regions obtained in continuous spaces are
hyperrectangles, because the tests in the internal nodes have the form p_i >= L or p_i <= U.
However, given our purpose in this work, the results supplied by C4.5 are perfectly valid.
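To make the threshold tests and the information gain criterion concrete, the following Python sketch chooses a threshold for a single continuous attribute. It is a simplified illustration under stated assumptions, not the C4.5 program: C4.5 itself applies refinements such as the gain ratio and pruning, and the data below are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Gain obtained by splitting the records with the test value <= threshold."""
    left = [c for v, c in zip(values, labels) if v <= threshold]
    right = [c for v, c in zip(values, labels) if v > threshold]
    if not left or not right:  # the test does not separate anything
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try the midpoints between consecutive distinct values, keep the best."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Made-up data: one parameter value per simulated run, with its class.
VALUES = [10, 15, 20, 30, 40, 50]
LABELS = ["GOOD", "GOOD", "GOOD", "BAD", "BAD", "BAD"]
BEST = best_threshold(VALUES, LABELS)
print(BEST, information_gain(VALUES, LABELS, BEST))
```

Because every test involves a single attribute and a single threshold, a conjunction of such tests along a path of the tree always describes a hyperrectangle in the parameter space.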
3. THE DYNAMIC MODEL AND C4.5
To obtain the database that is the input to C4.5, we have used a dynamic model for SDPs
proposed in [Ramos, 98], called the Reduced Dynamic Model (RDM) (see footnote 2), and
implemented in the Vensim® simulation environment. The variables that describe the basic
behaviour of a dynamic system are defined through differential equations. Furthermore, the model
has a set of parameters that allow us to study different behaviours. These parameters are given
by the management policies that can be applied in SDPs, and relate both to the environment of the
project (initial estimates, complexity of the software, etc.) and to the development
organization (personnel management, effort assignment, etc.) and its maturity level (for example,
the average delays in carrying out the error detection and correction activities).
Table 1 shows the different groups of parameters classified according to their function (see
footnote 3).
PROJECT ENVIRONMENT
    Initial Estimation
    Project Complexity
ORGANIZATION ENVIRONMENT
    Management Policies:  Effort assignment; Personnel management; Delivery time
    Maturity Degree:      Average delays; Nominal values; Others
Table 1: Classification of the parameters of the dynamic model for SDPs.
The values of the parameters can be chosen randomly within an interval defined by the user
(for example, the average dedication of the technical personnel can vary between 20% and 100%,
depending on the user's level of uncertainty). The model is then simulated, and a record
for the database is generated with the values of the parameters and the final values obtained for
the desired system variables (time, cost, number of errors, etc.). From this generated database,
C4.5 learns by examining the supplied data and proposes a set of rules for decision-making
(see Figure 2).

Footnote 2: This model makes it possible to know the evolution of an SDP in the early stages of
the project, which is when the available information is limited.
Footnote 3: The number of parameters in a dynamic model varies from one model to another. For
example, the model of [Abdel-Hamid, 91] has around 64, whereas RDM has about 32.



[Figure 2 diagram. Boxes: Initial Estimations; Project and Organization Environment; Project
Objectives; Project Simulation; D.B. (database); Machine learning; Management rules.]
Figure 2: Steps to follow for gathering management rules from a dynamic model.
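The database generation step can be sketched in Python as follows. This is only an illustration under stated assumptions: the `simulate` function is a made-up placeholder, not the RDM equations and not a Vensim call, and the three intervals are those listed for DEDIC, READE and RENOT in Table 2.

```python
import random

# User-defined intervals for three of the model parameters (see Table 2).
INTERVALS = {"DEDIC": (20, 100), "READE": (20, 120), "RENOT": (5, 15)}


def sample_parameters(intervals, rng):
    """Draw each parameter uniformly at random inside its interval."""
    return {name: rng.uniform(low, high) for name, (low, high) in intervals.items()}


def simulate(params):
    """Placeholder for one run of the dynamic model.

    The formulas are invented so that the sketch runs; a real run would
    integrate the differential equations of the model instead.
    """
    time = 300 + 0.8 * params["READE"] + 4.0 * params["RENOT"] - 0.5 * params["DEDIC"]
    cost = 1500 + 6.0 * params["READE"] - 5.0 * params["DEDIC"]
    return {"time": time, "cost": cost}


def build_database(n_runs, intervals, seed=0):
    """One record per simulation: the sampled parameters plus the final values."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_runs):
        params = sample_parameters(intervals, rng)
        records.append({**params, **simulate(params)})
    return records


DATABASE = build_database(100, INTERVALS)
print(len(DATABASE), sorted(DATABASE[0]))
```

Each record would then be labeled according to the project objectives and passed to the learning tool, which is the last step shown in Figure 2.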
4. OBTAINING MANAGEMENT RULES
In the following sections we use the data of a real SDP described in [Abdel-Hamid, 91], which we
will call PROJECT. This is a well-known project that has been amply validated by its authors. In
section 4.1 we show some of the most significant parameters of the PROJECT environment and
organization. In section 4.2 we indicate the initial estimates and the final values obtained for
cost and delivery time. In section 4.3 we obtain, automatically, management rules that
allow us to carry out a post-mortem analysis of PROJECT. That is to say, they answer the following
question: how could we have improved the final results of this project? In other words, which
management policies should have been applied to improve the PROJECT cost and time
simultaneously?
4.1. Project and organization environment
From among the parameters that define the development environment, both for the project and
for the organization and its maturity degree, we have selected those which appear in
Table 2, considering them representative of each of the blocks of Table 1 (see footnote 3).
For each parameter, this table gives the name that it has in the Reduced
Dynamic Model, the interval of values that it can take, a brief description of its meaning and
the units of measurement. For the specific SDP that we are going to analyze, the
rest of the parameters [Abdel-Hamid, 91] are assumed not to vary.
NAME   INTERVAL      DESCRIPTION (UNITS)
DEDIC  (20 - 100)    Average dedication of the technical personnel (%).
RESQA  (5 - 15)      Average delay in carrying out Quality activities (days).
READE  (20 - 120)    Average delay in integrating new technical personnel into the project (days).
RECON  (1 - 40)      Average delay in hiring technical personnel (days).
PORTE  (30 - 100)    Percentage of technicians at the beginning of the project in relation to the estimated average value (%).
TECCO  (1 - 4)       Technicians to hire for each experienced full-time technician (technicians).
RENOT  (5 - 15)      Average delay in notifying the real state of the project (days).
ESFPR  (0.1 - 0.25)  Nominal effort needed per error in the Testing stage (technician-days).
RETRA  (1 - 15)      Average delay in transferring surplus technical personnel to other projects (days).
POFOR  (10 - 40)     Average percentage of the experienced technicians' dedication to training (%).
INTAM  (0 - 0.5)     Initial underestimation of the project's size in source code lines (LOC).
Table 2: Representative parameters of the project's environment and of the organization's environment.
The variables studied in the following sections are the cost and the delivery time of
PROJECT.
4.2. Initial estimates and project goals
The initially estimated values for the PROJECT delivery time and cost were 320 days and 1111
technician-days (t-d); however, the real values were 387 days and 2092 technician-days
[Abdel-Hamid, 91]. Therefore, the final values exceeded the initial estimates by about 21% and 88%
respectively. Next, we define the values that we want to obtain for the project time and cost.
These values will be called GOOD.
Delivery time (days):
Any value of the time between 320 and 387 days will be labeled as GOOD because it does not
exceed the real result obtained. All values greater than 387 days will be labeled as BAD because
they surpass the real result obtained.

Cost (technician-days):
Any value between 1111 and 2092 technician-days will be labeled as GOOD because it does not
exceed the real result obtained. All values greater than 2092 technician-days will be labeled as
BAD because they surpass the real result obtained. Therefore, in view of these final results, we
have to ask ourselves: do management rules exist that might have improved the final results? In
the following section we answer this question.
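The labeling just described can be written as a short Python sketch. The function names are hypothetical, the treatment of the interval end points as GOOD is an assumption, and so is the way the two labels are combined into a single class; values below the initial estimate are left unlabeled because the text does not cover them.

```python
ESTIMATED = {"time": 320, "cost": 1111}  # initial estimates (days, technician-days)
ACTUAL = {"time": 387, "cost": 2092}     # real final values (days, technician-days)


def label(variable, value):
    """Label one final value of a simulated run as GOOD or BAD."""
    if value > ACTUAL[variable]:
        return "BAD"   # worse than what really happened
    if value >= ESTIMATED[variable]:
        return "GOOD"  # between the estimate and the real result
    return None        # below the initial estimate: not specified in the text


def label_both(time, cost):
    """Assumed combination: GOOD only if time and cost are GOOD simultaneously."""
    labels = (label("time", time), label("cost", cost))
    if None in labels:
        return None
    return "GOOD" if labels == ("GOOD", "GOOD") else "BAD"


print(label("time", 350), label("cost", 2500), label_both(350, 2500))
```

A combined label of this kind is what a classifier needs as its class attribute when looking for rules that give GOOD results for both variables at once.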
4.3. Management rules obtained
In fact, we want to know: what values should the parameters have taken to improve the real
results obtained? A second question, which the development organization itself must answer, is:
are these values easy to modify? The management rules obtained for PROJECT by applying
RDM and C4.5 are shown in Table 3.
READE <= 27:
    RENOT <= 12: INTAM > 0.40, DEDIC > 0.6                (rule 1)
    RENOT > 12: RETRA > 10                                (rule 2)
READE > 27:
    RETRA <= 14:
        INTAM <= 0.47: POFOR <= 0.13, ESFPR > 0.22        (rule 3)
        INTAM > 0.47: POFOR <= 0.18                       (rule 4)
    RETRA > 14: DEDIC > 0.82                              (rule 5)
Table 3: Management rules to estimate GOOD results, simultaneously, for the delivery time
and the cost.
To meet the proposed goals for time and cost, five management rules have been
obtained (Table 3). A general reading of the rules tells us which parameters are the most
important for obtaining the desired values for time and cost simultaneously, and
what the range of values for those parameters is.
In particular, management rules (1) and (2) indicate that the final results achieved for
the delivery time and the cost could have been improved either if (rule 1):
"The integration of new personnel into the project (READE) had taken 27 days or less,
and the notification of the progress of the project (RENOT) had taken 12 days or less,
and the initial underestimation of the size of the product in source code lines (INTAM)
had been greater than 40%, and the dedication of the technical personnel to the project
(DEDIC) had been greater than 60%".
Or if (rule 2):
"The integration of new personnel into the project (READE) had taken 27 days or less,
and the notification of the project's progress (RENOT) had taken more than 12 days,
and the transfer of technical personnel to other projects (RETRA) had taken more than
10 days".
In Figure 3, we can see what the evolution of time and effort would have been under
management rule 2. This rule would have improved the final results obtained for time and
effort simultaneously, by 5% and 2% respectively. To achieve this, we would have had to raise the
parameters RENOT (> 12 days) and RETRA (> 10 days), whose initial values were 10 days, whereas
the parameter READE (<= 27 days) would not have needed to be modified, because its initial value
was 20 days [Abdel-Hamid, 91].
[Figure 3 plot. Curves: Delivery time (rule 2), in days (axis 250 to 400), and Cost (rule 2), in
t-d (axis 1,000 to 3,000), against elapsed time in days (0 to 350).]
Figure 3: Time and cost evolution when rule 2 is applied.
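As a rough illustration of how the rules of Table 3 can be checked against a set of parameter values, the following Python sketch encodes each rule as a list of threshold tests. The data structure and the function name are assumptions made for this example, and the adjusted values 13 and 11 are hypothetical values chosen only because they satisfy RENOT > 12 and RETRA > 10.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}

# The five rules of Table 3, each a list of (parameter, test, threshold).
RULES = {
    1: [("READE", "<=", 27), ("RENOT", "<=", 12), ("INTAM", ">", 0.40), ("DEDIC", ">", 0.6)],
    2: [("READE", "<=", 27), ("RENOT", ">", 12), ("RETRA", ">", 10)],
    3: [("READE", ">", 27), ("RETRA", "<=", 14), ("INTAM", "<=", 0.47),
        ("POFOR", "<=", 0.13), ("ESFPR", ">", 0.22)],
    4: [("READE", ">", 27), ("RETRA", "<=", 14), ("INTAM", ">", 0.47), ("POFOR", "<=", 0.18)],
    5: [("READE", ">", 27), ("RETRA", ">", 14), ("DEDIC", ">", 0.82)],
}


def satisfied_rules(params):
    """Numbers of the rules whose every test holds for the given values.

    A rule that needs a parameter missing from params is treated as
    not satisfied, since it cannot be checked.
    """
    matches = []
    for number, tests in RULES.items():
        if all(name in params and OPS[op](params[name], limit)
               for name, op, limit in tests):
            matches.append(number)
    return matches


# Initial values reported in the text for the three parameters of rule 2.
ORIGINAL = {"READE": 20, "RENOT": 10, "RETRA": 10}
# Hypothetical adjustment: raise RENOT and RETRA above their thresholds.
ADJUSTED = {**ORIGINAL, "RENOT": 13, "RETRA": 11}

print(satisfied_rules(ORIGINAL), satisfied_rules(ADJUSTED))
```

With the initial values rule 2 is not satisfied, because RENOT and RETRA are both 10 days; once both are raised above their thresholds it is.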
Therefore, based on the previous management rules, we can answer the first of the
questions posed above. The answer is yes: PROJECT's final results could have
been improved, and the required values of the parameters appear in the management rules of
Table 3. The second question can only be answered by the project director and the other managers
of the development organization. Once the management rules have been obtained, it is the project
manager who decides which rule or rules are the easiest to apply, depending on the specific
project and software organization. In any case, he/she knows that if the parameters do not
take the values given by the rules, the optimization of the variable or group of variables of
interest is not guaranteed.
In view of the results obtained and of the complexity of managing and
controlling an SDP, we propose at least two basic criteria for choosing management rules: first,
choose rules whose parameters are easy to control and modify; second, where
possible, choose rules that involve a small number of parameters.
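These two criteria can be turned into a simple ranking, sketched below in Python. The set of parameters regarded as easy to control is purely hypothetical (in practice it is a judgement made by the project manager), and the ordering key is one possible reading of the criteria, not a procedure taken from the tool.

```python
# Parameters involved in each rule of Table 3.
RULE_PARAMETERS = {
    1: ["READE", "RENOT", "INTAM", "DEDIC"],
    2: ["READE", "RENOT", "RETRA"],
    3: ["READE", "RETRA", "INTAM", "POFOR", "ESFPR"],
    4: ["READE", "RETRA", "INTAM", "POFOR"],
    5: ["READE", "RETRA", "DEDIC"],
}

# Hypothetical judgement of a manager; not taken from the paper.
EASY_TO_CONTROL = {"RENOT", "RETRA", "DEDIC", "POFOR"}


def rank_rules(rule_parameters, easy):
    """Order rules by: fewest hard-to-control parameters, then fewest parameters."""
    def key(number):
        params = rule_parameters[number]
        hard = sum(1 for p in params if p not in easy)
        return (hard, len(params), number)  # rule number breaks ties
    return sorted(rule_parameters, key=key)


RANKING = rank_rules(RULE_PARAMETERS, EASY_TO_CONTROL)
print(RANKING)
```

Under these assumed preferences, rules 2 and 5 come first, since each involves only three parameters and only one of them (READE) is outside the easy-to-control set.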
5. CONCLUSIONS AND FUTURE WORK
Management rules for SDPs can be obtained before the execution of a
project begins, in order to define the most adequate management policies for the project that is
going to be carried out. They can also be obtained for projects that have already ended, in order
to carry out a post-mortem analysis. These rules can be applied in order to:
• Obtain values that can be considered good (acceptable or bad) for any variable that we
are interested in analyzing, either independently or simultaneously with other
variables.
• Analyze which parameters are involved in the definition of the management policies
and of the maturity level of the organization, and which of them are easy to modify.
• Study which of the previously mentioned parameters have the greatest influence on obtaining
good results.
In short, we can say that it is possible to obtain management rules for an SDP automatically,
and to recognize which management policies guarantee the attainment of its goals.
In light of the potential of obtaining management rules from a dynamic model,
our future work is directed towards the application of fuzzy logic techniques and the creation of
a simulator for SDPs that can generate management rules in a multiproject environment.
6. REFERENCES
[Abdel-Hamid, 91] Abdel-Hamid, T.; Madnick, S.: "Software Project Dynamics: An Integrated
Approach". Prentice-Hall, 1991.
[Chichakly, 93] Chichakly, K. J.: "The Bifocal Vantage Point: Managing Software Projects
from a Systems Thinking Perspective". American Programmer, pp. 18-25, May 1993.
[Fayyad, 96] Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P.: "The KDD Process for
Extracting Useful Knowledge from Volumes of Data". Communications of
the ACM, Vol. 39, No. 11, pp. 27-34, November 1996.
[Quinlan, 93] Quinlan, J.: "C4.5: Programs for Machine Learning". Morgan Kaufmann
Pub. Inc., 1993.
[Ramos, 98] Ramos, I.; Ruiz, M.: "A Reduced Dynamic Model to Make Estimations in
the Initial Stages of a Software Development Project". INSPIRE III. Process
Improvement through Training and Education. Edited by C. Hawkings, M.
Ross, G. Staples, J. B. Thompson. pp. 172-185, September 1998.