Document 11073343

advertisement
Q/
T?
5
[4
AUG 181988
3^
SCALE ECONOMIES
IN
NEW SOFTWARE DEVELOPMENT
Banker
Kemerer
Rajiv D.
Chris
F.
February 1988
CISRWPNo. 167
WP No. 1995-88
Sloan
Center for Information Systems Research
Massachusetts
Institute of Technology
Sloan School of Management
77 Massachusetts Avenue
Cambridge, Massachusetts, 02139
'
SCALE ECONOMIES
IN
NEW SOFTWARE DEVELOPMENT
Banker
Kemerer
Rajiv D.
Chris
F.
February 1988
CISRWPNo. 167
Sloan WP No. 1995-88
•1988 R.D. Banker,
C.F.
Kemerer
Center for Information Systems Research
Sloan School of Management
Massachusetts Institute oflechnology
M.I.I,
^RARIB
AUG 1
REC-T
u'c
(
SCALE ECONOMIES
IN
NEW SOFTWARE DEVELOPMENT
Rajiv D. Banker
Professor of Management
Carnegie Mellon University
Chris
F.
Kemerer
Assistant Professor of Management Science
Massachusetts Institute of Technology
ABSTRACT
In
this
research we reconcile two opposing views regarding the
presence of economies or diseconomies of scale in new software
development
Our general approach hypothesizes a production function
model of software development that allows for both increasing and
decreasing returns to scale, and argues that local scale economies or
diseconomies depend upon the size of projects.
Using eight different
data sets, including several reported in previous research on the subject.
we provide empirical evidence in support of our hypothesis. Through
use of the nonparametric DBA technique we also show how to identify
the most productive scale size that may vary across organizations. These
results are extended to include the effects of nonparametric scalerelated factors, such as project duration and the number of new staff on
the project team.
first author was
Mellon University.
The
supported
in
part
by
NSF
grant
SES-8709044
at
Carnegie
1
RESEARCH PROBLEM
1.
Software development practitioners are faced with the problem of how
new
software
appropriately
size
productivity
Unfortunately,
development
much
research
the
of
so
projects
in
as
area
this
to
maximize
to
has
arrived
at
apparently contradictory conclusions, namely that either economies of scale exist or
that
diseconomies
contradictory
of
results
the existence of
a
in
local
most
the
identifying
environment,
paper
these
integrates
apparently
framework, and empirically demonstrates
economies or diseconomies depends upon the
projects.'
scale
effects
the
we
addition,
In
productive
show
and
This
consistent
scale
development
software
exist
scale
for
size
of
some
provide
given
a
other
scale
that
size of
methodology
a
software
for
development
related
on
variables
productivity.
A
volume
production process exhibits
level,
(local)
increasing returns to scale
when average
Economies
increasing,
and scale diseconomies prevail when average productivity
provided
specialization of
of
to
scale
explain
labor to
researchers such as
and
the
thus
present
presence
of
that
like
may contribute
online
the organizational
learning costs,
area
productivity.
(e.g.,
assembly
Finally,
ail
management overhead.
language
projects
This
is
scale
initial
may proscribe
may
coding)
a
decreasing.
from
range
number of factors
(e.g.,
These tools
investment, both
their
increase
certain
overhead
is
Software engineering
debuggers or code generators.
require
type of
productivity
to economies of scale, such as
Larger projects may also benefit from specialized personnel,
certain
of
the presence of a
increase productivity, but the relatively large
in
economies
as learning curves.
Boehm [16] have noted
software development tools
may
are
phenomena such
new software development
in
a given
at
the marginal returns of an additional unit of input exceed the average
returns.
Reasons
if,
fixed
status
in
purchase
use on small projects.
whose
the
expertise
project's
investment
in
in
a
overall
project
meetings and reports)
1
production economics, economies of scale larel defined at specific volume levels in a production process, and
It is therefore inappropriate to limit the characterization of a production process to
only global
economies (or diseconomies! of scale.
In
dealing with single input-single output pfoduction
correspondences, we shall use the terms increasing returns to scale and scale economies interchangeably.
In
are thus best described as local.
does not increase
economies of scale for
In
with
directly
project
and therefore can be a source of
size
larger projects.
many authors have pointed
contrast to the reasons cited above,
of diseconomies of scale on large software projects.
possibility
out the
Brooks [17] has
suggested that the number of communication paths between project team members
increases
an
(at
increasing
communication overhead
is
with
rate)
clear
a
number
the
case of
Somewhat
al.
[21]
suggest
that
systems
larger
development
increasing
conflicts
number of people
the
among team members.
also
Jones [25] notes
such as planning and documentation, grow
Another
increases.
which
is
likely
to
source
possible
increases
be larger on
a
analogously, Conte
projects
the
that
will
face
more
Boehm [16]
points out
chances
personality
complex interface problems between system components.
that
This
and hence a
cost increase,
nonlinear
factor that could contribute to diseconomies of scale.
et
members.^
team
of
for
many overhead
activities,
at a faster than linear rate as project size
of
larger
diseconomies
of
project and
may
scale
is
project
contribute
to
slack,
reduced
productivity.
how
Given these contradictory hypotheses,
software
development
production
development managers appropriately
process?
size
maximize average productivity?
to
organized as follows.
can researchers best model the
how
And,
can
practicing
new software development
This
organizations,
We
the
increasing returns
We
integrate
scale,
two
these
but decreasing returns
empirical applications even the
The number
of
peths required
)S
n(n-1)/2,
where
n
more
is
the
set
in
in
suggest
development production process
believe that one reason that this has not
in
and
notions
for
first
new software
in
most
exhibits
(local)
that
very large projects.
been shown by other researchers
due to the simple parametric models employed.
that
is
Section 2 presents the empirical evidence for both the notion
software
to
projects so as
paper addresses these questions and
of economies of scale and the notion of diseconomies of scale
development
software
We
flexible
number
show
in
Section 3, however,
parametric forms are limited
of project
is
team members.
in
their ability to estimate the returns to
Envelopment Analysis as an
Data
most productive
the
identify
alternative
scale
2.
Section 4 of
nonparametnc modeling technique to
Finally,
Section
in
in
Section 5 presents the results of further
size
analysis of other scale-related variables
further research are presented
This motivates our use
scale.
the conclusions and suggestions for
6.
EMPIRICAL EVIDENCE
A number
of researchers have collected empirical data that support increasing
The general approach of these researchers has been to
returns to scale theories.
estimate a function of the form:
y =
aix)"
where /
of
the
amount of
the
is
project,
Function
Points
input, typically
measured
typically
This
(FP).
in
function
is
professional work-hours, and x
terms of
source
lines
estimated by taking the
of
the size
is
code (SLOC) or
of both
logarithms
sides and then estimating the resulting linear model using regression techniques.
(1)
= a + bln(x)
ln(y)
An estimated exponent
exponent greater than
value,
1
returns to scale measure
less than
b,
1
indicates
indicates diseconomies of
economies of
scale.
while an
scale,
This follows because the
is
X dy
y dx
That
is,
(xly)
if
marginal productivity (dxidyj
b
greater than (less than) average productivity
less than (greater than) one.
is
One of
the earliest pieces of research to estimate this function
of Walston and Felix [31].
indicate
Vessey [30]
mild
was
the
work
They collected data on 60 projects within IBM's Federal
Systems Division and estimated
would
is
increasing
a
function with an exponent of .91,
returns
to
scale.
Jeffery
a result that
and Lawrence [24]
and
have also reported economies of scale on small projects, although
they have not published their data
Using
We
have extended
the
1978-80
data
this
analysis
from
a
to
a
number of other published
Yourdon [22]
survey
of
17
data
projects
sets.
from
a
we
variety
of firms,
scale.
Two
Bailey's
study [3]
estimated an exponent of .72, indicating increasing returns to
other data sets that display exponents of approximately .95 are from
study [14]
25
of
19 NASA/Goddard Space
of
projects
[26] Function Point data from
from
number
a
Kemerers
Society.''
yield
summary, the evidence for economies of scale
In
sources
of
Assurance
Life
commercial data processing consulting firm
a
an estimated exponent of .85.
comes
Equitable
at
Center projects and Behren's
Flight
representing
wide
a
of
variety
application
environments.
However, a number of researchers have provided empirical support for the
set
an exponent of
exhibits
produce
identical
Lehman
are Belady and
Wingfield's [32]
exponents of
s
15 project
economies of
Table
five
scale.
their
Note
average
of Table
list)
tend
1
is
to
U.S.
in
Function Points.
in
Army
Two
1.49
data
for
data sets
showing mild decreasing returns to scale
1.06,
data set
a large
software house and
Therefore, the empirical evidence
new software development
is
at least as
compelling as
scale.
summarizes the
1
exhibiting
estimated a high exponent of
[15] 33 project data set from
for diseconomies of scale
that for
We
.
24 projects from IBM measured
Albrecht's [2]
that
1.11
COCOMO
Boehm's [16] 63 project
notion of diseconomies of scale as well.
ioglinear
model
of the nine data sets, with
analysis
increasing returns to scale and four exhibiting decreasing returns to
that the
size."
data sets
One
in
Table
1
are
listed
interesting result available
in
the approximate
from even
order of
a cursory examination
that the data sets with smaller average projects (the first four
show economies
projects tend to
largest projects
of
scale,
show diseconomies
on the
list,
set (also referred to as the
is
while
of scale.
the
The
data
last
sets
with
larger
data set)
is
described
average
data set, which contains the
the only exception to this simple analysis.
ABC
on the
in
Table
This data
2.
Behren's data set is not reported directly irt his paper. However, a scatter graph is provided, which was enlarged
end the data directly extrapolated. Note that the graph contains only 22 of the 25 reported data-points.
Behren's data is only provided in terms of Function Points.
Behren's data is estimated to be approximately 10. 8K.
Using Albrecht's linear model, the average SLOC for
The
returns
about
theories
conflicting
described
Section
in
different
data
scale
to
of
Table
in
thus
1
indicate
economies
scale
the
that
diseconomies
or
matched by conflicting empirical evidence obtained for
We
sets
presence
the
are
1
reported
results
reconcile
apparent
this
by
contradiction
offering
the
hypothesis that for most software development "production processes" there exist
returns
increasing
That
large projects
than
smaller
is
projects
projects
smaller
average productivity
is,
"most productive
the
are
that
for
scale
to
The
larger.^
is
scale
and decreasing
for
very
increasing as long as the project size
size"
and
(MPSS),
MPSS may
actual
returns
be
decreasing
is
different
for
for
different
organizational settings.
The reasons for our above hypothesis stem from the conflicting arguments
presented
earlier
diseconomies of
in
Section
makes
it
likely
is
is
(MPSS)
average
decreasing returns to scale
the
In
returns
productivity
being
productivity,
next
two
framework of
sections
less
and
as
initially
the
in
fixed
project size generally
larger
manage, and the marginal productivity of the project team
Increasing
marginal
economies
both
of
increases
But the
continue
remains less than marginal productivity.
productivity
size
difficult to
decline.
presence
average productivity
spread over a larger project
more
to
the
Since most projects require a significant fixed investment
scale.
project management overhead,
overhead
for
1
equals
greater
prevail.
we
restrictive
than
average
marginal
to
prevail
as
long
as
average
At the most productive scale
and
productivity,
productivity,
This intuitive argument
is
beyond MPSS
is
declining
depicted
in
and
Figure
1.
reexamine eight* of the nine data sets within the
estimation models to provide empirical
support for
our hypothesis.
The
large,
or
if
MPSS
will
tend to differ across organizations.
If
the
fixed overhead
the marginal productivity does not decline rapidly, increasing returns
Banker |4| provides s rigorous definition and discussion of the concept of most productive scale si2e (MPSSI.
pursue this analysis further in Section 4.
e
Walston and Felix I311 report
their
estimated loglinear nnodel, but do not present the actual data.
is
will
We
MPSS
continue to prevail for larger projects and the
hand,
if
the fixed overhead
sharply, then the
3.
MPSS
relatively small
is
if
large.
On
the other
the marginal productivity declines
small and decreasing returns set
is
in
lower scale
at a
level.
PARAMETRIC PRODUCTION FUNCTION ANALYSIS
The problem with the simple
it
or
be
will
does not allow for the
loglinear
possibility
of
model of the previous research
some
returns for
increasing
projects and
The estimated returns to scale are determined by
decreasing for others.
parameter, the exponent
we
But
b.
require a
more
that
is
a
single
general model that allows for
average productivity increases as the fixed project overhead gets spread over larger
and larger projects, and after reaching the most productive scale size (MPSS),
it
allows for declining average productivity caused by negative factors affecting large
projects such as the proliferation of communication paths.
parametric approach
more
flexible
on the simple
based only
forms
parametric
that
have been employed
Such
other production environments.
loglinear
a
model
Rather than reject the
model,
in
that estimates
we
first
empirical
explore
research
MPSS would
also
in
be
of use to software development managers because they can then identify the scale
size
where average
One
productivity
method
possible
function for
is
for
maximized
in their
generalizing
new software development
the
organization.
restrictive
of previous research
logquadratic term as an independent variable.
We
loglinear
is
production
by simply adding
a
can thus estimate the following
translog function:^
12)
\n(HOURS) =
Letting
/ equal
scale measure p
is
d\r\
y
(3)
p =
X
+
fi^(\n(SIZE))
HOURS
+ /3^{\n{S/ZE)f
and x equal SIZE,
it
is
evident that the returns to
given by
=
d\r\
fi^
xdy
—
- =
fi+
ydx
2;?
(In
X)
Chnstensen, Jorgenson and Lau 20
note that the translog is a llexibl« functional form that provides
second-order approximation to an arbitrary, twice-continuously-dtfferentiabie production function.
1
1
a
local
Therefore,
X < exp
the
{(1 -/?
)/2/?
projects,
y5
<
>
/S,
then increasing returns to scale prevail for
and decreasing returns prevail for project sizes greater than
}
MPSS
estimated
estimated
the estimated
if
here
given
exp {(]-^
x' =
by
then average productivity
is
)/2/i
however,
If,
}.
the
estimated to be increasing for smaller
and decreasing for longer projects, contrary to our arguments presented
earlier.
Varian [29]
Hildenbrand [23],
that
such a
To provide evidence of robustness of
therefore
studies
For instance,
forms.
have argued
approach imposes considerable untested structure on the
parametric
production function.
empirical
and Banker and Maindiratta [9]
estimate
different
parametrically
their
specified
our present context an alternative specification
in
several
results,
functional
may be
the
following quadratic form:
HOURS
(4)
Again
is
=
letting
/?(,
+ ^^(SIZE) + p^[SIZE]^
/ equal
HOURS
and x equal SIZE, the returns to scale measure p
given by:
(5)
=
/>
xdy
^(/?,
ydx
p>
Therefore,
1
if
and only
X*
and
MPSS
decreasing
fi,x*2fi^x^
^^ + p^x +
y
are positive then the
x<
+ 2/?^x)
if
is
fi^x'^>
the estimated values of both
If
fi^.
given by x* = \/
returns
for
decreasing returns are exhibited for
x>
all
then increasing returns are exhibited for
J3^
fi^x^
x«».
0,
x>
and
0.
and
fi^
with increasing returns for
,
estimated
If
x>
all
B/ B
fi^
if
If
and
J3^<
estimated
/^g
>
fi
>0
and
then
/ff j
<
the estimated values of both
and ^^ are negative then decreasing returns correspond to small projects and
increasing returns correspond to large projects, contrary to our earlier hypothesis.
The empirical
(translog)
results
for
the
eight
available
and the quadratic models are presented
data
in
sets
Tables
for
the
logquadratic
3 and 4 respectively.
8
Two
empirical problems are encountered
observed
that these
in
practice.
several researchers have
First,
so-called flexible parametric functional forms frequently violate
reasonable regularity conditions, such as monotonicity, see for instance. Caves and
and
Christensen [18]
and
Barnett
Lee [13].
our
In
present
context,
the
logquadratic models estimated for the Bailey and Wingfield data sets exhibit dyldx <
(decreasing
requirement
labor
for
Similar violation of the monotonicity condition
models;
for
projects by the Bailey,
smaller
for
projects.
exhibited by the estimated quadratic
is
by the Albrecht and Kemerer data sets and for large
projects
small
project size)
increasing
COCOMO
and Belady data
The second empirical problem
sets.
more serious
is
for our objective of estimating
The pairs of independent
returns to scale for ne\N software development
variables
In (SIZE) and (In (SIZEjP. and (SIZE) and (SIZE)\ tend to be highly correlated.
The range of
pairwise
0.915 - 0.974 for (SIZE) and (SIZE)^ for the eight
(SIZE)j^ and
sets.
(In
data
available
This high level of collinearity implies that the confidence about interpreting the
estimates of the coefficients
to a change
in
be
to
and ^^ as the change
>5,
the independent variables
and the quadratic models.®
likely
was 0.967 - 0.999 for In (SIZE) and
correlations
unstable,
will
in
the dependent variable due
be very low for both the logquadratic
Consequently, the estimates of these coefficients are
see
for
and
Pindyck
instance
The
Rubinfeld [27].
usual
econometric methods, therefore, may not be appropriate for estimating the nature
of returns to scale or the most productive scale size for these eight data
The
high
importance to
loglinear
likely
the
collinearity
the
between
of
interpretation
models reported
in
Table
(SIZE)
In
the
results
of
and
(In
the
(SIZE))^
estimation
The estimated coefficient b
1.
of
in
sets.^
is
also
simple
the
this
of
case
is
to also pick up the effect of the omitted variable (In (SIZE))^, and therefore,
of
interpretation
b as
estimated
the
returns
to
scale
measure may not be
appropriate.
8
The standard errors
of
less likely to be significant
the
estimated coefficrents are likely to be larger, and the corresponding i-statistics are
the independent variables are highly correlated.
when
9
The variance of the estimates
covartance of the estimates of p
of
jj
the
and
returns
p
.
to
scale
or
MPSS measures depend
on
the
variance
and
the
4.
NONPARAMETRIC PRODUCTION FUNCTION ANALYSIS
Given these problems, and the limited a priori knowledge about the functional
form
form
parametric
or
theoretically
validate
these
impose
the
theory
with
from
deviations
individual
observations
process
and
Envelopment
estimation
formal
29].
[8,
Analysis
This
by
due
by
Production
economics
production
a
Banker,
function,
exhibited
to
use
production
to
and
Data
frontier
extended to
a
[5].'°
Cooper
and
Charnes,
in
of the
characteristics
Rhodes [19]
and
what
approach,
propose
approach
substantiate
apparent
inefficiencies
we
a
econometric
between
Therefore,
framework
for
to
differentiates
Cooper
Charnes,
economics
notion
nonparametric
a
the
in
specifying
to
immediately
[9, 23, 29].
frontier
a
inefficiencies.
(DEA),
developed
axioms
as
difficult
is
not
is
it
occurring
frontier
the
individual
production
DEA does
treated
need to employ
the
indicates
Also,
correspondence
production
development,
correspondence
statistically
hypotheses,
software
underlying
production
the
for
restrictions
on
process
production
the
of
not impose a parametric form on the production function and assumes
only that a monotonic and convex relationship exists between inputs and products."
More
following
the
formally,
production function f(x)
1.
2.
Monotonicity
:
If
Convexity
limited
assumptions
made about
are
the
frontier
:
./
=
y =
If
-
y'
fix],
f{x')
y'
fix),
and x ^
=
x',
then / ^
<
and
fix')
X
>/
<
1
then
(l-X)/* \y' ^ /[(1-X)x+ Xx']
3.
Envelopment
4.
Minimum Extrapolation
:
For each observation
:
If
a
Ar,
=
Ar
function
n,
1
gi-)
satisfies
convexity and envelopment conditions, then ^(x) ^
The
estimation
of
the
function
programming techniques, and estimates of
likelihood
and
consistent,
see
can
f(x)
f(x
Banker [12].
}
be
obtained
'^
y^
fix)
fix^)
the monotonicity,
for all x.
accomplished
in
this
using
linear
manner are maximum
The most productive scale
size
is
10
Recent developments MO) m stochastic data envelopment analysis simultaneously consider deviations from the
production frontier due to inefficiencies and also measurement errors.
Evidence on the comparative application
Strauss |6|.
of
the
DEA and
tranilog
models
is
provided by Banker, Conred and
10
estimated
following
the
via
programming model as
linear
in
Banker [4]
for
the
general case of multiple inputs and outputs.'^
(6)
min
Tj
^
subject to
n
n
k«1
,^.X^ >
(6.3)
where x
^
- output
= input
y.^^
j,
for observation
for observation
/,
n = number of observations
The
fyyA'
MPSS
'K,A'
MPSS
=
•
•
for
the
and
k,
(/r
=
1,
2,
=
...,/?).
mix
input-output
'y^J- ^^^ ^A
k,
'^A'
''^lA'
given
•
•
by
"^JA''
where
(Yj^^xJ
'S
computed
y^
=
as follows:
'^A
X
'.2.\
In
our present context,
correspondence,
production
considerably simplified.
i'
'A
-xJy.M
A' 'A
where
we
and
are interested only
in
problem
computational
the
The solution to the
M = max\x/y
kk'k'
I
k='\
linear
a single input-single output
program
n]
is
in
the
(6)
is
is
consequently
given by simply
maximum
observed
Alternativ« models for estimating MPSS when some inputs or outputs are fixed or uncontrollable, or when some
variables are measured on a categorical rather than on a continuous scale are described by Banker and Moray (71.
Banker and Maindirstta 181 discuss the estimation of other non-convex technologies.
11
average
project
the
by
given
across
productivity
If
the
observation,
say
M'
evidence of
local
observations.
x^
x^, and
observed
with
.
input-output
5
using
the
perspective,
MPSS correspond
MPSS
MPSS
the
is
largest
for
all
also attained for
pair
(x^.
some
then
y^J,
,
future
other
there
is
the
a
From a
projects.
both
of
identification
and
increasing
Projects larger (smaller) than the
sets.
decreasing (increasing) returns,
to
From
order to maximize
in
new software development
allows
also
it
provides a project size goal
5 shows
Table
respectively.
and the corresponding percentile value for the range of observed output
data for each of the eight data sets.
the
Mj
chosen by each researcher.
metric
size
decreasing returns within these empirical data
the
is
=
is
calculated for the eight available data sets, and the results are
average productivity of
research
productivity
(
size
represent MPSS.
practitioner's viewpoint, the
the
xjy^
which
scale
constant returns to scale, and the range of project sizes between
Table
in
for
say,
x,^,
maximum average
MPSS was
The
reported
all
size
The most productive
observations.
all
inter-quartile
range
for
the
In
of the eight cases, the
five
observed
output
thus
data,
MPSS
indicating
is
within
that
both
increasing and decreasing returns are clearly present since there exist both smaller
and
larger
projects
model
loglinear
than
may
the
be
MPSS
an
at
that
inadequate
site.
It
description
follows
therefore
many
of
new
that
the
software
development application environments.
5.
OTHER SCALE-RELATED FACTORS
In
size as
in
the
our analysis
in
the
earlier
sections,
we
have considered only the project
measured by SLOC or Function Points as the factors explaining the
professional
labor
requirements of
factors, such as the duration of a project,
assigned
to
a
project
team
methodology employed for our
important
link
may
earlier
also
different
explain
analysis
Other scale-related
projects.
or the number of
these
variations
new
staff
variations.
members
While
the
remains applicable and provides an
to prior research, the specific results must be interpreted cautiously
to the extent that such additional factors have not been included
in
the analysis,
in
12
we
section,
this
consider the explicit inclusion of other scale-related factors
estimation models, and
we
in
our
provide empirical evidence about the importance of these
factors.
In
addition to size as
measured by SLOC or Function
Points, project scale
often represented by the amount of calendar time taken by a project
duration,
while possibly
highly
correlated with a
SLOC
example, a project containing a given number of
a
relatively
longer period of time
might be
this
due to
situation
a
result
in
a
critical
Abdel-Hamid and Madnick
slack.
on projects
that
their
that are not
[ 1 ]
on the
suggest that
the project
"critical
necessitating
there
more
is
re- work.
Also,
on a longer project
slack
This
would
path" having a lot of
type of effect also occurs
this
is
likely
to increase the
hours expended on the project for a couple of reasons.
project
An example
have mis-estimated the amount of time required.
Increasing the duration on a project
long
of a number of tasks
dependence.
logical
some piece of
user-signoff on
number of other tasks
For
where one task was required to be completed
before the others could commence due to
might be getting a
be.
could be stretched out over
composed
For a project
A projects
measure, need not
size
is
chance
of
changes
Boehm [16] suggests
a "Parkinson's
number of yvork-
Over the course of a
in
user
requirements,
that
with
the
likely
Law" type ("work expanding to
fill
thus
additional
the time
available for its completion") effect occurs.
In
order to test the significance of the project's calendar time as a scale-
related explanatory variable, regression analysis
where
the calendar time (DURATION)
was
was performed on
available.'^
the six data sets
The regression models were
of the form:
(7)
HOURS
This
SIZE
=
/*(,
+ ^^{S/ZE) + fi^[SIZE? +
^pURATION)
model was chosen to represent the nonlinear aspects of
relationship.
The variables SIZE and
are
(SIZE)^
For the Bailey. YourOon, Belady, and Wingfield data sals, this data
is
highly
availabl*
in
the
correlated
Conte (211.
HOURS
as
:
noted
13
from
therefore
and
earlier
collinearity
problems.
which
to identify
analysis
variation
the
estimation
the
is
However,
this
statistically
other two,
in
the Yourdon data
any
adding
DURATION
see Table
seem
variable not
scale-related
with
the
at
DURATION
provided
that
shown
95%
is
a significant explanation of
Table
in
the
by
Of the
6.
confidence
SIZE
level
and
power.
(SIZE)^
data sets,
six
Of the
four.
in
highly correlated with SIZE (correlation
which provides a possible reason why
7)
explanatory
additional
Another
significant
suffer
will
does not affect the objective of our
this
analysis are
is
= .71,
corresponding parameters
whether DURATIOM provides
DURATION
coefficient
the
HOURS, over and above
in
The results of
variables
of
Only
in
the
Wingfield
is
it
not
does the
data
to add to the labor requirement
factor
affecting
communication
productivity
on
paths
suggested
is
project
Not
by
the
literature
dealing
additional
overhead be related to the number of paths, but also to the efficiency of
those paths
working as
a
project
the
This could be affected by
A
a group.
team,
more new
(8)
factor
HOURS
=
by
we
/Jq
the
number of new
Rubin [28]
is
only
could
experience the project team had
likely
as
a
to require
this
need not be the case.
for the
Kemerer data set [26].
staff,
currently available only
this additional
suggested
While a larger project
productivity variation.
assigned
NEWSTAFF,
variable,
been
has
how much
a
The
To
staff
possible
members on
source
of
more people and be
NEWSTAFF
identify
data
are
the impact of
estimate the following model:
+ fi^{SIZE] + fi^iS/ZE)^ +
fi
^(DURATION)
+ /3JN£WSTAFF)
The results for the ABC data
are:
Hours = 16042 - 93.93 (FP) + 0.06(FP)' + 1522 (DURATION) + 9. 06 (NEWSTAFF)
(1.43)
(-4.10)
(6.62)
(2.71)
The t-statistics are reported in parentheses.
R* =
.90
The correlation matrix for these
NEWSTAFF
DURATION
FP
variables
is.
(2.10)
14
-0.104
-0.165
-0.221
DURATION
FP
(FP)'
From these
0.338
0.203
results,
0.950
apparent that the number of
is
it
significant predictor of the variation
and above the
Duration.
scale-related variables
important to note that
is
It
other
simply the number of staff.
average staff
An
general model
is
(9)
HOURS
Rather
in
increasing
and
terms.
for
= nFP\
this
by 88.9%.
incremental
results
are
consistent
data set,
this
DEA,
of
extension
^^pURATION]
a
DEA model
this
specific
the
(Function
number of new
staff
this
is
members, not
only .02.
production
function
Banker [10].
see
and
Points)^
IMEWSTAFF and
the correlation of
The
parametric
function
^(FP)
is
functional
form
can be
following
(such
as
the
assumed only to be monotone
DURATION and NEWSTAFF represent
while
additive
linear
case are coefficients of 1018 for DURATION and 1163
sum of squared
residuals by
when we remove
Similarly,
NEWSTAFF from
the variable
residuals increases by
model
DURATION from
each
explanatory
power
to the model,
these
two
variables
provides
in
(9)
residuals
the model
109.9% and the sum of absolute
Thus,
of
the
123.1% and the sum of absolute
76.6%.
by
a
+ fi^(NEWSTAFF)
The deletion of the variable
sum of squared
increases
an
assuming
convex,
NEWSTAFF.
the
is
specified:
The results for
increases the
the
using
than
quadratic),
is
Function Points,
nonparametric estimation of
alternative
by
For
this
members
staff
for the Kemerer data set, over
of work-months to calendar months)
(the ratio
accomplished
work-months
in
new
in
(9),
residuals
substantial
see Banker and Morey [11].
These
not dependent upon pre-selecting a functional form but are generally
with
the
results
from
the
quadratic
model,
conclusions drawn from our earlier regression analysis.
thus
corroborating
the
15
6.
CONCLUDING REMARKS
In
this
we
research
have
reconciled
two
presence of
economies or diseconomies of scale
Our
approach
general
development
provides
allows
that
for
both
for
Through use of the DEA technique
productive scale
production
a
increasing
we
views
opposing
and
regarding
the
new software development
in
model
function
decreasing
have also shown
how
of
returns
software
to
scale.
to identify the
most
These results were extended to include the effects of other
size.
scale-related factors.
For the practitioner, our results contain a number of useful implications.
terms of project estimation,
loglinear
have
traditional
algorithmic models have suggested a simple
model with which to estimate eventual work-hours.
some
limited
applicability,
they
ignore
the
possibility
productivity by carefully selecting the scale of the project
scale as exogeneous,
as
most of these simple models
seek to identify the most productive scale size for
suggest that
this
MPSS
also
work would be
organizations' ability to successfully
assess
the
effects
do,
their
While these models
of
improving
project
Rather than taking the
managers could
organization.
actively
Our results
varies widely across different application environments, and
an interesting extension to this
some
In
manage
on productivity of
calendar duration and the number of
new
to identify factors that contribute to
larger
other
projects.
scale-related
Managers could
factors,
project team staff members.
such
as
16
Table
Summary of Loglinear Models
1:
t-statistic
DATA SET
Significant
n
at the
•"Significant at the
MEAN SLOC MEAN
5 percent
FP
level for a one-tailed test
10 percent
level for a one-tailed test
H„:b=l
17
Table
Project
Hours
2:
Kemerer [26] Data Set
Func Pts
KSLOC
Duration
Newstaff
18
Table
DATA SET
3:
Sutnmary of Logquadratic Models
19
Table
DATA SET
/?g
/?,
4:
Summary of Quadratic models
y5j
R^
MPSS'
20
Table
DATA SET
5:
Most Productive Scale Sizes Using DEA
MPSS"
Percentile
21
Table
DATA SET
6;
Duration Regression Results
/?3
DURATION
COEFFICIENT
t-STATISTlC
22
Table
7:
Duration Regression:
DATA SET
Correlation
CORRELATIONS
Hours
Bailey
.96
.46
.63
.55
.57
.95
Size
.63
Size
.71
.55
.74
.92
.58
.33
.64
.53
.63
.95
.46
.31
Duration
.82
.85
.40
.97
.43
.42
Size
.74
(Size)'
.86
.24
.95
.34
.20
Size
(Size)'
Duration
Size
(Size)'
Kemerer
.71
(Size)'
Duration
Wingfieid
.38
.91
.79
.60
Duration
Belady
(Size)
Size
(Size)^
COCOMO
Size
(Size)^
Duration
Yourdon
Between Independent Variables
Duration
23
References
[
1
]
[2]
Abdel-Hamid. Tarek and Stuart Madnick
Impact of Schedule Estimation on Software Project Behavior.
IEEE Software 3.70-75, July, 1986.
Albrecht, Allan J. and John Gaffney, Jr.
Software Function, Source Lines of Code, and Development Effort
Prediction A Software Science Validation.
IEEE Transactions on Software Engineering SE-9(6);639-648, November,
1983.
[3]
John W. and Victor R. Basiii
for Software Development Resource Expenditures.
Proceedings of the 5th I nter national Conference on Software
Engineering, pages 107-116,.
1981.
Bailey,
A Meta-model
In
[4]
Banker, Rajiv.
Estimating Most Productive Scale Size Using Data Envelopment Analysis.
European Journal of Operations Research 17(1):35-44, July, 1984.
[5]
Banker,
R.
D.,
Some Models
A.
Charnes and W. W. Cooper.
for Estimating Technical and Scale Inefficiencies
Management Science
[6]
30(9);
D., R. F. Conrad and R. P. Strauss.
Comparative Application of DEA and Translog Methods;
Study of Hospital Production.
Management Science 32(11:30-44, January, 1986.
An
[7]
Banker, R. D. and R. C. Morey.
Efficiency Analysis for Exogenously Fixed Inputs and Outputs.
Operations Research 34(4);5 13-521, July-August, 1986.
[8]
Banker,
[9]
Banker,
Illustrative
R. D. and A. Maindiratta
Piecewise Loglinear Estimation of Efficient Production Surfaces.
Management Science 32(1); 126- 135, January, 1986.
R. D. and A. Maindiratta
Nonparametric Analysis of Technical and Allocative Efficiences
1987.
Forthcoming in Econometrica
in
Banker, Rajiv.
Stochastic Data Envelopment Analysis.
1987.
Carnegie Mellon University Working Paper, 1987.
[11]
Banker.
R.
D.
and
R.
C.
Evaluating Hypotheses
Morey.
in
Efficiency Analysis;
An
Application.
1987.
V^/orking Paper.
[12]
DEA.
Banker, R
A
[10]
in
1078- 1092, September, 1984.
Banker,
Rajiv.
Maximum
Likelihood, Consistency and Data Envelopment Analysis.
Carnegie Mellon University Working Paper, 1987.
Production.
24
[13]
W.
Barnett,
and
A.
Y.
W.
Lee.
The Global Properties of the Minflex Laurent, Generalized Leontief and
Translog Flexible Functional Forms.
142 1-1437, 1985.
—
Econometrica
[14]
:
Behrens, Charles A.
Measuring the Productivity of Computer Systems Development
Activities with
Function Points.
IEEE Transactions on Software Engineering SE-9(6);648-652, November,
1983.
[15]
Belady, L. A. and M. M. Lehman.
The Characteristics of Large Systems.
In
[16]
Wegner, P. (editor). Research Directions in Software Technology, pages
106-138. MIT Press, Cambridge, MA, 1979.
Boehm, Barry W.
Software Engineering Economics.
Englewood
Prentice-Hall,
[17]
Cliffs, NJ,
1981.
Brooks, Frederick P.
The Mythical Man-Month.
Addison-Wesley, Reading, Mass., 1975.
[18]
Caves, D. W. and L R. Christensen.
Global Properties of Flexible Functional Forms.
:422-432, 1980.
American Economic Review
—
[19]
W. W. Cooper and E. Rhodes.
Program and Managerial Efficiency: An Application of Data
Envelopment Analysis to Program Follow Through.
Charnes,
A.,
Evaluating
Management Science 27(61:668-697,
[20]
1981.
Christensen, L R., D. W. Jorgenson and L J. Lau
Transcendental Logarithmic Production Frontiers.
Review
[21]
June,
Conte,
of Economics
S.,
H.
and
Dunsmore and
Statistics
55:28-45, 1973.
V. Shen.
Software Engineering Metrics and Models.
Benjamin/Cummings, Reading, MA, 1986.
[22]
DeMarco, Tom.
Yourdon Project Survey: Final Report.
Technical Report, Yourdon,
[23]
Inc.,
September, 1981.
Hildenbrand, Werner.
Short-Run Production Functions based on Microdata.
49(51:1095-1 125, September, 1981.
£co/7o/T7er/-/ca
[24]
Jeffery, DR. and M. J. Lawrence.
Inter-Organisational Comparison of Programming Productivity.
In Proceedings of the 4th International Conference on Software
An
Engineering, pages 369-377.
[25]
Jones, Capers.
Programming
McGraw-Hill,
Productivity.
New
York,
1986.
1979.
[26]
Kemerer, Chris
F.
An Empirical Validation of Software Cost Estimation Models.
Communications of the ACM 30(5)416-429, May, 1987
[27]
Pindyck, R.S. and D.L Rubinfeld
Econometric Models and Economic Forecasts.
McGraw-Hill, New York, 1981.
[28]
Rubin,
Howard
A.
Macroestimation of Software Development Parameters: The Estimacs System.
In SOFTFAI R Conference on Software Development Tools, Tectiniques and
Alternatives. IEEE. 1983.
[29]
Varian. H.
The Nonparametric Approach to Production
Econometrica 52(3);579-597, 1984.
[30]
Vessey,
Analysis.
Iris.
On Program Development Effort and Productivity.
Information & Management 10:255-266, 1986.
[31]
Walston,
C.E.
and
C.P.
Felix.
A Method
IBM
[32]
of Programming Measurement and Estimation.
Systems Journal 16(1);54-73, 1977.
Wingfield,
C.
G.
USACSC
experience with SLIM.
Technical Report lAWAR 360-5, U.S. Army Institute for Research
Management Information and Computer Science, 1982.
in
Figure
1
Most Productive Scale Size (MPSS)
productivity
(size/hours)
^
average productivity
marginal productivity
MPSS
project scale size
hours
production
function
MPSS
project scale size
INIIIIimim?'"^^ DUPi
3
TOflO
2
DDSt,?MT?
b
6'iu-&^
Date Due
I
I
'.
'
1
1U
Jv
Lib-26-67
BARCODE
ON iMEXT
TO LAST
PAGE
Download