Three Essays on Modeling Dynamic Organizational Processes
by
Hazhir Rahmandad
Bachelor of Science, Industrial Engineering,
Sharif University of Technology (2000)
Submitted to the Alfred P. Sloan School of Management
in partial fulfillment of the requirements for the Degree of
Doctor of Philosophy in Management
at the
Massachusetts Institute of Technology
August 2005
© 2005 Massachusetts Institute of Technology
All Rights Reserved
Signature of Author: Department of Operations Management and System Dynamics, August 2005

Certified by: John D. Sterman, Professor of Management, Thesis Supervisor

Certified by: Nelson P. Repenning, Professor of Management, Thesis Supervisor

Accepted by: Birger Wernerfelt, Chair, Ph.D. Committee, Sloan School of Management
Three Essays on Modeling Dynamic Organizational Processes
by
Hazhir Rahmandad
Submitted to the Alfred P. Sloan School of Management
in partial fulfillment of the requirements for the Degree of
Doctor of Philosophy in Management
Abstract
Essay 1- Effects of Feedback Delay on Learning
Learning figures prominently in many theories of organizations. Understanding barriers to
learning is therefore central to understanding firms' performance. This essay investigates the
role of time delays between taking an action and observing the results in impeding learning.
These delays, ubiquitous in real-world settings, can introduce important tradeoffs between
long-term and short-term performance. In this essay, four learning algorithms, with different
levels of complexity and rationality, are built and their performances in a simple resource
allocation task are analyzed. The study focuses on understanding the effect of time delays on
learning. Simulation analysis shows that regardless of the level of rationality of the organization,
misperceived delays can impede learning significantly.
Essay 2- Heterogeneity and Network Structure in the Dynamics of Diffusion: Comparing
Agent-Based and Differential Equation Models
When is it better to use agent-based (AB) models, and when should differential equation (DE)
models be used? Where DE models assume homogeneity and perfect mixing within
compartments, AB models can capture heterogeneity in agent attributes and in the network of
interactions among them. The costs and benefits of such disaggregation should guide the choice
of model type. AB models may enhance realism but entail computational and cognitive costs
that may limit sensitivity analysis and model scope. Using contagious disease as an example, we
contrast the dynamics of AB models with those of the analogous mean-field DE model. We
examine agent heterogeneity and the impact of different network topologies, including fully
connected, random, Watts-Strogatz small world, scale-free, and lattice networks. Surprisingly,
in many conditions differences between the DE and AB dynamics are not statistically significant
for key metrics relevant to public health, including diffusion speed, peak load on health services
infrastructure and total disease burden. We discuss implications for the choice between AB and
DE models, level of aggregation, and model boundary. The results apply beyond epidemiology:
from innovation adoption to financial panics, many important social phenomena involve
analogous processes of diffusion and social contagion.
Essay 3- Dynamics of Multiple-release Product Development
Product development (PD) is a crucial capability for firms in competitive markets. Building on
case studies of software development at a large firm, this essay explores the interaction among
the different stages of the PD process, the underlying architecture of the product, and the
products in the field. We introduce the concept of the "adaptation trap," where intendedly
functional adaptation of workload can overwhelm the PD organization and force it into
firefighting (Repenning 2001) as a result of the delay in seeing the additional resource need
from the field and underlying code-base. Moreover, the study highlights the importance of
architecture and underlying product-base in multiple-release product development, through their
impact on the quality of new models under development, as well as through resource
requirements for bug-fixing. Finally, this study corroborates the dynamics of tipping into
firefighting that follows quality-productivity tradeoffs under pressure. Put together, these
dynamics elucidate some of the reasons why PD capability is hard to build and why it easily
erodes. Consequently, we offer hypotheses on the characteristics of the PD process that increase
its strategic significance and discuss some practical challenges in the face of these dynamics.
Thesis Supervisors:
Michael A. Cusumano
Professor of Management
Nelson P. Repenning (Co-chair)
Professor of Management
Jan W. Rivkin
Professor of Management
John D. Sterman (Co-chair)
Professor of Management
Acknowledgments
When I entered my undergraduate program in Iran, I told my mom that this would be my last
official educational degree. She calmly suggested that there was no need to make that decision at
that time and that the future might unfold differently. This dissertation manifests the different
turn of events she had anticipated, a turn of events which wouldn't have been possible without
several individuals to whom I am most grateful.
Ali Mashayekhi introduced me to the field of system dynamics, stimulated my interest in doing
research, and helped me get into MIT. More importantly, his passion for working on problems
that matter, and his deep caring for people, have set high standards that I will always aspire to.
In the last five years, I have been extremely lucky to work with some of the best people I could
ever hope to learn from. John Sterman first took a chance by admitting me to the program, for
which I am very much grateful, and after that he was an exemplary advisor: he set high
standards, embodied these standards in his rigorous research, and helped me in every step of the
learning process by his continuous feedback and his insightful ideas and remarks. Moreover,
John's caring attitude was crucial in providing the safe and supportive environment that is
needed for learning and doing research, especially for an international student away from home
and family.
Nelson Repenning has been the ultimate supervisor in teaching me to focus on what matters in
the big picture of a problem. Nelson always gave the best pragmatic advice on the process of
Ph.D. work, tried hard to sharpen my focus, and was a role model in setting the right priorities
and following them.
Michael Cusumano and Jan Rivkin kindly took time out of their busy schedules and joined my
committee in the later stages of the process. Their sharp advice, their close interaction, and their
helpful attitudes have been instrumental in improving my work, and I am looking forward to
future collaborations with them.
The help and support of many people in my research sites were crucial for success of my field
studies for the third essay. Martin Devos, David Weiss, Randy Hackbarth, John Palframan, and
Audris Mockus spent many valuable hours helping me with the research and provided me with
lots of data, connections, support, and feedback. Moreover, I am very grateful to many
individuals in Sigma organization who gave their time for interviews, gave feedback on my
models, provided data, and accepted me as a colleague among themselves.
It is easy to think of Ph.D. education as a solitary endeavor: you are assigned a window-less
cubicle from which, after a few years, you are expected to emerge with a fine dissertation.
Reflecting on my experience, I find this perception far from the truth. Besides the invaluable
friendships, much of the learning, research ideas, and improvements in my work came from
close interaction with fellow Ph.D. students in different programs. Laura Black, Paulo
Gonçalves, and Brad Morrison, who preceded me in the SD program, made me feel at home,
gave me much good advice, and helped me on several important occasions. Jeroen Struben and
Gokhan Dogan nurtured the friendly and intimate atmosphere that softened the hard days and
brought about many fruitful discussions and initiatives. Finally, João Cunha, Lourdes Sosa,
Sinan Aral, and Kevin Boudreau helped me integrate my work within the larger community of
organizational behavior and strategy.
Fun is often not the best word to describe the life of a Ph.D. student at MIT, yet luckily I can
describe the last five years as fun and enjoyable, and I owe that largely to several great friends.
Not only did I enjoy the many quality days and evenings we spent together, but I also very much
enjoyed our collaboration in the Iranian Studies Group with Ali, Mehdi, Mohammad, Farzan,
Sara, and Nima. I also have some of my best life-long memories from events, parties,
conversations, intimate moments, trips, dances, and sport activities that I enjoyed in the company
of Adel, Anya, Ali(s), Chahriar, Mahnaz, Maryam, Mehdi, Mohsen, Parisa, Roya, Saloomeh,
Sergi, Yue, and many others to whom I can't do justice in this short note.
Mom and Dad get all of the credit for providing me with what I needed most at different stages
of life, and they continued to excel in this task even though they were thousands of miles away.
Their frequent phone calls, surprise gifts, and constant attention helped me feel connected and
happy in the years that I had to spend away from my family. I am eternally thankful to them.
The last months before the thesis defense are often considered the most stressful, yet I was lucky
to be accompanied in this phase of my program by my adorable sweetheart, Sara. Her support,
encouragement, and love made these months the best of all.
Finally, I dedicate this dissertation to my grandparents, Babajoon and Mamanjoon, for their
endless love and support, and for bringing up, and together, a wonderful family.
Table of Contents
Essay 1- Effects of Feedback Delay on Learning
    Introduction
    Modeling learning with delayed feedback
    Results and Analysis
    Discussion
    References

Essay 2- Heterogeneity and Network Structure in the Dynamics of Diffusion
    Introduction
    A spectrum of aggregation assumptions
    Model Structure
    Experimental design
    Results
    Discussion
    Conclusion
    References

Essay 3- Dynamics of Multiple-release Product Development
    Introduction
    Data Collection
    Data Analysis
    Model and Analysis
    Discussion
    Implications for Practice
    References
    Appendix A- Detailed model formulations
    Appendix B- The maximum PD capacity
    Appendix C- Boundary conditions of the model

Appendix 1-1- Technical material to accompany essay 1
    Mathematics of the models
    A Q-learning model for resource allocation learning task

Appendix 2-1- Technical material to accompany essay 2
    Formulation for the Agent-Based SEIR Model
    Deriving the mean-field differential equation model
    Network Structure
    The Fitted DE Model
    Sensitivity analysis on population size

Appendix 3-1- Reflections on the data collection and modeling process
Appendix 3-2- Feedback loops for multiple-release product development
Appendix 3-3- Documentation for detailed multiple-release PD model
Essay 1- Effects of Feedback Delay on Learning
Abstract
Learning figures prominently in many theories of organizations. Understanding barriers to
learning is therefore central to understanding firms' performance. This essay investigates the
role of time delays between taking an action and observing the results in impeding learning.
These delays, ubiquitous in real-world settings, can introduce important tradeoffs between
long-term and short-term performance. In this essay, four learning algorithms, with different
levels of complexity and rationality, are built and their performances in a simple resource
allocation task are analyzed. The study focuses on understanding the effect of time delays on
learning. Simulation analysis shows that regardless of the level of rationality of the organization,
misperceived delays can impede learning significantly.
Introduction
Learning figures prominently in many perspectives on organizations (Cyert and March
1963). Organizational routines and capabilities fundamental to firm performance are learnt
through an adaptive process of local search and exploration (Nelson and Winter 1982).
Researchers have highlighted learning as an important source of competitive advantage (DeGeus
1988; Senge 1990), central to the firm's performance in high-risk industries (Carroll, Rudolph et
al. 2002), and underlying many efficiency gains by firms (Argote and Epple 1990). Therefore,
understanding the functional and dysfunctional effects of learning processes is vital to
understanding firms' performance.
Management scholars have studied both positive and negative aspects of learning. On the
one hand, this literature portrays learning as an adaptive process (Cyert and March 1963), which
brings useful knowledge (Huber 1991), increases efficiency (Argote and Epple 1990), and is
central to the success of organizations in the fast-changing world (DeGeus 1988; Senge 1990).
At the individual level, it is argued that learning helps individuals find the best strategies, and
therefore, this argument is used to justify the individual rationality assumption in economics
(Friedman 1953).
On the other hand, if learning mechanisms were always functional and could find the
optimum strategies fast enough1, all organizations in similar market niches would acquire similar
strategies and capabilities. Therefore, to understand heterogeneity in a population of firms, we
need to understand the failure and path-dependence of different learning processes, along
with the population-level dynamics of selection and reproduction. Organizational learning
scholars have studied several factors that hinder learning or result in different outcomes for
learning, including payoff noise, complexity, defensive routines, and delays.
Noise and ambiguity in the payoff that an organization receives increase the chance of
superstitious learning (Argyris and Schön 1978; Levitt and March 1988), where the organization
draws wrong conclusions from its experience. Moreover, stochasticity in results creates a bias
against more uncertain strategies for which an unlucky first experience prohibits future
experimentation (Denrell and March 2001).
Another challenge to functional learning lies in the interaction between the competency
evolution and the search process. One can conceptualize learning as a process of (virtual or
actual) experimentation by an agent (or by other agents in the case of vicarious learning),
observing the resulting payoff, and adjusting (mental models and) actions to enhance
performance. In this framework, each alternative's payoff not only depends on the inherent
promise of that alternative, but also depends on the agent's experience with that alternative. As a
result, initially-explored alternatives pay off better, regardless of their inherent promise, and
therefore they close the door of experimentation to other, potentially better, alternatives
(Levinthal and March 1981; Herriott, Levinthal et al. 1985). Such competency traps (Levitt and
March 1988) also inhibit the organization from exploring better strategies once the current
routines of the firm become obsolete in face of changing environment, thereby turning core
capabilities into core rigidities (Leonard-Barton 1992).
Complexity of the payoff landscape on which organizations adapt raises another
challenge to organizational learning. The complementarities among different elements of a
firm's strategy (Milgrom and Roberts 1990) result in a complex payoff landscape in which
improvements in performance often come from changing multiple aspects of the strategy
together. In other words, after some local adaptation to find an internally consistent strategy,
incremental changes in one aspect of the strategy are often not conducive to performance gains,
1. Here the speed of learning should be compared to the speed of environmental change and
processes of selection and reproduction.
so that the firm rests on a local peak of a rugged payoff landscape (Busemeyer, Swenson et al.
1986; Levinthal 1997). Even though discontinuous change can take the organization out of such
local peaks (Tushman and Romanelli 1985; Lant and Mezias 1990; March 1991; Lant and
Mazias 1992), this spatial complexity of payoff landscape can indeed contribute to the observed
firm heterogeneity (Levinthal 2000; Rivkin 2000). Further studies in this tradition have shed
light on the challenges of imitation and replication in the presence of spatial complexity (Rivkin
2001), the adaptation processes empirically used by organizations under these conditions
(Siggelkow 2002), and the role of cognitive search in adaptation on rugged landscapes (Gavetti
and Levinthal 2000).
Individuals play a central role in organizations; therefore cognitive biases and challenges
for individuals' learning have important ramifications for organizational learning. Studies have
shown how dynamic complexity and delays can significantly hamper decision-making (Sterman
1989; Paich and Sterman 1993; Diehl and Sterman 1995) and result in self-confirming attribution
errors in organizations (Repenning and Sterman 2002). Moreover, the activation of individual
defensive routines closes the communication channels in the organization, creates bias in intra-organizational information flows, seals off flawed theories-in-practice from external feedback,
and promotes groupthink (Argyris and Schön 1978; Janis 1982; Argyris 1999).
Temporal challenges to learning such as delays between action and payoff have received
little attention in organizational learning research. In contrast, as early as the 1920s, the
importance of temporal contiguity of stimulus and response was recognized by psychologists
studying conditioning (Warren 1921). However, despite evidence of adverse effects of delays on
individual learning (Sengupta, Abdel-Hamid et al. 1999; Gibson 2000) and some case-specific
evidence at the organizational level (Repenning and Sterman 2002), with a few exceptions
(Denrell, Fang et al. 2004), the effects of delays on organizational learning have not been
studied. In fact, few formal models of organizational learning capture such delays explicitly.
Denrell and colleagues (2004) build on the literature of learning models in artificial
intelligence to specify a Q-learning (Watkins 1989) model of a learning task in a multidimensional space with a single state with non-zero payoff. Consequently the learner needs to
learn to value different states based on their proximity to the goal state. Their analysis informs
the situations in which several actions with no payoff need to be taken before payoff is observed.
This setting is conceptually similar to having a delay between an action and the payoff where the
intermediate states that the system can occupy are distinguishable by the decision-maker and
lead to no payoff. The study therefore highlights the importance of information about such
intermediate states and elaborates on the central problem of credit assignment, how to build
associations between actions and payoff in the presence of delays. This study, however, does not
discuss the existence of payoff values in the intermediate states, does not consider the tradeoffs
between short-term and long-term actions central to real-world delay effects, and uses an
information-intensive learning algorithm that requires thousands of action-payoff cycles to
discover a short path to a single optimum state.
We expand these results by examining learning in a resource allocation setting that
resembles the real-world challenges of an organization. We study how much performance and
learning weaken as a result of delays, how sensitive the results are to the rationality of the
decision-maker, and what the mechanisms are through which delays impede learning. A formal
model enables us to pin down the effects of time delays on learning by removing other
confounding factors that potentially influence learning. Specifically, we will remove the effects
of feedback noise, multiple peaks and competency traps, discount rate, interpersonal dynamics,
and confounding contextual cues on learning. Running controlled experiments on the simulation
model, we can also gain insight into the mechanisms through which delays impede learning.
Moreover, using a formal model, we are able to run a large number of controlled experiments
and investigate them closely at a relatively cheap cost, which improves the internal validity of
the study compared to using real experiments. Nevertheless, using simulations raises the
question of the external validity of the model. Specifically, it is not clear how closely our
algorithms of learning correspond to human learning patterns or organizational learning
practices, and therefore how well the results can be generalized to real social systems.
To meet this challenge we draw on the current literature on learning in organizational
behavior, psychology, game theory, and attribution theory to model learning, and investigate four
different learning procedures with different levels of rationality, information processing
requirements, and cognitive search ability. By including these different algorithms, we can
distinguish the behavioral patterns that are common across different algorithms and therefore
derive more valid conclusions about the processes that extend to real social systems. Moreover,
using these different algorithms, we can differentiate the effects of decision-making rationality
from learning problems arising from delayed feedback.
Our results confirm the hypothesis that delays complicate the attribution of causality
between action and payoff and therefore can hinder learning (Einhorn and Hogarth 1985;
Levinthal and March 1993). Furthermore, our analysis shows that performance can still be
significantly sub-optimal in very easy learning tasks, specifically, even when the payoff
landscape is unchanging, smooth, and has a unique optimum. Moreover, the organization can
learn to believe that the sub-optimal performance is really the best it can do. Finally, the results
are robust to different learning algorithms and the difficulty of learning in the presence of delays
appears to be rooted in the misperception of the delays between actions and outcomes, rather
than the specific learning procedures we examined.
These results suggest interesting research opportunities, including extending these
algorithms; exploring interaction of delays with feedback noise, perception lags, and ruggedness
of landscape; investigating the possibility of learning about the structure of time delays; and
explaining empirical cases of learning failure. Moreover, the results have practical implications
in the design of accounting and incentive systems as well as evaluation of firm performance.
In the next section, we describe the model context and structure and discuss the different
learning procedures in detail. The "Results and Analysis" section presents a base case
demonstrating that all four learning procedures can discover the optimum allocation in the
absence of action-payoff delays. Next we analyze the performance of the four learning
procedures in the presence of action-payoff delays, followed by tests to examine the robustness
of these results under different parameter settings. We close with a discussion of the
implications, limitations, and possible extensions.
Modeling learning with delayed feedback
Resource allocation problems provide an appropriate context for studying the effect of time
delays on learning. First, many important situations in multiple levels of analysis involve
allocating resources among different types of activities with different delays between allocation
and results. Examples include an individual allocating her time between work and education, a
factory allocating resources between production, maintenance, and process improvement, and a
government allocating budget between subsidies, building roads, and building schools.
Moreover, these situations often involve tradeoffs between short-term and long-term results
and therefore raise important questions about the short-term vs. long-term success of different
social entities. For example, the factory can boost output in the short run by cutting
maintenance, but in the long run, output falls as breakdowns increase. Other examples include
learning and process improvement (Repenning and Sterman 2002), investment in proactive
maintenance (Allen 1993), and environmental degradation by human activity (Meadows and
Club of Rome 1972). These tradeoffs suggest that individuals, organizations, and societies can
often fail to learn from experience to improve their performance in these allocation and decision-making tasks.
Previous studies motivated formulation of our model. Repenning and Sterman (2002) found
that managers learned the wrong lessons from their interactions with the workforce as a result of
different time delays in how various types of worker activity influence the system's performance.
Specifically, managers seeking to meet production targets had two basic options: (1) increase
process productivity and yield through better maintenance and investment in improvement
activity; or (2) pressure the workforce to "work harder" through overtime, speeding production,
taking fewer breaks, and, most importantly, by cutting back on the time devoted to maintenance
and process improvement. Though the study found, consistent with the extensive quality
improvement literature, that "working smarter" provides a greater payoff than working harder,
many organizations find themselves stuck in a trap of working harder, resulting in reduced
maintenance and improvement activity, lower productivity, greater pressure to hit targets, and
thus even less time for improvement and maintenance (Repenning and Sterman 2002).
In parallel to this setting, our model represents an organization engaged in a continuous-time resource allocation task. The organization must allocate a fixed resource among different
activities. The payoff generated by these decisions can depend on the lagged allocation of
resources, and there may be different delays between the allocation of resources to each activity
and its impact on the payoff. The organization receives outcome feedback about the payoff and
past resource allocations, and attempts to learn from these actions how to adjust resource
allocations to improve performance. As a concrete example, consider a manufacturing firm
allocating the organizational resources (e.g., employee time) among three activities: production,
maintenance, and process improvement. These activities influence production, but with different
delays. Time spent on production yields results almost immediately. There is a longer delay
between a change in maintenance activity and machine uptime (and hence production). Finally,
it takes even longer for process improvement activity to affect output.
The organization gains experience by seeing the results of past decisions and seeks to
increase production based on this experience. It has some understanding of the complicated
mechanisms involved in controlling production captured in the mental models of individual
members and in organizational routines, and it may be aware of the existence of different delays
between each activity and observed production. Consequently, when evaluating the
effectiveness of its past decisions, the organization takes these delays into consideration (e.g., it
does not expect last week's process improvement effort to enhance production today). However,
the organizational mental model of the production process may be imperfect, and there may be
discrepancies between the length of the delays it perceives and the real delays.
1- Allocation and payoff
The organization continuously allocates a fraction of total resources to activity $j$ of $m$
possible activities at time $t$, $F_j(t)$,2 where:

$$\sum_{j=1}^{m} F_j(t) = 1 \qquad (1)$$
In our simulations we assume m = 3 activities, so the organization has two degrees of freedom.
Three activities keep the analysis simple while still permitting different combinations of delay
and impact for each. Total resources, R(t), are assumed to be constant so R(t) = R, and
resources allocated to activity $j$ at time $t$, $A_j(t)$, are:

$$A_j(t) = F_j(t) \cdot R \qquad (2)$$
These allocations influence the payoff with a delay. The payoff at time $t$ is determined
by the Effective Allocation, $\hat{A}_j(t)$, which lags behind $A_j(t)$ with a payoff generation delay of
$T_j$. We assume a fixed, non-distributed delay for simplicity:3

$$\hat{A}_j(t) = A_j(t - T_j) \qquad (3)$$

The payoff generation delays are fixed but can be different for each activity.
In our manufacturing firm example, $R$ is the total number of employees, $F_j(t)$ is the fraction of
people working on activity $j$ ($j$: production, maintenance, process improvement), and $A_j(t)$ is
the number of people working on activity $j$. In our example, the delays in the impact of these
activities on production (the payoff) are presumably ordered approximately as
$T_{production} < T_{maintenance} < T_{improvement}$.

2. The model is formulated in continuous time but simulated by Euler integration with a time step
of 0.125 periods. Sensitivity analysis shows little sensitivity of the results to time steps < 0.2.
3. Different types of delay, including first- and third-order Erlang delays, were examined; the
results were qualitatively the same.
For simplicity we assume the payoff, $P(t)$, to be a constant-returns-to-scale, Cobb-Douglas
function of the effective allocations, which yields a smooth landscape with a single
peak, allowing us to eliminate learning problems that arise in more complicated landscapes:

$$P(t) = \prod_j \hat{A}_j(t)^{a_j}, \qquad \sum_j a_j = 1 \qquad (4)$$
2- Perception delays
The organization perceives its own actions and payoff and uses this information to learn
about the efficiency of different allocation profiles and to develop better allocations. We assume
the organization accounts for the delays between past allocations and payoffs, but recognize that
its estimate of the length of these delays may not be correct. In real systems the perceived payoff
and the actual payoff, $P(t)$, are different, but to give the learning algorithms the most favorable
circumstances, we assume measurement and perception to be fast and unbiased. The
organization accounts for the delays between allocations and payoff based on its beliefs about the
length of the payoff generation delays, $\hat{T}_j$. It attributes the current observed payoff, $P(t)$, to
allocations made $\hat{T}_j$ periods ago, so the action associated with the current payoff, $\tilde{A}_j(t)$, is:

$$\tilde{A}_j(t) = A_j(t - \hat{T}_j) \qquad (5)$$

The values of $\tilde{A}_j(t)$ and $P(t)$ are used as inputs to the various learning and decision-making
procedures representing the organization's behavior.
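To make this structure concrete, the following minimal Python sketch steps Equations 1-5 forward by Euler integration. It is an illustration only, not the thesis model: the parameter values, the constant allocation policy, and the variable names are assumptions chosen for the example.

```python
# Illustrative Euler simulation of Equations 1-5; parameter values and the
# constant allocation policy are assumptions for this sketch only.
import numpy as np

R = 100.0                                 # total resources
a = np.array([1/2, 1/3, 1/6])             # Cobb-Douglas exponents a_j (sum to 1)
T_true = np.array([4.0, 0.0, 0.0])        # payoff generation delays T_j (periods)
T_perc = np.array([0.0, 0.0, 0.0])        # perceived delays T_hat_j (misperceived here)
dt, horizon = 0.125, 300.0
steps = int(horizon / dt)

F = np.full((steps, 3), 1/3)              # allocation fractions F_j(t), Eq. 1 (held fixed)
A = F * R                                 # allocated resources A_j(t), Eq. 2

def lagged(series, k, delay):
    """Fixed (pipeline) delay: the series value `delay` periods before step k."""
    return series[max(k - int(round(delay / dt)), 0)]

for k in range(steps):
    A_eff = np.array([lagged(A[:, j], k, T_true[j]) for j in range(3)])       # Eq. 3
    P = np.prod(A_eff ** a)                                                   # Eq. 4
    A_credited = np.array([lagged(A[:, j], k, T_perc[j]) for j in range(3)])  # Eq. 5

print(round(P, 2))   # equal allocations give ~33.3, below the optimum of 36.37
```

In the full model the allocation fractions are, of course, updated by the learning procedures described next rather than held constant.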
3- Learning and decision-making procedures
Having perceived its own actions and payoff streams, the organization learns from its
experience, that is, updates its beliefs about the efficiency of different allocations and selects
what it believes is a better set of allocations. We developed four different learning models to
explore the sensitivity of the results to different assumptions about how organizations learn. The
inputs to all of these algorithms are the perceived payoff and action associated with that payoff,
$P(t)$ and $\tilde{A}_j(t)$; the outputs of the algorithms are the allocation decisions $F_j(t)$.
The learning algorithms differ in their level of rationality, information processing
requirements, and assumed prior knowledge about the shape of the payoff landscape. Here,
rationality indicates the organization's ability to make the best use of the information available by
trying explicitly to optimize its allocation decisions. Information processing capability indicates
the organization's capacity for keeping track of and using information about past allocations and
payoffs. Prior knowledge about the shape of the payoff landscape determines the ability to use
off-line cognitive search (Gavetti and Levinthal 2000) to find better policies.
We use four learning algorithms,4 denoted Reinforcement, Myopic Search, Correlation, and
Regression. In all the learning algorithms, the decision-maker has a mental representation of
how important each activity is. We call these variables "Activity Values," $V_j(t)$. The activity
values are used to determine the allocation of resources to each activity (equations 12-14 explain
how the allocation of resources is determined based on Activity Values). The main difference across
the different learning algorithms is how these activity values are updated. Below we discuss the
four learning algorithms in more detail. The complete formulation of all the learning algorithms
can be found in the technical appendix (1-1).
1- Reinforcement learning: In this method, the value (or attractiveness) of each activity is
determined by (a function of) the cumulative payoff achieved so far by using that alternative.
Attractiveness then influences the probability of choosing each alternative in the future.
Reinforcement learning has a rich tradition in psychology, game theory and machine learning
(Erev and Roth 1998; Sutton and Barto 1998). It has been used in a variety of applications, from
training animals to explaining the results of learning in games and designing machines to play
backgammon (Sutton and Barto 1998)5.
In our model, each perceived payoff, P(t), is associated with the allocations believed to
be responsible for that payoff, Aj (t) . We increase the value of each activity, Vj(t), based on its
contribution to the perceived payoff. The increase in value depends on the perceived payoff
itself, so a small payoff indicates a small increase in the values of different activities while a
4. We tried other algorithms, including the Q-learning model used by Denrell et al. (2004), and
decided to focus on the four reported here because they are more efficient. The discussion and
some results for the Q-learning model are reported in the technical appendix (1-1). In summary,
to converge to the optimal policy, that model requires far more data than the reported algorithms.
5. The literature on reinforcement learning has different goals and algorithms in different fields.
In the context of machine learning, reinforcement learning is developed as an optimization
algorithm. Game theory literature on reinforcement learning focuses on providing simple
algorithms that give good descriptive accounts of individual learning in simple games. Our use
of reinforcement learning here is more aligned with the game theoretic reinforcement learning
models. For a survey of machine learning literature on reinforcement learning see Sutton and
Barto's book (1998) and Kaelbling et al. (1996). Also see technical appendix (1-1) for the
results on a Q-learning model motivated by machine learning literature.
large payoff increases the value much more. Therefore large payoffs shift the relative weight of
different activity values towards the allocations responsible for those better payoffs.
$$\frac{d}{dt} V_j(t) = P(t)^{\text{Reinforcement Power}} \cdot \tilde{A}_j(t) - V_j(t) / \text{Reinforcement Forgetting Time} \qquad (6)$$
Our implementation of reinforcement learning includes a forgetting process, representing
the organization's discounting of older information. Discounting old information helps the
algorithm adjust the activity values better. Equation 6 shows the main formulation of the
Reinforcement algorithm. Here both "Reinforcement Power" and "Reinforcement Forgetting
Time" are parameters specific to this algorithm. "Reinforcement Power" indicates how strongly
we feedback the payoff as reinforcement, to adjust the "Activity Values" and therefore
determines the speed of converging to better policies. "Reinforcement Forgetting Time" is the
time constant for depreciating the old payoff reinforcements.
Details of the model formulations are included in the technical appendix (1-1), table 1.
The Reinforcement method is a low information, low rationality procedure: it continues to do
what has worked well in the past, adjusting only slowly to new information, and does not attempt
to extrapolate from these beliefs about activity value to the shape of the payoff landscape or to
use gradient information to move towards higher-payoff allocations.
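A minimal sketch of one Euler step of this update, following Equation 6 as reconstructed above, is shown below; the function name and parameter values are illustrative assumptions rather than the thesis calibration.

```python
# Sketch of one Euler step of the Reinforcement update (Eq. 6 as reconstructed
# above); parameter values are illustrative assumptions.
import numpy as np

def reinforcement_step(V, P, A_credited, dt,
                       reinforcement_power=1.0, forgetting_time=20.0):
    """Reinforce each Activity Value in proportion to the perceived payoff and
    the allocation credited with it, while forgetting old reinforcement."""
    dV = (P ** reinforcement_power) * A_credited - V / forgetting_time
    return V + dt * dV

V = np.array([1.0, 1.0, 1.0])                       # initial Activity Values
V = reinforcement_step(V, P=35.0, A_credited=np.array([50.0, 30.0, 20.0]), dt=0.125)
print(V / V.sum())                                  # relative weights drive Eq. 11
```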
2- Myopic search: In this method the organization explores neighboring regions of the decision
space (at random). If the last allocation results in a better payoff than the aspiration (based on
past performance), the action values are adjusted towards the explored allocation; if the
exploration results in a worse payoff, the action values remain unchanged. This procedure is a
close variation of Levinthal and March's (1981) formulation for adaptation of search propensity
for refinement and innovation. It is also similar to the underlying process for many of the
stochastic optimization techniques where, unaware of the shape of the payoff landscape, the
algorithm explores the landscape and usually moves to better policies upon discovering them.
Optimization models assume decisions switch instantly to better policies, if found, for the
next step. In reality, even if individual decisions can change fast, the underlying value system
adjusts slowly, due to the time required to aggregate new information, process the information,
and make changes to the routines; that is, to the factors that lead to organizational inertia. To be
behaviorally more realistic, we assume the activity value, $V_j(t)$, adjusts gradually towards the
value-set suggested by the last allocation, $V_j^*(t) = A_j / \sum_k A_k$ (Equation 7). $V_j^*(t)$ is the last
allocation if the payoff improved in the last step, and remains the current activity value if
the payoff did not change significantly or decreased.

$$\frac{d}{dt} V_j(t) = (V_j^*(t) - V_j(t)) / \lambda \qquad (7)$$

where $\lambda$ is the Value Adjustment Time Constant. The formulation details for the Myopic Search
method can be found in the technical appendix (1-1), Table 2. The myopic method is a low
rationality method with medium information processing. It compares the previous payoff with
the result of a local search and does not attempt to compare multiple experiments; it also does
not use information about the payoffs in the neighborhood to make any inferences about the
shape of the payoff landscape.
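A minimal sketch of the Myopic Search update is given below; the explicit aspiration comparison and the parameter values are illustrative assumptions, not the thesis formulation.

```python
# Sketch of the Myopic Search update (Eq. 7); the aspiration comparison and
# parameter values are illustrative assumptions, not the thesis formulation.
import numpy as np

def myopic_step(V, explored_allocation, payoff, aspiration, dt,
                value_adjustment_time=10.0):
    """Move the Activity Values gradually toward the explored allocation when it
    beat the aspiration; otherwise leave them unchanged."""
    if payoff > aspiration:
        V_target = explored_allocation / explored_allocation.sum()   # V*_j(t)
    else:
        V_target = V
    return V + dt * (V_target - V) / value_adjustment_time

V = np.array([0.40, 0.35, 0.25])
V = myopic_step(V, explored_allocation=np.array([55.0, 30.0, 15.0]),
                payoff=35.0, aspiration=33.0, dt=0.125)
print(V)
```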
3- Correlation: This method uses principles from attribution theories to model learning. In our
setting, learning can be viewed as how organizational decision-makers attribute different
allocations to different payoffs and how these attributions are adjusted as new information about
payoffs and allocations are continuously perceived. Several researchers have proposed different
models for explaining how people make attributions. Lipe (1991) reviews these models and
concludes that all major attribution theories are based on the use of counterfactual information.
However it is difficult to obtain counterfactual information (information about contingencies that
were not realized) so she proposes the use of covariation data as a good proxy. The correlation
algorithm is based on the hypothesis that people use the covariation of different activities with
the payoff to make inferences about action-payoff causality.
In our model, the correlations between the perceived payoff and the actions associated
with those payoffs let the organization decide whether performance would improve or deteriorate
if the activity increases. A positive (negative) correlation between recent values of Aj (t) and
P(t) suggests that more (less) of activity j will improve the payoff. Based on these inferences
the organization adjusts the activity values, V (t) , so that positively correlated activities
increase above their current level and negatively correlated activities decrease below the current
level.
$$\frac{d}{dt} V_j(t) = V_j(t) \cdot f(\text{Action-Payoff Correlation}_j(t)), \qquad f(0) = 0, \; f'(x) > 0 \qquad (8)$$
The formulation details for the correlation algorithm are found in technical appendix (1-1), Table
3. At the optimal allocation, the gradient of the payoff with allocation will be zero (the top of the
payoff hill is flat) and so will the correlation between activities and payoff. Therefore the change
in activity values will become zero and the organization settles on the optimum policy.
The correlation method is a moderate information, moderate rationality approach: more
data are needed than required by the myopic or reinforcement methods to estimate the
correlations among activities and payoff. Moreover, these correlations are used to make
inferences about the local gradient so the organization can move uphill from the current
allocations to allocations believed to yield higher payoffs, even if these allocations have not yet
been tried.
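A minimal sketch of this update over a rolling window of credited allocations and payoffs is shown below; the window length, the gain, and the linear choice f(x) = gain · x are assumptions of the sketch, not the thesis specification.

```python
# Sketch of the Correlation update (Eq. 8) over a rolling window of credited
# allocations and payoffs; the window, gain, and f(x) = gain * x are assumptions.
import numpy as np

def correlation_step(V, A_window, P_window, dt, gain=0.5):
    """Raise the value of activities whose recent credited allocations correlate
    positively with the perceived payoff, and lower the others."""
    V_new = V.copy()
    for j in range(len(V)):
        corr = np.corrcoef(A_window[:, j], P_window)[0, 1]
        if np.isnan(corr):
            corr = 0.0
        V_new[j] += dt * V[j] * gain * corr          # f(0) = 0, f'(x) > 0
    return V_new

rng = np.random.default_rng(0)
A_window = rng.uniform(10.0, 60.0, size=(40, 3))     # recent credited allocations
P_window = np.prod(A_window ** np.array([1/2, 1/3, 1/6]), axis=1)
V = correlation_step(np.array([1/3, 1/3, 1/3]), A_window, P_window, dt=0.125)
print(V / V.sum())
```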
4- Regression: This method is a relatively sophisticated learning algorithm with significant
information processing requirements. We assume that the organization knows the correct shape
of the payoff landscape and uses a correctly specified regression model to estimate the
parameters of the payoff function.
By observing the payoffs and the activities corresponding to those payoffs, the
organization receives the information needed to estimate the parameters of the payoff function.
To do so, after every few periods, it runs a regression using all the data from the beginning of the
learning task.6 From these estimates the optimal allocations are readily calculated. Raghu and
colleagues (Raghu, Sen et al. 2003) use a similar learning algorithm to model how simulated
fund managers learn about effectiveness of different funding strategies. For the assumed
constant-returns-to-scale, Cobb-Douglas function, the regression is:

$$\log(P(t)) = \log(a_0) + \sum_j a_j \log(\tilde{A}_j(t)) + \epsilon(t) \qquad (9)$$

The estimates of $a_j$, denoted $a_j^*$, are evaluated every "Evaluation Period," $E$. Based on these
estimates, the optimal activity values, $V_j^*(t)$, are given by:7

$$V_j^*(t) = \max\left(a_j^* \Big/ \sum_i a_i^*, \; 0\right) \qquad (10)$$
The organization then adjusts the action values towards the optimal values (see Equation 7). See
Table 4 in technical appendix (1-1) for equation details.
6. Discounting older information does not make any positive difference here since the payoff
function does not change during the simulation.
7. In the case of negative $a_j^*$, the resulting $V_j^*(t)$'s are adjusted to a close point on the feasible action
space. This adjustment keeps the desired action values (the $V_j^*(t)$'s) feasible (positive).
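A minimal sketch of this estimation step appears below; the ordinary-least-squares estimator via lstsq and the truncate-then-normalize handling of negative estimates are assumptions of the sketch, and in the thesis the regression is rerun every Evaluation Period within the simulation.

```python
# Sketch of the Regression procedure (Eqs. 9-10): re-estimate the Cobb-Douglas
# exponents by ordinary least squares on logged data, then form target Activity
# Values from the (truncated, normalized) estimates. The lstsq estimator and the
# truncate-then-normalize order are assumptions of this sketch.
import numpy as np

def regression_targets(A_history, P_history):
    X = np.column_stack([np.ones(len(P_history)), np.log(A_history)])
    y = np.log(P_history)                               # Eq. 9
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)        # [log a0, a1*, a2*, a3*]
    a_star = np.maximum(coef[1:], 0.0)                  # drop negative estimates
    return a_star / a_star.sum()                        # target values V*_j, Eq. 10

rng = np.random.default_rng(1)
A_history = rng.uniform(10.0, 60.0, size=(60, 3))       # credited allocations
P_history = np.prod(A_history ** np.array([1/2, 1/3, 1/6]), axis=1)
print(regression_targets(A_history, P_history))         # ~[0.5, 0.333, 0.167]
```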
The regression algorithm helps test how delays affect learning over a wide range of
rationality assumptions. The three other algorithms represent an organization ignorant about the
shape of the payoff function, while in some cases organizations have at least a partial understanding
of the structure and functional forms of the causal relationships relating the payoff to the
activities. Although in feedback-rich settings mental models are far from perfect and calculation
of optimal decisions based on an understanding of the mechanisms is often cognitively infeasible
(Sterman 1989; Sterman 1989), the regression algorithm offers a case of high rationality to test
the robustness of our results. Table 1 summarizes the characteristics of the different algorithms
on the three dimensions of rationality, information processing capacity, and prior knowledge of
the payoff landscape.
Table 1 - Sophistication and rationality of different learning algorithms

Algorithm        Rationality    Information Processing Capacity    Prior Knowledge of Payoff Landscape
Reinforcement    Low            Low                                 Low
Myopic Search    Low            Medium                              Low
Correlation      Medium         Medium                              Low
Regression       High           High                                High
The tradeoff between exploration and exploitation is a crucial issue in learning (March 1991;
Sutton and Barto 1998). On the one hand, the organization should explore the decision space by
trying some new allocation policies if it wants to learn about the shape of the payoff landscape.
On the other hand, pure exploration leads to random allocation decisions with no improvement.
So exploration is needed to learn about the payoff landscape, and exploitation is required if any
improvement is to be perceived.
We use random changes in resource allocation to capture the exploration/exploitation
issue in our learning models. The "Activity Values" represent the accumulation of experience
and learning by the organization. In pure exploitation, the set of activity values, Vj (t),
determines the allocation decision. Specifically, the fraction of resources to be allocated to
activity $j$ would be:

$$F_j(t) = V_j(t) \Big/ \sum_i V_i(t) \qquad (11)$$

The tendency of the organization to follow this policy shows its tendency to exploit the
experience it has gained so far. Deviations from this policy represent experiments to explore the
payoff landscape. We multiply the activity values by a random disturbance to generate
Operational Activity Values, $\hat{V}_j(t)$, which are the basis for the allocation decisions.

$$\hat{V}_j(t) = V_j(t) + \tilde{N}_j(t) \qquad (12)$$

$$\tilde{N}_j(t) = N_j(t) \cdot V_j(t) \qquad (13)$$

$$F_j(t) = \hat{V}_j(t) \Big/ \sum_i \hat{V}_i(t) \qquad (14)$$
Nj (t) is a first order auto-correlated noise term specific to each activity. It generates a
stream of random numbers with autocorrelation. In real settings, the organization experiments
with a policy for some time before moving to another. Such persistence is both physically
required (organizations cannot instantly change resource allocations), and provides organizations
with a large enough sample of data about each allocation to decide if a policy is beneficial or not.
"Activity noise correlation time", 6, captures degree of autocorrelation in the stream Nj (t) .
High values of a represent organizations who only slowly change the regions of allocation space
they are exploring; a low value represents organizations who explore quickly from one allocation
to another in the neighborhood of their current policy8 . We use Adjusted Pink Noise, N. (t) for
evaluating the Operational Action Values, Vj (t) 's, because different algorithms have different
magnitudes of Action Value and comparable exploration terms need to be scaled.
The standard deviation of the noise term, which determines how far from the current
allocations is explored, depends on where the organization finds itself on the payoff landscape.
If its recent explorations have shown no improvement in the payoff, it concludes that it is near
the peak of the payoff landscape and therefore extensive exploration is not required
(alternatively, it concludes that the return to exploration is low and reduces experimentation
accordingly). If, however, recent exploration has resulted in finding significantly better regions,
it concludes that it is in a low-payoff region and there is still room for improvement so it should
keep on exploring, so $\text{Var}(N_j(t))$ remains large:

$$\text{Var}(N_j(t)) = g(\text{Recent Payoff Improvement}(t)), \qquad g(0) = \text{Minimum Exploration Variance}, \; g'(x) > 0 \qquad (15)$$

8. We set the mean of $N_j(t) = 0$, so the decision-maker has no bias in searching different regions of
the landscape. We also truncate the values of the pink noise so that $N_j(t) > -1$ to ensure that $\hat{V}_j(t) > 0$
(Equation 12).
In this formulation, if the current payoff is higher than the recent payoff, Recent Payoff Improvement
increases; if not, it decays towards 0. Details of the exploration equations can be found in
Table 5 of the technical appendix (1-1).
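A minimal sketch of this exploration mechanism is given below; the correlation time, variance floor, improvement gain, and first-order smoothing form are illustrative assumptions, not the thesis parameterization.

```python
# Sketch of the exploration mechanism (Eqs. 11-15): first-order autocorrelated
# noise scaled by the Activity Values, with exploration variance tied to recent
# payoff improvement. The correlation time, variance floor, gain, and smoothing
# form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)

def explore_step(V, N, recent_improvement, dt,
                 correlation_time=8.0, min_std=0.02, improvement_gain=0.05):
    """Return allocation fractions F_j built from noisy Operational Values."""
    std = min_std + improvement_gain * max(recent_improvement, 0.0)   # Eq. 15
    white = rng.normal(0.0, std, size=V.shape)
    N = N + dt * (white - N) / correlation_time        # autocorrelated noise stream
    N = np.maximum(N, -0.99)                           # keep Operational Values > 0
    V_op = V + N * V                                   # Eqs. 12-13
    return V_op / V_op.sum(), N                        # Eq. 14 (Eq. 11 when N = 0)

V, N = np.array([0.45, 0.35, 0.20]), np.zeros(3)
for _ in range(10):
    F, N = explore_step(V, N, recent_improvement=1.5, dt=0.125)
print(F, round(F.sum(), 6))
```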
Results and Analysis
In this section we investigate the behavior of the learning model under different
conditions. We first explore the basic behavior of the model and its learning capabilities when
there are no delays. The no-delay case helps compare the capabilities of the different learning
algorithms and provides a base against which we can compare the behavior of the model under
other conditions. Next we introduce delays between activities and payoff and analyze the ability
of the different learning algorithms to find the optimal payoff, for a wide range of parameters.
In running the model, the total resource, R, is 100 units and the payoff function exponents
are set to 1/2, 1/3 and 1/6 for activities one, two and three, respectively. The optimal allocation
fractions are therefore 1/2, 1/3 and 1/6, and optimal payoff is 36.37. Starting from a random
allocation, the organization tries to improve its performance in the course of 300 periods.9
We report two aspects of learning performance: (1) How close to optimal the simulated
organization gets, and (2) how fast it converges to that level of performance, if it converges at
all. The percentage of optimal payoff achieved by the organization at the end of the simulation is
reported as the Achieved Payoff Percentage. Monte Carlo simulations with different random noise
seeds for the exploration term (Equation 13) and for the randomly chosen initial resource
allocation give statistically reliable results for each scenario analyzed. To facilitate comparison
of the four learning algorithms, we use the same noise seeds across the four models; therefore, any
differences in a given run are due only to the differences among the four learning procedures and
not to the random numbers driving them.
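The common-random-numbers design can be sketched as follows; `simulate` is a placeholder for the full learning model, and the seed scheme shown is an assumption of the sketch.

```python
# Sketch of the common-random-numbers design: each replication draws one seed
# shared by all four algorithms, so within a run they face identical exploration
# noise and initial allocations. `simulate` is a placeholder, not the full model.
import numpy as np

def simulate(algorithm, seed):
    rng = np.random.default_rng(seed)
    # ... initialize a random allocation and run the 300-period model here ...
    return rng.uniform(90.0, 100.0)        # stand-in for Achieved Payoff Percentage

algorithms = ["regression", "correlation", "myopic", "reinforcement"]
results = {name: [] for name in algorithms}
for run in range(400):
    seed = 1000 + run                      # one seed reused by every algorithm
    for name in algorithms:
        results[name].append(simulate(name, seed))

for name in algorithms:
    print(name, round(float(np.mean(results[name])), 2))
```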
1- Base case
The base case represents an easy learning task where there are no delays between actions
and the payoff. The organization is also aware of this fact and therefore it has a perfect
understanding of the correct delay structure. In this setting, we expect all the learning algorithms
to find the optimum solution. Figure 1 shows the trajectory of the payoff for each learning
algorithm, averaged over 100 simulation runs. The vertical axis shows the percentage of the
optimal payoff achieved10 by each learning algorithm.

9. The simulation horizon is long enough to give the organization the opportunity to learn, while
keeping the simulation time reasonable. In the manufacturing firm example we have chosen the
different time constants so that a period represents one month. Therefore the 300-period horizon
represents about 25 years, ample time to examine how much the organization learns.
[Figure 1: Performance of the different algorithms - percentage of optimal payoff (vertical axis) versus time in periods, 0-300 (horizontal axis).]
Figure 1- Percentage of payoff relative to optimal in the base case, averaged over 100 runs.
When there are no delays, all four learning algorithms converge to the optimal resource
allocation. For comparison, the average payoff achieved by a random resource allocation
strategy is 68% of optimal (since the 100 simulations for each learning algorithm start from
randomly chosen allocations, all four algorithms begin at an average payoff of 68%.)
An important aspect of learning is how fast the organization converges to the allocation
policy it perceives to be optimal. In some settings and with some streams of random numbers it
is also possible that the organization does not converge within the 300 period simulation horizon.
10. Note that by optimum we mean the best payoff one can achieve over the long term by
pursuing any policy, which need not be the highest possible payoff. For example, with a one-period
delay for activity 1 and no delay for the two others, the organization can allocate all 100
units to activity 1 during the current period and allocate all the resources between the two other
activities during the next period. Under these conditions, it can achieve a higher-than-optimum
payoff for the next period, at the expense of getting no payoff this period (because activities 2
and 3 receive no resources) and the period after the next (because activity 1 receives no resources
in that period). A constant-returns-to-scale payoff function prevents such policies from yielding
higher payoffs than the constant allocations in the long term.
Table 2 reports the payoffs, the fraction of simulations that have converged, as well as the
average convergence time11 (for those that converged) in the base case.
The regression, correlation, and myopic algorithms find the optimum almost perfectly, and
reinforcement goes a long way towards finding the optimum payoff. The majority of simulations
converge prior to the 300-period horizon, and the average convergence times range from a low of
38 periods for the regression algorithm to a high of 87 periods for the myopic model.
Table 2- Achieved Payoff, Convergence Time and Percentage Converged for the Base case.
Statistics from 400 simulations.
Variable                                    Regression        Correlation       Myopic            Reinforcement
                                            mean     s.d.     mean     s.d.     mean     s.d.     mean     s.d.
Achieved Payoff percentage at period 300    99.96    0.04     99.91    0.07     99.43    1.57     95.18    5.02
Convergence Time                            37.97    3.84     75.39    32.68    87.26    60.34    40.28    29.25
Percentage Converged12                      99.75             99.75             98.75             90
2- The impact of delays
Most models of organizational learning share an assumption that actions instantly pay off
and the results are perceived by the organization with no delay. Under this conventional
assumption, all our learning algorithms reliably find the optimum allocation and converge to it.
In this section we investigate the results of relaxing this assumption.
A basic variation is to introduce a delay in the impact of one of the activities, leaving the
delay for the other two activities at zero. We simulate the model for 9 different values of
the "Payoff Generation Delay" for activity one, $T_1$, ranging from 0 to 16 periods, while we keep
the delays for other activities at 0. In our factory management example the long delay in the
impact of activity I is analogous to process improvement activities that take a long time to bear
fruit. Because activity one is the most influential in determining the payoff, these settings are
11 We consider a particular learning procedure to have converged when the variance of the payoff falls below 1% of its historical average. If the variance later increases again, to 10% of its average at the time of convergence, we reset the convergence time and keep looking for the next instance of convergence.
12 Percentage of simulations converging to some policy by the end of the 300-period simulations. These numbers are reported based on 400 simulations.
expected to highlight the effect of delays more clearly. Delays as high as 16 periods are realistic in magnitude when weighed against the internal dynamics of the model. For example, in the base case it takes between 38 (for regression) and 87 (for myopic) periods for the different learning models to converge when there are no delays, suggesting that the dynamics of exploration and adjustment unfold over longer time horizons than the delays tested (see technical appendix (1-1), Table 6, for an alternative measure of the speed of internal dynamics).
We analyze two settings for the perceived payoff generation delay. First, we examine the results when the delays are correctly perceived: we set the perceived payoff generation delay for activity one, τ₁′, to the same values as τ₁, so that there is no misperception of delays: τ₁′ = τ₁. This scenario shows the effect of delays when they are correctly perceived and accounted for. Then, we analyze the effect of misperception of delays by keeping the perceived payoff generation delay for all activities, including activity one, at zero. This setting corresponds to an organization that believes all activities affect the payoff immediately.
Figure 2 reports the behavior of the four learning algorithms under the first setting, where delays are correctly perceived. For each delay value, the Average Payoff Percentage achieved by the organization at the end of 300 periods and the Percentage Converged¹³ are reported. In the "Average Payoff Percentage" graph, the line denoted "Pure Random" represents the case where the organization selects its initial allocation policy randomly and sticks to it, without trying to learn from its experience. Results are averaged over 400 simulations for each delay setting.
[Figure 2: two panels, Performance (Average Payoff Percentage) and Percentage Converged, plotted against the Payoff Generation Delay for activity 1 (0 to 16 periods), for the Regression, Correlation, Myopic, Reinforcement, and Random policies.]
Figure 2- Payoff Percentage and Percentage Converged with different, correctly perceived, time
delays for first activity. Activities 2 and 3 have no delay in influencing the payoff.
13 Convergence times are (negatively) correlated with the fraction of runs that converge and
therefore are not graphed here.
The performance of all four learning algorithms is fairly good. In fact, the regression model always finds the optimum allocation policy under all delay settings when it has a correct perception of the delays involved, and the average performance of all the algorithms remains above the 94% line. The convergence percentages decline: because they keep exploring around the optimal region, a significant number of simulations fail to converge to the optimal policy, even though their performance levels are fairly high.
[Figure 3: two panels, Performance (Average Payoff Percentage) and Percentage Converged, plotted against the Payoff Generation Delay for activity 1 (0 to 16 periods), for the Regression, Correlation, Myopic, Reinforcement, and Random policies.]
Figure 3- Average Payoff Percentage and Percentage Converged with different time delays for
first activity, when perceived delays are all zero. Results are averaged over 400 simulations.
Figure 3 shows the performance of the four learning algorithms under the settings with misperceived delays. This test suggests that the performance of all of the learning algorithms is adversely affected by the introduction of misperceived delays. It also shows a clear drop in the convergence rates as a consequence of such delays.
In summary, the following patterns can be discerned from these graphs. First, learning is significantly hampered if the perceived delays do not match the actual delays. The pattern is consistent across the different learning procedures. As the misperception of delays grows, all four learning procedures yield average performance worse than or equal to the random allocation policy (68% of optimal under this payoff function). In contrast, correctly perceived delays have only a minor impact on performance.
Second, under biased delay perception the convergence percentages decrease significantly, suggesting that the algorithms keep exploring as a result of the inconsistent information they receive. More simulations converge when the delays are perceived correctly. Even so, a notable fraction of simulation runs do not converge under correctly perceived delays, which suggests that the existence of delays by itself complicates the organization's task of converging on the optimal policy. More interestingly, under misperceived delays, some simulations converge to sub-optimal policies for different algorithms. This means that the organization can, under some circumstances, end up with an inferior policy, conclude that this is the best payoff it can get, and stop exploring other regions of the payoff landscape, even though there is significant unrealized potential for improvement.
Third, the different learning algorithms show the same qualitative patterns, though they differ in their precise performance: under misperceived delays, the more rational learning algorithms, regression and correlation, perform better for short delays and under-perform for long delays. In fact, regression and correlation under-perform the random allocation (p < 0.001), even though regression always finds the optimal allocation when the delay structure is correctly perceived.
In short, independent of our assumptions about the learning capabilities of the organization, a common failure mode persists: when there is a mismatch between the true delay and the delay perceived by the organization, learning is slower and usually does not result in improved performance. In fact, the simulated organization frequently concludes that an inferior policy is the best it can achieve, and sometimes, trying to learn from its experience, it reaches equilibrium allocations that yield payoffs significantly lower than the performance of a completely random strategy. To explore the robustness of these results, we investigate the effect of all model parameters on the performance of each learning algorithm.
3-Robustness of results
Each learning model involves parameters whose values are highly uncertain. To explore the sensitivity of the results to these parameters we conducted sensitivity analysis over all the parameters of each learning procedure. In this analysis we aim to investigate the robustness of the conclusions derived in the last section about the effect of delays on learning and to answer the following questions:
- Is there any asymmetry in the effect of misperceived delays on performance? In reality, organizations are more prone to underestimating, rather than overestimating, the delays (e.g., managers usually don't overestimate how long it takes process improvement to pay off), so finding such asymmetries is important and consequential.
- Is there any nonlinear effect of delays on learning? Specifically, there should be a limit to the negative effects of delays on performance. Do such saturation effects exist and, if so, at what delay levels do they become important?
We conducted a Monte Carlo analysis, selecting each of the important model parameters randomly from a uniform distribution over a wide range (from one to four times smaller or larger than the base case; see the complete listing and ranges in the technical appendix (1-1), Table 7). We carried out 3000 simulations, using random initial allocations in each.
Simple graphs and tables are not informative about these multidimensional data. We investigate the results of this sensitivity analysis using regressions with three dependent variables: Achieved Payoff Percentage, Distance Traveled, and Probability of Convergence. Distance Traveled measures how far the algorithm has moved through the action space and is a measure of the cost of search and exploration.
For each of these variables and for each of the learning algorithms, we run a regression
over the independent parameters in that algorithm (See technical appendix (1-1) for complete
listing). The regressors also include the Perception Error, which is the difference between the
Payoff Generation Delay and its perceived value; Absolute Perception Error; and Squared
Perception Error. Having both Absolute Perception Error and Perception Error allows us to examine possible asymmetric effects of misperceiving the delays. The second-order term allows inspection of nonlinearities and the saturation of delay effects. To save space and focus on the main results, we only report the regression estimates for the main independent variables (Delay, Perception Error, Absolute Perception Error, and Squared Perception Error), even though all the model parameters are included on the right-hand side of the reported regressions. OLS is used for the Achieved Payoff Percentage and Distance Traveled; logistic regression is used for the Convergence Probability. Tables 3-5 show the regression results; summary statistics are available in the technical appendix (1-1), Table 8. All the models are significant at p < 0.001.
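As an illustration of this specification, a hedged sketch using pandas and statsmodels; the column names are placeholders for the Monte Carlo output, and the actual regressions also include all the other parameters of each learning procedure as controls.

    import pandas as pd
    import statsmodels.formula.api as smf

    def delay_regressions(df: pd.DataFrame):
        """Fit the three sensitivity regressions for one learning algorithm.
        df holds one row per Monte Carlo run; column names are placeholders."""
        df = df.copy()
        df["perception_error"] = df["delay"] - df["perceived_delay"]
        df["abs_perception_error"] = df["perception_error"].abs()
        df["sq_perception_error"] = df["perception_error"] ** 2
        df["converged"] = df["converged"].astype(int)  # logistic regression needs 0/1

        rhs = "delay + perception_error + abs_perception_error + sq_perception_error"
        payoff = smf.ols(f"achieved_payoff_pct ~ {rhs}", data=df).fit()      # OLS
        distance = smf.ols(f"distance_traveled ~ {rhs}", data=df).fit()      # OLS
        converge = smf.logit(f"converged ~ {rhs}", data=df).fit(disp=False)  # logistic
        return payoff, distance, converge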
The following results follow from these tables:
- Increasing the absolute difference between the real delay and the perceived delay always decreases the achieved payoff significantly (Table 4, row 4). The coefficient for this effect is large and highly significant, indicating the persistence of the effect across different settings of parameters and different learning algorithms. Absolute perception error also usually increases the cost of learning (Distance Traveled), even though the effect is not as pronounced as that on achieved payoff (compare Tables 3 and 4, rows 4).
- Delay alone appears to have much less influence on payoff. The regression model is insensitive to correctly perceived delays and the rest of the models show a small but significant decline in performance (Table 3, row 3).
- Three of the models show asymmetric behavior in the presence of misperceived delays. For Regression and Correlation, underestimating the delays is more detrimental than overestimating them (Table 3, row 5), while Myopic is better off underestimating the delays rather than overestimating them. In all models the absolute error has a much larger effect than the perception error itself (Table 3, compare rows 4 and 5).
- There is a significant nonlinear effect of misperceived delays on performance (Table 3, row 6). This effect is in the expected direction: higher perception errors have smaller incremental effects on learning, and the effect saturates (the second-order effect neutralizes the first-order effect) when the delay perception error is around 14 (except for regression, which has a much smaller saturation effect).
[Tables 3-5: Regression estimates of the effects of Delay, Perception Error, Absolute Perception Error, and Squared Perception Error on Achieved Payoff Percentage, Distance Traveled, and Convergence Probability for each learning algorithm.]
- Convergence is influenced both by the delay and by the error in its perception, so higher delays, even if perceived correctly, make it harder for the algorithms to converge (Table 5).
These results are broadly aligned with the analysis of delays offered in the last section and highlight the robustness of the results. The analysis shows that the introduction of a delay between resource allocations and their impact causes sub-optimal performance, slower learning, and higher costs of learning.
Discussion
The results show that learning is slow and ineffective when the organization
underestimates the delay between an activity and its impact on performance. It may be objected
that this is hardly surprising because the underlying model of the task is mis-specified (by
underestimation of the delays). A truly rational organization would not only seek to learn about
better allocations, but would also seek to test its assumptions about the true temporal relationship
of actions and their impact; if its initial beliefs about the delay structure were wrong, experience
should reveal the problem and lead to a correctly specified model. We do not attempt to model
such second-order learning here and leave the resolution of this question to further research.
Nevertheless, the results suggest such sophisticated learning is likely to be difficult.
First, estimating delay length and distribution is difficult and requires substantial data¹⁴; in some domains, such as the delay in the capital investment process, it took decades for consensus about the length and distribution of the delay to emerge (see Sterman 2000, Ch. 11, and references therein). Results of an experimental study are illuminating: in a simple task (one degree of freedom) with delays of one or two periods, Gibson (2000) shows that individuals had a hard time learning over 120 trials. A simulation model fitted to human behavior and designed to learn about delays learns the delay structure only after roughly ten times more training data.
14 The Q-learning model discussed in the technical appendix (1-1) allows us to show mathematically why the complexity of the state-space (which correlates well with the time and data needed to learn about the delay structure) grows exponentially with the length of delays. Details can be found in the technical appendix (1-1); the basic argument is that if the action space has dimensionality O(D), then the possibility of a one-period delay requires us to know the last action as well as the current one to determine the payoff, increasing the dimensionality to O(D·D), and for delays of length K to O(D^(K+1)).
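Stated as a worked example of this argument (a sketch, assuming the learner must condition on every action that can still affect the current payoff): with O(D) possible actions per period and payoffs that may depend on actions up to K periods old, the effective state is the tuple of the last K+1 actions,

    s_t = (a_t, a_{t-1}, ..., a_{t-K}),   so   |S| = O(D)^{K+1} = O(D^{K+1}).

For instance, D = 100 allocation levels and K = 3 periods of delay already imply on the order of 100^4 = 10^8 state combinations to learn about.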
Second, the experimental research suggests people have great difficulty recognizing and accounting for delays, even when information about their length and content is available and salient (Sterman 1989a; Sterman 1989b; Sterman 1994); in our setting there are no direct cues to indicate the length, or even the presence, of delays. Third, the outcome feedback the decision maker receives may be attributed to a changing payoff landscape and noise in the environment, rather than to a problem with the initial assumptions about the delay structure. A decline in convergence and learning speed should be the main signal for the organization to decide that its assumptions about the delay structure are erroneous. However, in the real world, where the payoff is not solely determined by the organization's actions, several other explanatory factors, from competitors' actions to sheer good luck and supernatural beliefs, come first in the explanation of inconsistent payoff feedback.
Our results are robust to the rationality assumptions of the learning algorithms. The effects of time delays persist across a wide range of model complexity and rationality attributable to human organizations; therefore, one can have higher confidence in the applicability of these results to real-world phenomena. This is not to claim that no learning algorithm can be designed to account for delays¹⁵; rather, for the range of learning algorithms and experimentation opportunities realistically available to organizations, delays between action and results become a significant barrier to learning. This conclusion is reinforced by the amount of experimentation needed to detect a mismatch between the real-world delay structure and what the aggregate mental model of the organization suggests.
A few different mechanisms contribute to failure of learning in our setting. First, in the
absence of a good mental model of the delay structure, the organization fails to make reasonable
causal attributions between different policies and results (the credit assignment problem (Samuel
1957; Minsky 1961)). Moreover, in the context of resource allocation, an increase in resources
allocated to any activity naturally cuts down the resources allocated to the other activities;
therefore, observing payoff of long-term activities requires going through a period of low payoff.
15 There is a rich literature in machine learning, artificial intelligence, and optimal control theory that searches for optimal learning algorithms in dynamic tasks (see, for example, Kaelbling 1993; Barto, Bradtke et al. 1995; Kaelbling, Littman et al. 1996; Santamaria, Sutton et al. 1997; Tsitsiklis and Van Roy 1997; Bertsekas 2000). The focus of this literature is on optimal policies rather than behavioral decision/learning rules, and therefore these models often demand more information and higher levels of rationality than the range of empirically plausible decision rules for human subjects or organizations.
For example, investing resources in process improvement will cut down the short-term payoff by
reducing the time spent on producing. When delays are not correctly perceived, the transitory
payoff shortfall can be attributed to lower effectiveness of the long-term activity, thereby
providing a mechanism for learning wrong lessons from experience. Finally, learning may be
difficult even when the delays are correctly perceived by the organization, especially when the
organization does not have a perfectly specified model of the payoff landscape. In the presence
of delays, even if they are correctly perceived, the information provided through payoff feedback
may be outdated and be relevant only for a different region of payoff landscape. Consider the
case where the delay is three periods and correctly perceived. In this case the organization
properly attributes the payoff to the decisions of 3 periods ago and correctly finds out in which
direction it could have changed the allocation decision 3 periods ago to improve the payoff.
However, during the intervening 3 periods it has been exploring different policies and therefore
the indicated direction of change in policy may no longer indicate the best direction to move.
Nevertheless, this type of error is not large in our experimental setting because of the relatively flat payoff surface near the peak.
Our modeling of learning includes simplifying assumptions that should be made explicit in light of their possible influence on learning performance and speed. First, in line with most other learning models in the literature, these algorithms use stochastic exploration of different possible actions to learn from the feedback they receive. This allows a relatively large number of exploratory moves compared to what most real-world situations permit and provides the simulated organization with more data to learn from than most real settings offer. Moreover, costs and resources are not modeled here, so there is no bound on exploration and no pressure on the simulated organization to converge to any policy. In the real world such pressures may increase the likelihood of convergence and the speed of learning; however, it is conceivable that their effect on the quality of learning is not positive, possibly leading to more learning of wrong lessons than the current analysis suggests. Such reinforcement of harmful strategies is both theoretically interesting and practically important.
For a concrete example, consider a firm facing a production shortfall. Pressuring the
employees to work harder generates a short-run improvement in output, as they reallocate time
from improvement and maintenance to production. The resulting decline in productivity and
equipment uptime, however, comes only after a delay. It appears to be difficult for many
managers to recognize and account for such delays, so they conclude that pressuring their people
to work harder was the right thing to do, even if it is in fact harmful. Over time such attributions
become part of the organization's culture and have a lasting impact on firm strategy. Repenning
and Sterman (2002) show how, over time, managers develop strongly held beliefs that pressuring
workers for more output is the best policy, that the workers are intrinsically lazy and require
continuous supervision, and that process improvement is ineffective or impossible in their
organization--even when these beliefs are false. Such convergence to sub-optimal strategies as
a result of delays in payoff constitutes an avenue through which temporal complexity of payoff
landscape can create heterogeneity in firm performance (Levinthal 2000).
Future extensions of this research can take several directions. First, there is room for
developing more realistic learning models that capture the real-world challenges that
organizations face. Possible extensions can be designing more realistic payoff landscapes (e.g.,
modeling the plant explicitly) and going beyond simple stimulus-response models (e.g., including culture formation). Second, it is intriguing to investigate whether there is any interaction effect between delays and rugged landscapes. Third, it is possible to investigate the effect of delays on
learning in the presence of noise in payoff and measurement/perception delays. Fourth,
distribution of payoff over time may have important ramifications for routines learnt by
organizations. For example, it is conceivable that payoff that is distributed smoothly through
time does not pass the organizational attention threshold and is thus heavily discounted.
Therefore, activities that lead to distributed payoff can be under-represented in learnt routines.
Fifth, these results can help us better explain some empirical cases of learning failure. Sixth, one
can look at learning with delays in a population of organizations. Finally, it is important to look
at the feasibility of the second-order learning, i.e., learning about the delay structure.
Our research also has important practical implications. The results highlight the importance of employing longer time horizons than is typical when evaluating performance. This matters both in designing incentive structures inside the organization and in the market-level evaluation of the firm's performance. Empirical evidence underscores this conclusion. For example, Hendricks and Singhal (2001) show the inefficiency of financial markets, in that stock prices do not effectively reflect the positive effects of total quality management programs on firm performance at the time the information about the quality initiative becomes known to the public. Moreover, more effective data-gathering and accounting procedures can be designed that explicitly represent the delays between different activities/costs and the expected/observed payoff, and therefore improve the chances of organizational learning in the presence of delays.
In sum, the results of our analysis suggest that learning can be significantly hampered and slowed when the delays between action and payoff are not correctly taken into account. In fact, under relatively short delays we repeatedly observe that all learning algorithms fail to achieve the performance of a random allocation policy. The robustness of these results over a large parameter space, and their independence from the level of rationality and information processing of the models, adds to their external validity and generalizability. This research highlights the importance of explicitly including the time dimension of the action-payoff relationship in learning research and offers a new dimension for explaining several cases of learning failure in the real world and the heterogeneity of firms' strategies.
References:
Allen, K. 1993. Maintenance diffusion at du pont: A system dynamics perspective. Sloan School
of Management. Cambridge, M.I.T.
Argote, L. and D. Epple. 1990. Learning-curves in manufacturing. Science 247(4945): 920-924.
Argyris, C. 1999. On organizational learning. Maiden, Mass., Blackwell Business.
Argyris, C. and D. A. Schön. 1978. Organizational learning: A theory of action perspective. Reading, Mass., Addison-Wesley Pub. Co.
Barto, A. G., S. J. Bradtke and S. P. Singh. 1995. Learning to act using real-time dynamic programming. Artificial Intelligence 72(1-2): 81-138.
Bertsekas, D. P. 2000. Dynamic programming and optimal control. Belmont, Mass., Athena
Scientific.
Busemeyer, J. R., K. N. Swenson and A. Lazarte. 1986. An adaptive approach to resource allocation. Organizational Behavior and Human Decision Processes 38(3): 318-341.
Carroll, J. S., J. W. Rudolph and S. Hatakenaka. 2002. Learning from experience in high-hazard organizations. Research in Organizational Behavior 24: 87-137.
Cyert, R. M. and J. G. March. 1963. A behavioral theory of the firm. Englewood Cliffs, N.J., Prentice-Hall.
DeGeus, A. P. 1988. Planning as learning. Harvard Business Review 66(2): 70-74.
Denrell, J., C. Fang and D. A. Levinthal. 2004. From t-mazes to labyrinths: Learning from
model-based feedback. Management Science 50(10): 1366-1378.
Denrell, J. and J. G. March. 2001. Adaptation as information restriction: The hot stove effect.
Organization Science 12(5): 523-538.
Diehl, E. and J. Sterman. 1995. Effects of feedback complexity on dynamic decision making. Organizational Behavior and Human Decision Processes 62(2): 198-215.
Einhorn, H. J. and R. M. Hogarth. 1985. Ambiguity and uncertainty in probabilistic inference.
Psychological Review 92(4): 433-461.
Erev, I. and A. E. Roth. 1998. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review 88(4): 848-881.
Friedman, M. 1953. The methodology of positive economics. Essays in positive economics. Chicago, IL, Chicago University Press.
Gavetti, G. and D. Levinthal. 2000. Looking forward and looking backward: Cognitive and
experiential search. Administrative Science Quarterly 45(1): 113-137.
Gibson, F. P. 2000. Feedback delays: How can decision makers learn not to buy a new car every time the garage is empty? Organizational Behavior and Human Decision Processes 83(1): 141-166.
Hendricks, K. B. and V. R. Singhal. 2001. The long-run stock price performance of firms with effective TQM programs. Management Science 47(3): 359-368.
Herriott, S. R., D. Levinthal and J. G. March. 1985. Learning from experience in organizations.
American Economic Review 75(2): 298-302.
Huber, G. 1991. Organizational learning: The contributing processes and literature. Organization
Science 2(1): 88-115.
Janis, I. L. 1982. Groupthink: Psychological studies of policy decisions and fiascoes. Boston, Houghton Mifflin.
Kaelbling, L. P. 1993. Learning in embedded systems. Cambridge, Mass., MIT Press.
Kaelbling, L. P., M. L. Littman and A. W. Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4: 237-285.
Lant, T. K. and S. J. Mezias. 1992. An organizational learning model of convergence and reorientation. Organization Science 3: 47-71.
Lant, T. K. and S. J. Mezias. 1990. Managing discontinuous change: A simulation study of organizational learning and entrepreneurship. Strategic Management Journal 11: 147-179.
Leonard-Barton, D. 1992. Core capabilities and core rigidities: A paradox in managing new product development. Strategic Management Journal 13: 111-125.
Levinthal, D. 2000. Organizational capabilities in complex worlds. The nature and dynamics of organizational capabilities. S. Winter. New York, Oxford University Press.
Levinthal, D. and J. G. March. 1981. A model of adaptive organizational search. Journal of Economic Behavior and Organization 2: 307-333.
Levinthal, D. A. 1997. Adaptation on rugged landscapes. Management Science 43(7): 934-950.
Levinthal, D. A. and J. G. March. 1993. The myopia of learning. Strategic Management Journal
14: 95-112.
Levitt, B. and J. G. March. 1988. Organizational learning. Annual Review of Sociology 14: 319-340.
Lipe, M. G. 1991. Counterfactual reasoning as a framework for attribution theories.
Psychological Bulletin 109(3): 456-471.
March, J. G. 1991. Exploration and exploitation in organizational learning. Organization Science
2(1): 71-87.
Meadows, D. H. and Club of Rome. 1972. The limits to growth: A report for the Club of Rome's project on the predicament of mankind. New York, Universe Books.
Milgrom, P. and J. Roberts. 1990. The economics of modern manufacturing: Technology, strategy, and organization. American Economic Review 80(3): 511-528.
Minsky, M. 1961. Steps toward artificial intelligence. Proceedings Institute of Radio Engineers
49: 8-30.
Nelson, R. R. and S. G. Winter. 1982. An evolutionary theory of economic change. Cambridge,
Mass., Belknap Press of Harvard University Press.
Paich, M. and J. Sterman. 1993. Boom, bust, and failures to learn in experimental markets.
Management Science 39(12): 1439-1458.
Raghu, T. S., P. K. Sen and H. R. Rao. 2003. Relative performance of incentive mechanisms:
Computational modeling and simulation of delegated investment decisions. Management
Science 49(2): 160-178.
Repenning, N. P. and J. D. Sterman. 2002. Capability traps and self-confirming attribution errors
in the dynamics of process improvement. Administrative Science Quarterly 47: 265-295.
Rivkin, J. W. 2000. Imitation of complex strategies. Management Science 46(6): 824-844.
Rivkin, J. W. 2001. Reproducing knowledge: Replication without imitation at moderate
complexity. Organization Science 12(3): 274-293.
Samuel, A. 1957. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3: 210-229.
Santamaria, J. C., R. S. Sutton and A. Ram. 1997. Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior 6(2): 163-217.
Senge, P. M. 1990. The fifth discipline: The art and practice of the learning organization. New York, Currency Doubleday.
Sengupta, K., T. K. Abdel-Hamid and M. Bosley. 1999. Coping with staffing delays in software project management: An experimental investigation. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans 29(1): 77-91.
Siggelkow, N. 2002. Evolution toward fit. Administrative Science Quarterly 47(1): 125-159.
Sterman, J. D. 1989a. Misperceptions of feedback in dynamic decision making. Organizational Behavior and Human Decision Processes 43: 301-335.
Sterman, J. D. 1989b. Modeling managerial behavior: Misperceptions of feedback in a dynamic decision making experiment. Management Science 35(3): 321-339.
Sterman, J. D. 1994. Learning in and about complex systems. System Dynamics Review 10(2-3): 291-330.
Sutton, R. S. and A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, The
MIT Press.
Tsitsiklis, J. N. and B. Van Roy. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42(5): 674-690.
Tushman, M. L. and E. Romanelli. 1985. Organizational evolution: A metamorphosis model of convergence and reorientation. Research in Organizational Behavior 7: 171-222.
Warren, H. C. 1921. A history of the association psychology. New York Chicago etc., C.
Scribner's sons.
Watkins, C. 1989. Learning from delayed rewards. Ph.D. Thesis. Cambridge, UK, King's College.
Essay 2- Heterogeneity and Network Structure in the Dynamics of Diffusion:
Comparing Agent-Based and Differential Equation Models¹⁶
Abstract
When is it better to use agent-based (AB) models, and when should differential equation (DE)
models be used? Where DE models assume homogeneity and perfect mixing within
compartments, AB models can capture heterogeneity in agent attributes and in the network of
interactions among them. The costs and benefits of such disaggregation should guide the choice
of model type. AB models may enhance realism but entail computational and cognitive costs
that may limit sensitivity analysis and model scope. Using contagious disease as an example, we
contrast the dynamics of AB models with those of the analogous mean-field DE model. We
examine agent heterogeneity and the impact of different network topologies, including fully
connected, random, Watts-Strogatz small world, scale-free, and lattice networks. Surprisingly,
in many conditions differences between the DE and AB dynamics are not statistically significant
for key metrics relevant to public health, including diffusion speed, peak load on health services
infrastructure and total disease burden. We discuss implications for the choice between AB and
DE models, level of aggregation, and model boundary. The results apply beyond epidemiology:
from innovation adoption to financial panics, many important social phenomena involve
analogous processes of diffusion and social contagion.
Keywords: Agent Based Models, Networks, Scale free, Small world, Heterogeneity,
Epidemiology, Simulation, System Dynamics, Complex Adaptive Systems, SEIR model
Introduction
Spurred by growing computational power, agent-based modeling (AB) is increasingly
applied to physical, biological, social, and economic problems previously modeled with
nonlinear differential equations (DE). Both approaches have yielded important insights. In the
social sciences, agent models explore phenomena from the emergence of segregation to
organizational evolution to market dynamics (Schelling 1978; Levinthal and March 1981; Carley
1992; Axelrod 1997; Lomi and Larsen 2001; Axtell, Epstein et al. 2002; Epstein 2002;
Tesfatsion 2002). Differential and difference equation models, also known as compartmental
models, have an even longer history in social science, including innovation diffusion (Mahajan,
16 Co-authored with John Sterman.
Muller et al. 2000), epidemiology (Anderson and May 1991), economics (Black and Scholes 1973; Samuelson 1939), and many others.
When should AB models be used, and when are DE models most appropriate? Each
method has strengths and weaknesses. Nonlinear DE models often have a broad boundary
encompassing a wide range of feedback effects but typically aggregate agents into a relatively
small number of states (compartments). For example, models of innovation diffusion may
aggregate the population into categories including unaware, aware, in the market, recent
adopters, and former adopters (Urban, Houser et al. 1990; Mahajan, Muller et al. 2000).
However, the agents within each compartment are assumed to be homogeneous and well mixed;
the transitions among states are modeled as their expected value, possibly perturbed by random
events. In contrast, AB models can readily include heterogeneity in agent attributes and in the
network structure of their interactions; like DE models, these interactions can be deterministic or
stochastic.
The granularity of AB models comes at some cost. First, the extra complexity
significantly increases computational requirements, constraining the extent of sensitivity analysis
to investigate the robustness of results. A second cost of agent-level detail is the cognitive burden
of understanding model behavior. Linking the observed behavior of a model to its structure
rapidly becomes more difficult as model complexity grows. A third tradeoff concerns the
communication of results: the more complex a model, the harder it is to communicate the basis
for its results to stakeholders and decision makers, and thus have an impact on policy. Finally,
limited time and resources force modelers to trade off disaggregate detail and the scope of the
model boundary. Model boundary here stands for the richness of the feedback structure captured
endogenously in the model. For example, an agent-based demographic model may portray each
individual in a population and capture differences in age, sex, and zip code, but assume
exogenous fertility and mortality; such a model has a narrow boundary. In contrast, an aggregate
model may lump the entire population into a single compartment, but model fertility and
mortality as depending on food per capita, health care, pollution, norms for family size, etc., each
of which, in turn, are modeled endogenously; such a model has a broad boundary. DE and AB
models may in principle fall anywhere on these dimensions of disaggregation and scope,
including both high levels of disaggregate detail and a broad boundary. In practice, however,
where time, budget, and computational resources are limited, modelers must trade off
disaggregate detail and breadth of boundary. Understanding where the agent-based approach
yields additional insight and where such detail is unimportant is central in selecting appropriate
methods for any problem.
The stakes are large. Consider potential bioterror attacks. Kaplan, Craft, and Wein (2002) used a deterministic nonlinear DE model to examine a smallpox attack in a large city, comparing mass vaccination (MV), in which essentially all people are vaccinated after an attack,
to targeted vaccination (TV), in which health officials trace and immunize those contacted by
potentially infectious individuals. Capturing vaccination capacity and logistics explicitly, they
conclude MV significantly reduces casualties relative to TV. In contrast, Eubank et al. (2004)
and Halloran et al. (2002), using different AB models, conclude TV is superior, while Epstein et
al. (2004) favor a hybrid strategy. The many differences among these models make it difficult to
determine whether the conflicting conclusions arise from relaxing the perfect mixing and
homogeneity assumptions of the DE (as argued by Halloran et al. (2002)) or from other
assumptions such as the size of the population (ranging from 10 million for the DE model to
2000 in Halloran et al. to 800 in Epstein et al.), capacity constraints on immunization,
parameters, etc. (Koopman 2002; Ferguson, Keeling et al. 2003; Kaplan and Wein 2003).
Kaplan and Wein (2003) show that their deterministic DE model closely replicates the Halloran
et al. results when simulated with the Halloran et al. parameters, including scaled down
population and initial attack size, indicating that differences in the logistics of MV and TV are
responsible for the different conclusions, not differences in mixing and homogeneity. Halloran
and Longini (2003) respond that the equivalence may not hold when population is scaled up to
city size due to imperfect mixing and agent heterogeneity, but do not test this proposition
because simulation of the AB model with 10 million agents is computationally prohibitive.
Here we carry out controlled experiments to compare AB and DE models in the context
of contagious disease. We choose disease diffusion for four reasons. First, the effects of
different network topologies for contacts among individuals are important in the diffusion
process (Davis 1991; Watts and Strogatz 1998; Barabasi 2002; Rogers 2003), providing a strong
test for differences between the two approaches. Second, the dynamics of contagion involve
many important characteristics of complex systems, including positive and negative feedbacks,
time delays, nonlinearities, stochastic events, and agent heterogeneity. Third, there is a rich
literature on epidemic modeling. The DE paradigm is well developed in epidemiology (for
reviews see Anderson and May (1991) and Hethcote (2000)), and AB models with explicit
network structures are on the rise (for reviews see Newman (2002; 2003) and Watts (2004)).
Finally, diffusion is a fundamental process in diverse physical, biological, social, and
economic settings. Many diffusion phenomena in human systems involve processes of social
contagion analogous to infectious disease, including word of mouth, imitation, and network
externalities. From the diffusion of innovations, ideas, and new technologies within and across
organizations, to rumors, financial panics and riots, contagion-like dynamics, and formal models
of them, have a rich history in the social sciences (Bass 1969; Watts and Strogatz 1998;
Mahajan, Muller et al. 2000; Barabasi 2002; Rogers 2003). Insights into the advantages and
disadvantages of AB and DE models in epidemiology can inform understanding of diffusion in
many domains of concern to social scientists and managers.
Our procedure is as follows. We develop an AB version of the classic SEIR model, a
widely used nonlinear deterministic DE model. The DE version divides the population into four
compartments: Susceptible (S), Exposed (E), Infected (I), and Recovered (R). In the AB model,
each individual is separately represented and must be in one of the four states. The same
parameters are used for both AB and DE models. Therefore any differences in outcomes arise
only from the relaxation of the mean-field assumptions of the DE model. We run the AB model
under five widely used network topologies (fully connected, random, small world, scale-free, and
lattice) and test each with homogeneous and heterogeneous agent attributes such as the rate of
contacts between agents. We compare the DE and AB dynamics on a variety of key metrics
relevant to public health, including cumulative cases, peak prevalence, and the speed at which the disease spreads.
As expected, different network structures alter the timing of the epidemic. The behavior
of the AB model under the connected and random networks closely matches the DE model, as
these networks approximate the perfect mixing assumption. Also as expected, the stochastic AB
models generate a distribution of outcomes, including some in which, due to random events, the
epidemic never takes off. The deterministic DE model cannot generate this mode of behavior.
Surprisingly, however, even in conditions that violate the perfect mixing and
homogeneity assumptions of the DE, the differences between the DE and AB models are not
statistically significant for many of the key public health metrics (with the exception of the
lattice network, where all contacts are local). The dynamics of different networks are close to
the DE model: even a few long-range contacts or highly connected hubs seed the epidemic at
multiple points in the network, enabling it to spread rapidly. Heterogeneity also has small effects.
We also examine the ability of the DE model to capture the dynamics of each network
structure in the realistic situation where key parameters are poorly constrained by biological and
clinical data. Epidemiologists often estimate potential diffusion, for both novel and established
pathogens, by fitting models to the aggregate data as an outbreak unfolds (Dye and Gay 2003;
Lipsitch, Cohen et al. 2003; Riley, Fraser et al. 2003). Calibration of innovation diffusion and
new product marketing models is similar (Mahajan et al. 2000). We mimic this practice by treating the AB simulations as the "real world" and fitting the DE model to them. Surprisingly,
the fitted model closely matches the behavior of the AB model under all network topologies and
heterogeneity conditions tested. Departures from homogeneity and perfect mixing are
incorporated into the best-fit values of parameters.
The concordance between DE and AB model suggests DE models remain useful and
appropriate in many situations, even when mixing is imperfect and agents are heterogeneous.
Since time and resources are always limited, modelers must trade off the data requirements and
the computational burden of disaggregation against the breadth of the model boundary and the
ability to do sensitivity analysis. The detail possible in AB models is likely to be most helpful
where the contact network is highly clustered, known, and stable, where results depend delicately
on agent heterogeneity or stochastic events (as in small populations), where the relevant
population is small enough to be computationally feasible, and where time and budget permit
sensitivity testing. DE models will be most appropriate where network structure is unknown,
labile, or moderately clustered, where results hinge on the incorporation of a wide range of
feedbacks among system elements (a broad model boundary), where large populations must be
treated, and where fast turnaround or extensive sensitivity analysis are required.
The next section reviews the literature comparing AB and DE approaches. We then
describe the structure of the epidemic models, the design of the simulation experiments, and
results, closing with implications and directions for future work.
A spectrum of aggregation assumptions: AB and DE models should be viewed as regions in a
space of modeling assumptions rather than incompatible modeling paradigms. Aggregation is
one of the key dimensions of that space. Models can range from lumped deterministic
differential equations, to stochastic compartmental models in which the continuous variables of
the DE are replaced by discrete individuals, to event history models where the states of
individuals are tracked but their network of relationships is ignored, to dynamic network models
where networks of individual interactions are explicit (e.g., Koopman et al. (2001)). Modelers
need to assess the cost and benefits of these alternatives relative to the purpose of the analysis.
A few studies compare AB and DE models. Axtell et al. (1996) call for "model
alignment" or "docking" and illustrate with the Sugarscape model. Edwards et al. (2003)
contrast an AB model of innovation diffusion with an aggregate model, finding that the two can
diverge when multiple attractors exist in the deterministic model. In epidemiology, Jacquez and
colleagues (Jacquez and O'Neill 1991; Jacquez and Simon 1993) analyzed the effects of
stochasticity and discrete agents in the SIS and SI models, finding some differences for very
small populations. However, the differences practically vanish for populations above 100.
Heterogeneity has also been explored in models that assume different mixing sites for population
members. Anderson and May (1991, Chapter 12) show that the immunization fraction required
to quench an epidemic rises with heterogeneity if immunization is implemented uniformly but
falls if those most likely to transmit the disease are the focus of immunization. Ball et al. (1997)
and Koopman et al. (Koopman, Chick et al. 2002) find expressions for cumulative cases and
global epidemic thresholds in stochastic SIR and SIS models with global and local contacts,
concluding that the behavior of deterministic and stochastic DE models can diverge for small
populations. Keeling (1999) formulates a DE model that approximates the effects of spatial
structure even when contact networks are highly clustered. Chen et al. (2004) develop AB
models of smallpox, finding the dynamics generally consistent with DE models. In sum, AB and
DE models of the same phenomenon sometimes agree and sometimes diverge, especially for
smaller populations.
Model Structure: The SEIR model is a deterministic nonlinear differential equation model in
which all members of a population are in one of four states: Susceptible, Exposed, Infected, and
Recovered. Infected (symptomatic) individuals can transmit the disease to susceptibles before
they are "removed" (i.e., recover or die). The exposed compartment captures latency between
infection and the emergence of symptoms. Depending on parameters, exposed individuals may
be infectious, and can alternatively be called early stage infectious. Typically, the exposed have
more contacts than the infectious because they are healthier and are often unaware that they are
potentially contagious, while the infectious are ill and often self-quarantined.
SEIR models have been successfully applied to many diseases. Additional states are
often introduced to capture more complex disease lifecycles, diagnostic categories, therapeutic
protocols, population heterogeneity, assortative mixing, birth/recruitment of new susceptibles,
loss of immunity, etc. (Anderson and May (1991) and Murray (2002) provide comprehensive
discussion of the SEIR model and its extensions). In this study we maintain the boundary
assumptions of the traditional SEIR model (four stages, fixed population, permanent immunity).
The DE implementation of the model makes several additional assumptions, including perfect
mixing and homogeneity of individuals within each compartment, exponential residence times in
infected states, and mean field aggregation (the flows between compartments equal the expected
value of the sum of the underlying probabilistic rates for individual agents).¹⁷ To derive the
differential equations, consider the rate at which each infectious individual generates new cases:
c_is · Prob(Contact with Susceptible) · Prob(Transmission | Contact with Susceptible)     (1)

where the contact frequency c_is is the expected number of contacts between infectious individual i and susceptible individual s per time period; homogeneity implies c_is is a constant, denoted c_IS, for all individuals i and s. If the population is well mixed, the probability of contacting a susceptible individual is simply the proportion of susceptibles in the total population, S/N. Denoting the probability of transmission given contact between individuals i and s, or infectivity, as i_is (which, due to homogeneity, equals i_IS for all i, s), and summing over the infectious population yields the total flow of new cases generated by contacts between the I and S populations, c_IS·i_IS·I·(S/N). The number of new cases generated by contacts between the exposed and susceptibles is formulated analogously, yielding the total Infection Rate, f:

f = (c_ES·i_ES·E + c_IS·i_IS·I)·(S/N).     (2)

Assuming each compartment is well mixed and denoting the mean emergence (incubation) time and disease duration ε and δ, respectively, yields:

e = E/ε and r = I/δ.     (3)

The full model is thus:

dS/dt = -f,   dE/dt = f - e,   dI/dt = e - r,   dR/dt = r.     (4)
17 Due to data limitations the parameters c, i, and N are often lumped together as β = c·i/N; β is estimated from aggregate data such as reported incidence and prevalence. We distinguish these parameters since they are not necessarily equal across individuals in the AB implementation.
The perfect mixing assumption in eq (3) implies residence times in the E and I stages are
distributed exponentially. Exponential residence times are not realistic for most diseases, where
the probability of emergence and recovery is initially low, then rises, peaks and falls. Note,
however, that any lag distribution can be captured through the use of partial differential
equations, approximated in the ODE paradigm by adding additional compartments within the
exposed and infectious categories (Jacquez and Simon 2002). Here we maintain the traditional
SEIR single-compartment structure since the exponential distribution has greater variance than
higher-order distributions, maximizing the difference between the DE and AB models.
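For reference, a minimal sketch of how eqs (2)-(4) can be integrated numerically, using the base-case parameters of Table 1 below; the Euler scheme and the function name are illustrative choices, not the solver used in the thesis.

    def simulate_seir_de(N=200, E0=2, cES=4.0, cIS=1.25, iES=0.05, iIS=0.06,
                         eps=15.0, delta=15.0, days=300, dt=0.25):
        """Euler integration of the deterministic SEIR model in eqs (2)-(4)."""
        S, E, I, R = N - E0, float(E0), 0.0, 0.0
        trajectory = []
        for step in range(int(days / dt)):
            f = (cES * iES * E + cIS * iIS * I) * (S / N)  # infection rate, eq (2)
            e = E / eps                                    # emergence rate, eq (3)
            r = I / delta                                  # recovery rate, eq (3)
            S, E, I, R = S - f * dt, E + (f - e) * dt, I + (e - r) * dt, R + r * dt  # eq (4)
            trajectory.append((step * dt, S, E, I, R))
        return trajectory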
The AB model relaxes the perfect mixing and homogeneity assumptions. Each individual j is in one of the four states S[j], E[j], I[j], and R[j] for j ∈ {1, ..., N}. State transitions depend on the agent's state, agent-specific attributes such as contact frequencies, and interactions with other agents as specified by the contact network among them. The aggregate rates f, e, and r are the sums of the individual transitions f[j], e[j], and r[j]. The technical appendix (2-1) details the formulation of the AB model and shows how the DE model can be derived from it by assuming homogeneous agents and applying the mean-field approximation.
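A minimal sketch of one way the individual-level transitions could be implemented; the thesis's actual formulation is given in technical appendix (2-1), the per-step contact and transition probabilities below are simple discrete-time approximations of the rates above, and all function and variable names are illustrative.

    import random

    def ab_step(state, network, dt=1.0, cE=4.0, cI=1.25, iE=0.05, iI=0.06,
                eps=15.0, delta=15.0):
        """Advance every agent one time step.
        state: dict node -> 'S', 'E', 'I', or 'R'; network: dict node -> list of neighbors."""
        new_state = dict(state)
        for j, s in state.items():
            if s in ("E", "I"):
                contacts, infectivity = (cE, iE) if s == "E" else (cI, iI)
                for _ in range(max(1, round(contacts * dt))):   # expected contacts this step
                    partner = random.choice(network[j])
                    if state[partner] == "S" and random.random() < infectivity:
                        new_state[partner] = "E"                 # transmission
                if s == "E" and random.random() < dt / eps:
                    new_state[j] = "I"                           # symptom emergence
                elif s == "I" and random.random() < dt / delta:
                    new_state[j] = "R"                           # recovery
        return new_state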
Table 1- Base case parameters. The technical appendix (2-1) documents the construction of each network used in the AB simulations.

Parameter                         Symbol   Value   Units
Infectivity, Exposed              i_ES     0.05    dimensionless
Infectivity, Infectious           i_IS     0.06    dimensionless
Basic reproduction rate           R0       4.125   dimensionless
Average links per node            k        10      dimensionless
Prob. of long-range links (SW)    p_SW     0.05    dimensionless
Scaling exponent (scale-free)     γ        2.60    dimensionless
Total population                  N        200     Person
Contact rate, Exposed             c_ES     4       1/Day
Contact rate, Infectious          c_IS     1.25    1/Day
Average incubation time           ε        15      Day
Average duration of illness       δ        15      Day
The key parameter in epidemic models is the basic reproduction number, R0, the expected number of new cases each contagious individual generates before recovering, assuming all others are susceptible. We use parameters roughly similar to smallpox (Halloran, Longini et al. 2002; Kaplan, Craft et al. 2002; Ferguson, Keeling et al. 2003).¹⁸ Smallpox has a mid-range reproduction number, R0 ≈ 3-6 (Gani and Leach 2001), and therefore is a good choice for observing potential differences between DE and AB models: diseases with R0 < 1 die out quickly, while those with R0 >> 1, e.g. chickenpox and measles, generate a severe epidemic in (nearly) any network topology. We set R0 ≈ 4 for the DE (Table 1).
18 Smallpox-specific models often add a prodromal period between the incubation and infectious stages (e.g. Eubank et al. (2004)); for generality we maintain the SEIR structure.
the infectivities and expected residence times, and we set individual contact frequencies so that
the mean contact rate in each network structure and heterogeneity condition is the same as the
DE model. We use a small population, N = 200, all susceptible except for two randomly chosen
exposed individuals. While small compared to settings of interest in policy design, e.g., cities,
the effects of random events and network structure are likely to be more pronounced in small
populations (Kaplan and Wein 2003), providing a stronger test for differences between the DE
and AB models. A small population also reduces computation time, allowing more extensive
sensitivity analysis. The DE model has 4 state variables, independent of population size;
computation time is negligible for all population sizes. The AB model has 4*N states, and must
also track interactions among the N individuals. Depending on the density of the contact
network, the AB model can grow at rates up to O(N²). With a population of only 200 the AB
model contains 5 to 50 thousand variables, depending on the network, plus associated parameters
and random number calls. We report sensitivity analysis of population size below. The technical
appendix (2-1) includes the models and full documentation.
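For reference, a quick check of the base-case reproduction number, assuming R0 is the expected number of transmissions an index case generates during the exposed and infectious stages (consistent with eq (2) and the Table 1 parameters):

    R0 = c_ES·i_ES·ε + c_IS·i_IS·δ = 4 × 0.05 × 15 + 1.25 × 0.06 × 15 = 3.0 + 1.125 = 4.125,

which matches the value reported in Table 1.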
Experimental design: We vary both the network structure of contacts among individuals and
the degree of agent heterogeneity in the AB model and compare the resulting dynamics to the DE
version. We implement a full factorial design with five network structures and two
heterogeneity conditions, yielding ten experimental conditions. In each condition we generate an
ensemble of 1000 simulations of the AB model, each with different realizations of the random
variables determining contacts, emergence, and recovery. Each simulation of the random, small
world, and scale-free networks uses a different realization of the network structure (the
connected and lattice networks are deterministic). Since the expected values of parameters in
each simulation are identical to the DE model, differences in outcomes can only be due to
differences in network topology, agent heterogeneity, or the discrete, stochastic treatment of
individuals in the AB model.
Network topology: The DE model assumes perfect mixing, implying anyone can meet anyone
else with equal probability. Realistic networks are far more sparse, with most people contacting
only a small fraction of the population. Further, most contacts are clustered among neighbors with physical proximity.¹⁹ We explore five different network structures: fully connected (each
person can meet everybody else), random (Erdos and Renyi 1960), small-world (Watts and
Strogatz 1998), scale-free (Barabasi and Albert 1999), and lattice (where contact only occurs
between neighbors on a ring). We parameterize the model so that all networks (other than the
fully connected case) have the same mean number of links per node, k = 10 (Watts 1999).
The fully connected network corresponds to the perfect mixing assumption of the DE
model. The random network is similar except people are linked with equal probability to a
subset of the population. To test the network most different from the perfect mixing case, we
also model a one-dimensional ring lattice with no long-range contacts. With k = 10 each person
contacts only the five neighbors on each side. The small world and scale-free networks are
intermediate cases with many local and some long-distance links. These widely used networks
characterize a number of real systems (Watts 1999; Barabasi 2002). We set the probability of
long-range links in the small world networks to 0.05, in the range used by (Watts 1999). We
build the scale-free networks using the preferential attachment algorithm (Barabasi and Albert
1999) in which the probability a new node links to existing nodes is proportional to the number
of links each node already has. Preferential attachment yields a power law for the probability
that a node has k links, Prob(k) = a*k^(-γ). Empirically γ typically falls between 2 and 3; the mean
value of γ in our experiments is 2.60. Scale free networks contain a few "hubs" with many links,
with most others mainly connected to these hubs. The technical appendix (2-1) details the
construction of each network.
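An illustrative sketch of how such networks can be generated with the networkx library follows; this is a stand-in for the construction documented in the technical appendix (the Watts-Strogatz generator uses a rewiring probability of 0.05 here as a proxy for the long-range link probability described above).

```python
# Illustrative construction of the five network structures; all sparse networks are
# parameterized to a mean of roughly k = 10 links per node for a population of N = 200.
import networkx as nx

N, k = 200, 10
G_full = nx.complete_graph(N)                                  # fully connected
G_random = nx.gnm_random_graph(N, N * k // 2, seed=1)          # Erdos-Renyi, N*k/2 links
G_small_world = nx.watts_strogatz_graph(N, k, 0.05, seed=1)    # ring with rare long-range rewiring
G_scale_free = nx.barabasi_albert_graph(N, k // 2, seed=1)     # preferential attachment, mean degree ~ k
G_lattice = nx.watts_strogatz_graph(N, k, 0.0)                 # one-dimensional ring lattice
```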
Agent Heterogeneity: Each individual has four relevant characteristics: expected contact rate,
infectivity, emergence time, and disease duration. In the homogeneous condition (H=) each
agent is identical with parameters set to the values of the DE model. In the heterogeneous
condition (H≠) we vary individual contact frequencies. Focusing on contact frequencies reduces
the dimensionality of required sensitivity analysis at little cost. First, only the product of the
contact rate and infectivity matters to R0 and the spread of the disease (see note 2). Second, the
exponential distributions for ε and δ introduce significant variation in realized individual
incubation and disease durations even when the expected value is equal for all individuals.
19 One measure of clustering is the mean clustering coefficient, (1/N) Σi 2ni/[ki(ki - 1)], where ni
is the total number of links among all nodes linked to i and ki is the number of links of the ith
node (Watts and Strogatz 1998).
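As a quick illustration of that measure (assuming networkx and the small world generator sketched above), the mean clustering coefficient can be computed directly:

```python
# Mean clustering coefficient (1/N) sum_i 2*n_i/[k_i*(k_i - 1)] via networkx;
# the generator and parameters are illustrative.
import networkx as nx
G = nx.watts_strogatz_graph(200, 10, 0.05, seed=0)
print(nx.average_clustering(G))   # high for the small world; near k/N for a sparse random graph
```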
Heterogeneity in contacts is modeled as follows. Given that two people are linked (that
they can come into contact), the frequency of contact between them depends on two factors.
First, how often does each use their links, on average: some people are gregarious; others shy.
Second, time constraints may limit contacts. At one extreme, the frequency of link use may be
constant, so that people with more links have more total contacts per day, a reasonable
approximation for some airborne diseases and easily communicated ideas: a professor may
transmit an airborne virus or simple concept to many people with a single sneeze or comment,
(roughly) independent of class size. At the other extreme, if the time available to contact people
is fixed, the chance of using a link is inversely proportional to the number of links, a reasonable
assumption when transmission requires extended personal contact: the professor can only tutor a
limited number of people each day. We capture these effects by assigning individuals different
propensities to use their links, λ[j], yielding the expected contact frequency for the link between
individuals i and j, c[i,j]:

c[i,j] = κ*λ[i]*λ[j] / (k[i]*k[j])^τ     (5)

where k[j] is the total number of links individual j has, τ captures the time constraint on contacts,
and κ is a constant chosen to ensure that the expected contact frequency for all individuals equals
the mean value used in the DE model. In the homogeneous condition τ = 1 and λ[j] = 1 for all j,
so that in expectation all contact frequencies are equal, independent of how many links each
individual has. In the heterogeneous condition, those with more links have more contacts per
day (τ = 0), and individuals have different propensities to use their links. We use a uniform
distribution with a large range, λ[j] ~ U[0.25, 1.75].
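A sketch of equation (5) in code is shown below; it assumes a networkx graph with nodes labeled 0..n-1, and normalizes κ so that the population-mean contact rate matches a target value c_mean, which here is only an illustrative stand-in for the DE model's mean contact rate.

```python
# Contact-frequency assignment of equation (5): c[i,j] = kappa*lam[i]*lam[j]/(k[i]*k[j])**tau,
# with kappa normalized so the mean per-person contact rate equals c_mean.
import numpy as np
import networkx as nx

def contact_frequencies(G, c_mean, tau, rng, heterogeneous=False):
    n = G.number_of_nodes()
    lam = rng.uniform(0.25, 1.75, n) if heterogeneous else np.ones(n)   # propensities lambda[j]
    k = np.array([G.degree(v) for v in G.nodes()], dtype=float)         # links per node k[j]
    raw = {(i, j): lam[i] * lam[j] / (k[i] * k[j]) ** tau for i, j in G.edges()}
    total = 2.0 * sum(raw.values())             # each link contributes to two individuals
    kappa = c_mean * n / total                  # normalize the population-mean contact rate
    return {e: kappa * v for e, v in raw.items()}

rng = np.random.default_rng(0)
G = nx.watts_strogatz_graph(200, 10, 0.05, seed=0)
c = contact_frequencies(G, c_mean=4.0, tau=0.0, rng=rng, heterogeneous=True)
```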
Calibrating the DE Model: In real world applications the parameters determining the basic
reproduction rate Ro are often poorly constrained by biological and clinical data. For emerging
diseases such as vCJD, BSE and avian flu data are not available until the epidemic has already
spread. Parameters are usually estimated by fitting models to aggregate data as an outbreak
unfolds; SARS provides a typical example (Dye and Gay 2003; Lipsitch, Cohen et al. 2003;
Riley, Fraser et al. 2003). Because R0 also depends on contact networks that are often poorly
known, models of established diseases are commonly estimated the same way (e.g., Gani and
Leach 2001). To mimic this protocol we treat the AB model as the "real world" and estimate the
parameters of the SEIR DE to yield the best fit to the cumulative number of cases. We estimate
the infectivities (iES and iIS) and the incubation time (ε) by nonlinear least squares in 200 AB
simulations randomly selected from the full set of 1000 in each network and heterogeneity
condition, a total of 2000 calibrations (see the technical appendix (2-1)). Results assess whether a
calibrated DE model can capture the stochastic behavior of individuals in realistic settings with
different contact networks.
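A minimal sketch of this calibration protocol is shown below, reusing the SEIR sketch from earlier and assuming cumulative cases can be measured as everyone who has left the susceptible stock; the function name fit_de, the bounds, and the starting values are all illustrative, not the estimation setup documented in the technical appendix.

```python
# Sketch of the calibration protocol: estimate the DE infectivities and incubation time
# by nonlinear least squares against the cumulative-cases series of one AB run.
# `cum_cases_ab` is a (len(days),) array from an AB simulation; values are illustrative.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def seir(t, y, c_ES, i_ES, c_IS, i_IS, eps, delta, N):
    S, E, I, R = y
    f = (c_ES * i_ES * E + c_IS * i_IS * I) * S / N
    return [-f, f - E / eps, E / eps - I / delta, I / delta]

def fit_de(cum_cases_ab, days, N=200, c=4.0, delta=15.0):
    def residuals(theta):
        i_ES, i_IS, eps = theta
        sol = solve_ivp(seir, (0, days[-1]), [N - 2.0, 2.0, 0.0, 0.0],
                        args=(c, i_ES, c, i_IS, eps, delta, N), t_eval=days)
        return (N - sol.y[0]) - cum_cases_ab        # cumulative cases = all who have left S
    theta0 = np.array([0.025, 0.05, 15.0])           # starting guesses for i_ES, i_IS, eps
    return least_squares(residuals, theta0, bounds=([0.0, 0.0, 1.0], [1.0, 1.0, 60.0]))
```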
Results: For each of the ten experimental conditions we compare the simulated patterns of
diffusion on four measures relevant to public health. The maximum symptomatic infectious
population (peak prevalence, Imax) indicates the peak load on public health infrastructure,
including health workers, immunization resources, hospitals and quarantine facilities. The time
from initial exposure to the maximum of the infected population (the peak time, Tp) measures
how quickly the epidemic spreads and therefore how long officials have to deploy those
resources. The fraction of the population ultimately infected (the final size, F) measures the total
burden of morbidity and mortality. Finally, we calculate the fraction of the epidemic duration in
which the infectious population in the DE model falls within the envelope encompassing 95% of
the AB simulations (the envelopment fraction, E95). The envelopment fraction indicates how much
of the time the trajectory of the DE stays within the range of outcomes caused by stochastic
variations in the behavior of individual agents.20 To illustrate, figure 1 compares the base case
DE model with a typical simulation of the AB model (in the heterogeneous scale-free case). The
sample scale-free epidemic grows more slowly than the DE (Tp = 60 vs. 48 days), has similar peak
prevalence (Imax = 25% vs. 27%), and ultimately afflicts fewer people (F = 89% vs. 98%).
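A sketch of these four metrics in code follows; the array names and shapes (an ensemble of prevalence trajectories, the matching cumulative-infection trajectories, and the DE trajectory) are illustrative assumptions.

```python
# Sketch of the four outcome metrics, assuming I_ab is an (n_runs, n_days) array of
# symptomatic prevalence fractions for the AB ensemble, cum_ab the matching cumulative
# infected fractions, and I_de the DE trajectory on the same daily grid.
import numpy as np

def outcome_metrics(I_ab, cum_ab, I_de, horizon=125):
    i_max = I_ab.max(axis=1)                              # peak prevalence, per run
    t_p = I_ab.argmax(axis=1)                             # peak time (days), per run
    final_size = cum_ab[:, -1]                            # fraction ultimately infected, per run
    lo = np.percentile(I_ab[:, :horizon], 2.5, axis=0)    # 95% envelope of the AB ensemble
    hi = np.percentile(I_ab[:, :horizon], 97.5, axis=0)
    e95 = ((I_de[:horizon] >= lo) & (I_de[:horizon] <= hi)).mean()   # envelopment fraction
    return i_max, t_p, final_size, e95
```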
Figure 2 shows the symptomatic infectious population (I) in 1000 AB simulations for
each network and heterogeneity condition. Also shown are the mean of the ensemble and the
trajectory of the DE model with base case parameters. Table 2 reports the results for the fitted
DE models; Tables 3-5 report Tp, Imax, F and E95 for each condition, and compare them to the
base and fitted DE models. Except for the lattice, the AB dynamics are qualitatively similar to
the DE model. Initial diffusion is driven by the positive contagion feedbacks as exposed and
infectious populations spread the infection, further increasing the prevalence of infectious
individuals. The epidemic peaks when the susceptible population is sufficiently depleted that the
(mean) number of new cases generated by contagious individuals is less than the rate at which
they are removed from the contagious population.
20 We calculate E95 for 125 days, when cumulative cases in the base DE model settles within 2%
of its final value. Longer periods further reduce the discrepancy between the two models.
Figure 1. Left: DE model with base case parameters (Table 1), showing the prevalence fraction for each
disease stage. Right: Typical simulation of the equivalent AB model (the heterogeneous condition of the scale
free network).
Table 2. Median estimated basic reproduction number, R0 = cES*iES*ε + cIS*iIS*δ, for the DE
model fit to the recovered population in 200 randomly selected runs of the AB model for each
cell of the experimental design.
[Table 2 body not recoverable from this copy. For each network and heterogeneity condition the
table reports the median and standard deviation of the implied R0 from the fitted parameters,
and of the R2 of the fit to the recovered population; the median R2 exceeds 0.985 in every
condition, and the estimated R0 is smaller for more clustered networks and more heterogeneous
agents.]
Departures from the DE model, while small, increase from the connected to the random,
scale free, small world, and lattice structures (Figure 2; tables 3-5). The degree of clustering
explains these variations. In the fully connected and random networks the chance of contacts in
distal regions is the same as for neighbors. The positive contagion feedback is strongest in the
connected network because an infectious individual can contact everyone else, minimizing local
contact overlap. In contrast, the lattice has maximal clustering. When contacts are localized in a
small region of the network, an infectious individual repeatedly contacts the same neighbors. As
these people become infected the chance of contacting a susceptible and generating a new case
declines, slowing diffusion, even if the total susceptible population remains high.
In the deterministic DE model there is always an epidemic if Ro > 1. Due to the stochastic
nature of interactions in the AB model, it is possible that no epidemic occurs or that it ends early
as the few initially contagious individuals recover before generating new cases. As a measure of
early burnout, table 3 reports the fraction of cases where cumulative cases remain below 10%.21
Early burnout ranges from 1.8% in the homogeneous connected case to 5.9% in the
heterogeneous lattice. Heterogeneity raises the incidence of early burnout in each network since
there is a higher chance that the first patients will have low contact rates and will recover before
spreading the disease. Network structure also affects early burnout. Greater contact clustering
increases the probability that the epidemic burns out in a local patch of the network before it can
jump to other regions, slowing diffusion and increasing the probability of global quenching.
The impact of heterogeneity is generally small, despite a factor of seven difference
between the lowest and highest contact frequencies. Heterogeneity yields a slightly smaller final
size, F, in all conditions: the mean reduction over all ten conditions is 0.09, compared to a mean
standard deviation in F across all conditions of 0.19; thus the expected difference in F is small
relative to the expected variation in F from differences due to stochastic events during the
epidemic. Similarly, heterogeneity slightly reduces Tp in all conditions (by a mean of 4.5 days,
compared to an average standard deviation of 25 days). Maximum prevalence (Imax) also falls in
all conditions (by 1.5%, compared to a mean standard deviation of 5%). In heterogeneous cases
the more active individuals tend to contract the disease sooner, causing a faster take-off
compared to the homogeneous case (hence earlier peak times). These more active individuals
also recover sooner, reducing the mean contact frequency, and hence the reproduction rate among
those who remain, faster than in the homogeneous case. Subsequent diffusion is slower, peak
prevalence is smaller, and the epidemic ends sooner, yielding fewer cumulative cases. However,
these effects are small relative to the variation from simulation to simulation within each network
and heterogeneity condition caused by the underlying stochastic processes for contact,
emergence and recovery.
Figure 2 (next page): Comparison of the DE SEIR model to the AB model. Graphs show symptomatic cases
as % of population (I/N). Each panel shows 1000 simulations of the AB model for each network and
heterogeneity condition. The thin black line is the base case DE model with parameters as in Table 1. The
thick red lines show the mean of the AB simulations. Also shown are the envelopes encompassing 50%, 75%
and 95% of the AB simulations.
21 Except for the lattice, the results are not sensitive to the 10% cutoff. The technical appendix
(2-1) shows the histogram of final size for each network and heterogeneity condition.
[Figure 2 panels: five rows of paired panels (one row per network structure), with columns labeled
Homogeneous and Heterogeneous; each panel plots I/N (%) against Time (Day) from 0 to 300.]
Table 3. (1) Mean and standard deviation of Final Size (F) for the AB simulations; */** indicates
the base DE value of F (0.98) falls outside the 95/99% confidence bound defined by the
ensemble of AB simulations. These distributions are typically nonnormal (e.g., F is bounded by
100%); assuming normality would further reduce the significance of the differences between
the AB and DE models. (2) The percentage of AB simulations with F < 0.10 indicates the
prevalence of early burnout. (3) Mean and standard deviation of F for the fitted DE models.
                   Connected         Random           Scale-free        Small World       Lattice
                   H=      H≠       H=      H≠       H=      H≠        H=      H≠       H=      H≠
(1) AB Mean        0.97    0.90*    0.92*   0.86**   0.90**  0.84**    0.92    0.83**   0.65    0.51**
    (σ)           (0.13)  (0.19)   (0.15)  (0.17)   (0.18)  (0.20)    (0.17)  (0.21)   (0.29)  (0.26)
(2) % F < 0.10     1.8     4.4      2.7     3.8      3.9     5.6       2.6     4.8      3.1     5.9
(3) Fitted DE μ    0.98    0.91     0.92    0.86     0.90    0.82      0.92    0.83     0.62    0.50
    (σ)           (0.07)  (0.17)   (0.15)  (0.18)   (0.18)  (0.22)    (0.17)  (0.22)   (0.28)  (0.26)
Table 4. Mean and standard deviation of peak time Tp and peak prevalence Imax in 1000
simulations of the AB model for each experimental condition. */** indicates that the values of
the base DE model (Tp = 48 days and Imax = 27.1%) fall outside the 95/99% confidence bounds
defined by the ensemble of AB simulations. Also reported are the mean and standard deviation
of Tp and Imax for the fitted DE models.
[Table 4 body not recoverable from this copy. Across the AB ensembles, mean peak time does
not differ significantly from the base DE value of 48 days in any network or heterogeneity
condition; mean peak prevalence (27.1% in the base DE) is significantly lower only for the small
world and lattice networks; and heterogeneity reduces mean peak time by about 4.5 days and
mean peak prevalence by about 1.5 percentage points relative to the homogeneous condition.]
Table 5. Envelopment fraction E95: the fraction of the first 125 days (the time required for
cumulative cases to settle within 2% of its final value in the base DE) in which the DE infectious
population falls within the 95% confidence interval created by the ensemble of AB simulations
in each network and heterogeneity scenario. Reported for both the base and fitted DE cases.
                   Connected         Random           Scale-free        Small World       Lattice
                   H=      H≠       H=      H≠       H=      H≠        H=      H≠       H=      H≠
Base DE            1.00    1.00     1.00    1.00     1.00    1.00      0.71    0.70     0.55    0.54
Fitted DE          0.99    0.98     0.99    0.97     0.95    0.96      0.98    0.99     0.91    0.94
    (σ)           (0.07)  (0.09)   (0.04)  (0.05)   (0.15)  (0.10)    (0.05)  (0.05)   (0.16)  (0.11)
Consider now the differences between the DE and AB cases by network type.
Fully Connected: The fully connected network corresponds closely to the perfect mixing
assumption of the DE. As expected, the base DE model closely tracks the mean of the AB
simulations, never falling outside the envelope capturing 95% of the AB realizations (E95 =
100% for both H= and H≠). The differences in peak time and peak prevalence between the base
DE and AB model are small and insignificant in both H= and H≠ conditions. Final size is not
significantly different in the H= condition. In the H≠ condition F is significantly smaller (p <
0.05), but the difference, 8%, is small relative to the 19% standard deviation in F within the AB
ensemble.22
Random: Results for the random network are similar to the connected case. The trajectory of
the base DE model never falls outside the 95% confidence interval defined by the AB
simulations. Differences in Tp and Imax are not significant. The difference in cumulative cases is
statistically significant for both heterogeneity conditions, but small relative to the variation in F
within each condition. The incidence of early burnout rises somewhat because the random
network is sparse relative to the fully connected case, increasing contact overlap.
Scale-Free: Surprisingly, diffusion in the scale free network is also quite similar to the base DE,
even though the scale free network departs significantly from perfect mixing. The envelopment
fraction E95 is 100% for both H= and H≠. Peak prevalence and peak time are not significantly
different from the base DE values. Though most nodes have few links, once the infection reaches
a hub it quickly spreads throughout the population. However, as the hubs are removed from the
infectious pool, the remaining infectious nodes have lower average contact rates, causing the
epidemic to burn out at lower levels of diffusion. Yet while final size is significantly lower than
the DE in both heterogeneity conditions, the differences are small relative to the standard
deviation in F caused by random variations in agent behavior within the ensemble of AB
simulations.
22 The reported significance levels are based on the actual distribution of 1000 AB simulations in
each experimental condition. These distributions are typically nonnormal (e.g., F is bounded by
100%). Hence some differences in key metrics are significant even though they are small
relative to the standard deviation for the AB simulations. The conventional assumption of
normality would further reduce the significance of the differences between the AB and DE
models.
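A sketch of the significance convention described in footnote 22 and used in Tables 3-5 follows; the function name and inputs are illustrative.

```python
# A base DE value is flagged * or ** when it falls outside the central 95% or 99%
# of the distribution of the corresponding metric across the AB ensemble.
import numpy as np

def significance_flag(de_value, ab_values):
    lo99, lo95, hi95, hi99 = np.percentile(ab_values, [0.5, 2.5, 97.5, 99.5])
    if de_value < lo99 or de_value > hi99:
        return "**"
    if de_value < lo95 or de_value > hi95:
        return "*"
    return ""
```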
Small World: Small world networks are highly clustered (most contacts are with immediate
neighbors) and they lack highly connected hubs. Consequently diffusion is slower compared to
the DE and the connected, random, and scale-free networks. Slower diffusion relative to the DE
causes the envelopment fraction to fall to about 70%, and peak prevalence is significantly lower
than the DE. Nevertheless, the few long-range links are sufficient to seed the epidemic
throughout the population. Thus in the H= condition, cumulative cases and peak time are not
significantly different from the values of the base DE model. The main impact of heterogeneity
is greater dispersion and a reduction in final size.
Lattice: In the one-dimensional ring lattice individuals can only contact their k immediate
neighbors. Hence diffusion is far slower than in the DE (E95 is only about 55%). The epidemic
advances roughly linearly in a well-defined wave-front of new cases trailed by symptomatic and
then recovered individuals. Such waves are observed in the spread of plant pathogens, where
transmission is mostly local, though in two dimensions more complex patterns are common
(Bjornstad, Peltonen et al. 2002; Murray 2002). Because the epidemic wave front reaches a
stochastic steady state in which recovery balances new cases, the probability of burnout is
roughly constant over time. In the other networks early burnout quickly becomes rare as long-range links seed the epidemic in multiple locations (the technical appendix (2-1) shows
distributions of final size). Final size is much lower in the lattice than the base DE, though
interestingly, the variance is higher as well, so that in the H= condition the differences in F and
Tp are not significant (Imax is significantly
lower in both heterogeneity conditions).
Calibrated DE Model: In practice key parameters such as RO and incubation times are poorly
constrained and are estimated by fitting models to aggregate data. Table 2 summarizes the
results of fitting the DE model to 200 randomly selected AB simulations in each experimental
condition. The differences between the AB and best-fit DE models are very small. The median
R2 for the fit to the recovered population exceeds 0.985 in all scenarios. The infectious
population nearly always remains inside the envelope containing 95% of the AB simulations
(Table 5). Across all ten experimental conditions the values of F, Tp, and Imax in the fitted
models are not statistically different from the means of the AB simulations in 94.8% of the cases
(range 89.0% -- 99.5%). The DE model fits extremely well even though it is clearly misspecified
in all but the homogeneous fully connected network. As the network becomes increasingly
clustered and diffusion slows, the estimated parameters adjust accordingly, yielding a trajectory
nearly indistinguishable from the mean of the AB simulations. The estimated parameters,
however, do diverge from the mean of those characterizing agents in the underlying model. As
expected from the relationship between Ro and final size,23 for more heterogeneous, and more
clustered, networks, where final size is smaller, the estimated Ro is also smaller. For example,
the fitted parameters for small-world networks show longer incubation times (ε) and lower
infectivity of the exposed (iES) compared to the base DE values, yielding slower diffusion and
smaller R0 and final size than the base DE.
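As a quick check of the relationship between R0 and final size cited here (footnote 23), the base case final size of 98% indeed corresponds to a reproduction number of roughly 4:

```python
# Quick check of the final-size relation R0 = -ln(1 - F)/F from footnote 23.
import numpy as np
F = 0.98
print(-np.log(1 - F) / F)   # approximately 4.0
```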
Sensitivity to Population Size: To examine the robustness of the results to population size we
repeated the analysis for N = 50 and 800.24 The results change little over a factor of 16. For most
network types (Random, Small World, Lattice), the final fraction of the population infected is
slightly larger (and therefore closer to the value of the DE) in larger populations (and similar in
the other two networks), and the rate of early burnout falls. Population size has little impact on other key
metrics due to two offsetting effects. As expected, the impact of discrete individuals and
stochastic state transitions is greater in small populations, increasing the variability in the
realizations of the AB model. But greater variability reduces the ability to distinguish between
the mean of the AB results and the DE. Details are reported in the technical appendix (2-1).
Discussion: Agent-based simulation adds an important new method to the modeling toolkit.
Yet computational and cognitive limits require all modelers to make tradeoffs. Agent-based
models allow detailed representation of individual attributes and networks of relationships, but
increase computational requirements. Traditional deterministic compartment models are
computationally efficient, but assume perfect mixing and homogeneity within compartments. By
explicitly contrasting agent-based and deterministic differential equation models of epidemic
diffusion we test the importance of relaxing the perfect mixing and agent homogeneity
assumptions.
23 R0 = -ln(1 - F)/F for SEIR models (Anderson and May 1991).
24 Mean links per node (k) must be scaled along with population. We use the scaling scheme that
preserves the average distance between nodes in the random network (Newman 2003), yielding
k = 6 and k = 18 for N = 50 and N = 800.
With the exception of the ring lattice, where there are no long-range contacts, there
are few statistically significant differences in key public health metrics between the mean of the
AB trajectories and the deterministic DE model. Mean peak times, determining how long public
health officials have to respond to an epidemic, are not significantly different from the base case
DE in any network or heterogeneity condition. Similarly, mean peak prevalence, determining
the maximum load on health infrastructure, does not differ significantly from the base DE for the
connected, random, and scale free networks. The difference in final epidemic size between the
mean of the AB simulations and the base DE model is statistically significant in 7 of the 10
experimental conditions, but the magnitude of the difference is less than one standard deviation
in all but the lattice. (While a pure lattice is unrealistic in modeling human diseases due to the
high mobility of modern society, it may be appropriate in other contexts such as the spread of
diseases among immobile plant populations.) The AB representation does offer insight
unavailable in the DE: the epidemic fizzles out in a small fraction of cases even though the
underlying parameters yield an expected value for the basic reproduction rate greater than one.
The more highly clustered the network, the greater the incidence of early burnout. The
deterministic DE model cannot generate such behavior.
Overall, however, differences between the DE and the mean trajectory of the AB
simulations are small relative to the variation in realizations of the AB model with a given
topology and heterogeneity characteristics. While clustering increases the departure of the mean
trajectory of the AB model from the DE, it also increases the variability of outcomes.
Intuitively, the location of infectious individuals makes no difference in the connected network
homogeneous case, but matters more in highly clustered networks with greater heterogeneity.
The increase in variability with clustering and heterogeneity means that the ability to identify
network topology from a given trajectory does not grow as fast as the differences in means might
suggest.
The differences between the DE and mean of the AB simulations are also small relative
to the uncertainty in key parameters. For example, R0 for smallpox is estimated to be in the
range 3-6 (Gani and Leach 2001). Varying R0 over that range in the DE (by scaling both
contact frequencies, cES and cIS, proportionately, while holding all other parameters constant)
causes F to vary from 94.1% to 99.7%, peak prevalence to vary from 21.7% to 31.8%, and peak
time to vary from 36 to 64 days. Variations in epidemic diffusion due to parameter uncertainty
are comparable to the differences caused by relaxing the perfect mixing and homogeneity
assumptions.
Compared to the homogeneous case, heterogeneity in transmission rates causes slightly
earlier peak times as high-contact individuals rapidly seed the epidemic, followed by lower
diffusion levels as the high-contact individuals are removed, leaving those with lower average
transmission probability and a smaller reproduction rate. These results are consistent with the
analysis of heterogeneity for SI and SIS models (Veliov 2005). Such dynamics were also
observed in the HIV epidemic, where initial diffusion was rapid in subpopulations with high
contact rates. Such heterogeneity is often captured in DE models by adding additional
compartments to distinguish the highly connected and hermit types.
Since parameters characterizing emerging (and some established) diseases are poorly
constrained, epidemiologists typically fit models to aggregate data. We tested the impact of this
protocol by treating the realizations of the AB model as the "real world" and fitting the DE
model to them. Surprisingly, the deterministic DE model replicates the behavior of the stochastic
agent-based simulations very well, even when contact networks are highly clustered and agents
highly heterogeneous. Calibration results also highlight an important methodological issue. The
parameter values obtained by fitting the aggregate model to the data from an AB simulation (and
therefore from the real world) do not necessarily equal the mean of the individual-level
parameters governing the micro-level interactions among agents. Aggregate parameter estimates
not only capture the mean of individual attributes such as contact rates but also the impact of
network structure and agent heterogeneity. Modelers often use both micro and aggregate data to
parameterize both AB and DE models; the results suggest caution must be exercised in doing so,
and in comparing parameter values in different models (Fahse, Wissel et al. 1998).
The results inform phenomena beyond epidemics. Analogous processes of social
contagion (imitation, word of mouth, etc.) play important roles in many social and economic
phenomena, from innovation diffusion to crowd behavior to marketing (Strang and Soule 1998;
Rogers 2003). Moreover, modelers tackling policy issues related to innovation and product
diffusion face tradeoffs on choice of modeling assumptions similar to those studying epidemics
(Gibbons 2004). Therefore, the results may provide insight into the costs and benefits of agent-level disaggregation in modeling of social contagion.
How, then, should modelers choose between the AB and DE frameworks? The model
purpose, time, and resources available condition the choice. If the network structure is known,
stable, and unaffected by the diffusion dynamics themselves, one can examine the effect of
creating and removing nodes and links to simulate random failures or targeted attacks. Such
questions cannot be answered with aggregate models. Data availability is a major concern. Data
on contact network and distribution of agent attributes are often hard to obtain and highly
uncertain, requiring extensive sensitivity analysis to ensure robust results. Disaggregating the
model into AB representation without data to support the formulation of network structure and
agent heterogeneity increases the computational cost and dimensionality of sensitivity analysis
but for many cases adds little value.
The purpose of the model also determines the risks of different methods. The costs of
avoidable morbidity and mortality due to inadequate public health response are high. Some
researchers therefore suggest it is prudent to use the well-mixed DE model, where diffusion is
fastest, as a worst case limit (Kaplan and Lee 1990; Kaplan 1991). The same argument may
suggest greater attention to network structure in studies of innovation diffusion and new product
launch where policymakers seek to promote, rather than suppress, diffusion.
Model complexity can be expanded in different directions. Modelers can add detail,
disaggregating populations by attributes such as location, decision rules, and relationship
networks. Alternatively they can expand the model boundary to include feedbacks with other
subsystems. For example, the results reported here assume fixed parameters including network
structure, contact rates and infectivities. All are actually endogenous. As prevalence increases,
people change their behavior. Self quarantine and the use of safe practices disrupts contact
networks, cuts contact frequencies, and reduces the probability of transmission given a contact.
From staying home, increased hand washing, and use of masks (for SARS) to abstinence,
condom use, and needle exchange (for HIV), endogenous behavior change can have large effects
on the dynamics (Blower, Gershengorn et al. 2000). In some cases behavior change tends to be
stabilizing, for example, as self-quarantine and use of safe practices reduce Ro. In others
behavior change may amplify the epidemic, for example as people fleeing a bioterror attack seed
outbreaks in remote areas and make contact tracing more difficult, or when development of more
effective drugs increase risky behavior among AIDS patients (Lightfoot, Swendeman et al.
2005). Such feedback effects may swamp the impact of network structure, heterogeneity, and
random events and should not be excluded in favor of greater detail.
Before concluding, we consider limitations and extensions. The experiments considered
the classic SEIR model. Further work should consider the robustness of results to common
elaborations such as loss of immunity, other distributions for emergence and recovery,
recruitment of new susceptibles, non-human disease reservoirs and vectors, etc. Note, however,
that by using the classic SEIR model, with only four compartments, we maximize the difference
between the aggregation assumptions of the DE and AB representations. In practical
applications, however, DE models often disaggregate the population more finely to account for
heterogeneity arising from, e.g., sex, age, behavior, location, contact frequencies, assortative
mixing and other attributes that vary among different population segments and cause clustering
in contact networks. Such disaggregation further reduces the differences between the DE and
AB representations. A major challenge, however, is choosing the number and definitions of
additional compartments optimally. The results suggest AB simulations may be used effectively
to design compartment models that capture important heterogeneity and network effects using
the fewest additional compartments, if the network structure is reasonably well known and
stable.
While we examined a wide range of network structures and heterogeneity in agent
attributes, the agent models contain many parameters that could be subject to sensitivity analysis,
including the mean number of links per node, the probability of long range links (in the small
world network), and the scaling exponent (in the scale-free case). Other dimensions of agent
heterogeneity and other networks could be examined, including networks derived from field
study (Ahuja and Carley 1999). The boundary could be expanded to include endogenously the
many effects that alter parameters such as contact rates and network structure as an epidemic
progresses, e.g. through panic or policies such as quarantine and immunization.
Finally, while diffusion is an important process in physical and social systems, our study
does not address the differences between AB and DE models in other contexts.
Conclusion: The declining cost of computation has led to an explosion of agent-based
simulations, offering modelers the ability to capture individual differences in attributes, behavior,
and networks of relationships. Yet no matter how powerful computers become, limited time,
budget, cognitive capabilities, and client attention mean modelers must always choose whether to
disaggregate to capture the attributes of individual agents, expand the model boundary to capture
additional feedback processes, or keep the model simple so that it can be analyzed thoroughly.
When should agent-based models be used, and when should traditional models be used? We
compared agent-based and differential equation models in the context of contagious disease.
From SARS to HIV to the possibility of bioterrorism, understanding the diffusion of infectious
diseases is critical to public health and security. Further, analogous processes of social contagion
play important roles in many social and economic phenomena, from innovation diffusion to
crowd behavior.
We first compared the classic SEIR model to equivalent agent-based models with
identical (mean) parameters. We tested network structures spanning a wide range of clustering,
including fully connected, random, scale-free, small world, and lattice, each tested with
homogeneous and heterogeneous agents. Surprisingly, the DE model often captures the
dynamics well, even in networks that diverge significantly from perfect mixing and
homogeneity. The presence of even a few long-range links or highly connected hubs can seed an
epidemic throughout the population, leading to diffusion patterns that, in most cases, are not
significantly different from the dynamics of the differential equation model.
Second, in practical applications, parameters governing diffusion are poorly constrained
by independent data, for both emerging and established diseases (and for new products). Hence
model parameters are typically estimated from data on incidence and prevalence as diffusion
occurs. We assess this practice by treating individual realizations of the agent-based simulations
as the 'real world' and estimating key parameters of the DE model from these 'data'. The
calibrated DE model captures the dynamics extremely well in all network and heterogeneity
conditions tested. Overall, the differences between the agent-based and differential equation
models caused by heterogeneity and network structure are small relative to the range of
outcomes caused by stochastic variations in the behavior of individual agents, and relative to the
uncertainty in parameters. The results suggest extensive disaggregation may not be warranted
unless detailed data characterizing network structure are available, that structure is stable, and
the computational burden does not prevent sensitivity analysis or inclusion of other key
feedbacks that may condition the dynamics.
References:
Ahuja, M. K. and K. M. Carley. 1999. Network structure in virtual organizations. Organization
Science 10(6): 741-757.
Anderson, R. M. and R. M. May. 1991. Infectious diseases of humans : Dynamics and control.
Oxford; New York, Oxford University Press.
Axelrod, R. 1997. The dissemination of culture - a model with local convergence and global
polarization. Journal of Conflict Resolution 41(2): 203-226.
Axtell, R. L., J. Axelrod, J. M. Epstein and M. Cohen. 1996. Aligning simulation models: A case
study and results. Computational and Mathematical Organization Theory 1(2): 123-141.
Axtell, R. L., J. M. Epstein, J. S. Dean, G. J. Gumerman, A. C. Swedlund, J. Harburger, S.
Chakravarty, R. Hammond, J. Parker and M. Parker. 2002. Population growth and
collapse in a multiagent model of the kayenta anasazi in long house valley. Proceedings
of the National Academy of Sciences of the United States of America 99: 7275-7279.
Ball, F., D. Mollison and G. Scalia-Tomba. 1997. Epidemics with two levels of mixing. The
Annals of Applied Probability 7(1): 46-89.
Barabasi, A. L. 2002. Linked: How everything is connected to everything else and what it means.
Cambridge, Massachusetts, Perseus.
Barabasi, A. L. and R. Albert. 1999. Emergence of scaling in random networks. Science
286(5439): 509-512.
Bass, F. 1969. A new product growth model for consumer durables. Management Science 15:
215-227.
Bjornstad, O. N., M. Peltonen, A. M. Liebhold and W. Baltensweiler. 2002. Waves of larch
budmoth outbreaks in the european alps. Science 298(5595): 1020-1023.
Black, F. and M. Scholes. 1973. Pricing of options and corporate liabilities. Journal of Political
Economy 81(3): 637-654.
Blower, S. M., H. B. Gershengorn and R. M. Grant. 2000. A tale of two futures: Hiv and
antiretroviral therapy in san francisco. Science 287(5453): 650-654.
Carley, K. 1992. Organizational learning and personnel turnover. Organization Science 3(1): 20-46.
Chen, L. C., B. Kaminsky, T. Tummino, K. M. Carley, E. Casman, D. Fridsma and A. Yahja.
2004. Aligning simulation models of smallpox outbreaks. Intelligence and Security
Informatics, Proceedings 3073: 1-16.
Davis, G. 1991. Agents without principles? The spread of the poison pill through the
intercorporate network. Administrative Science Quarterly 36: 583-613.
Dye, C. and N. Gay. 2003. Modeling the sars epidemic. Science 300(5627): 1884-1885.
Edwards, M., S. Huet, F. Goreaud and G. Deffuant. 2003. Comparing an individual-based model
of behavior diffusion with its mean field aggregate approximation. Journal of Artificial
Societies and Social Simulation 6(4).
Epstein, J. M. 2002. Modeling civil violence: An agent-based computational approach.
Proceedings of the National Academy of Sciences of the United States of America 99:
7243-7250.
Epstein, J. M., D. A. T. Cummings, S. Chakravarty, R. M. Singa and D. S. Burke. 2004. Toward
a containment strategy for smallpox bioterror: An individual-based computational
approach, Brookings Institution Press.
Erdos, P. and A. Renyi. 1960. On the evolution of random graphs. Publications of the
Mathematical Institute of the Hungarian Academy of Sciences 5: 17-61.
Eubank, S., H. Guclu, V. S. A. Kumar, M. V. Marathe, A. Srinivasan, Z. Toroczkai and N.
Wang. 2004. Modelling disease outbreaks in realistic urban social networks. Nature
429(13 May 2004): 180-184.
Fahse, L., C. Wissel and V. Grimm. 1998. Reconciling classical and individual-based approaches
in theoretical population ecology: A protocol for extracting population parameters from
individual-based models. The American Naturalist 152(6): 838-852.
Ferguson, N. M., M. J. Keeling, J. W. Edmunds, R. Gani, B. T. Grenfell, R. M. Anderson and S.
Leach. 2003. Planning for smallpox outbreaks. Nature 425: 681-685.
Gani, R. and S. Leach. 2001. Transmission potential of smallpox in contemporary populations.
Nature 414(6865): 748-751.
Gibbons, D. E. 2004. Network structure and innovation ambiguity effects on diffusion in
dynamic organizational fields. Academy of Management Journal 47(6): 938-951.
Halloran, E. M., I. M. Longini, N. Azhar and Y. Yang. 2002. Containing bioterrorist smallpox.
Science 298: 1428-1432.
Halloran, M. E. and I. M. Longini. 2003. Smallpox bioterror response - response. Science
300(5625): 1503-1504.
Hethcote, H. W. 2000. The mathematics of infectious diseases. Siam Review 42(4): 599-653.
Jacquez, G. and P. O'Neill. 1991. Reproduction numbers and thresholds in stochastic epidemic
models. I. Homogeneous populations. Mathematical Biosciences 107: 161-186.
Jacquez, J. A. and C. P. Simon. 1993. The stochastic si model with recruitment and deaths .1.
Comparison with the closed sis model. Mathematical Biosciences 117(1-2): 77-125.
Jacquez, J. A. and C. P. Simon. 2002. Qualitative theory of compartmental systems with lags.
Mathematical Biosciences 180: 329-362.
Kaplan, E. H. 1991. Mean-max bounds for worst-case endemic mixing models. Mathematical
Biosciences 105(1): 97-109.
Kaplan, E. H., D. L. Craft and L. M. Wein. 2002. Emergency response to a smallpox attack: The
case for mass vaccination. Proceedings of the National Academy of Sciences 99(16):
10935-10940.
Kaplan, E. H. and Y. S. Lee. 1990. How bad can it get? Bounding worst case endemic
heterogeneous mixing models of hiv/aids. Mathematical Biosciences 99: 157-180.
Kaplan, E. H. and L. M. Wein. 2003. Smallpox bioterror response. Science 300: 1503.
Keeling, M. J. 1999. The effects of local spatial structure on epidemiological invasions.
Proceedings of the Royal Society of London Series B-Biological Sciences 266(1421):
859-867.
Koopman, J. S. 2002. Controlling smallpox. Science 298: 1342-1344.
Koopman, J. S., S. E. Chick, C. P. Simon, C. S. Riolo and G. Jacquez. 2002. Stochastic effects
on endemic infection levels of disseminating versus local contacts. Mathematical
Biosciences 180: 49-71.
Koopman, J. S., G. Jacquez and S. E. Chick. 2001. New data and tools for integrating discrete
and continuous population modeling strategies. Population health and aging. 954: 268-294.
Levinthal, D. and J. G. March. 1981. A model of adaptive organizational search. Journal of
Economic Behavior and Organization 2: 307-333.
Lightfoot, M., D. Swendeman, M. J. Rotheram-Borus, W. S. Comulada and R. Weiss. 2005. Risk
behaviors of youth living with hiv: Pre- and post-haart. American Journal of Health
Behavior 29(2): 162-171.
Lipsitch, M., T. Cohen, B. Cooper, J. M. Robins, S. Ma, L. James, G. Gopalakrishna, S. K.
Chew, C. C. Tan, M. H. Samore, D. Fisman and M. Murray. 2003. Transmission
dynamics and control of severe acute respiratory syndrome. Science 300(5627): 1966-1970.
Lomi, A. and E. R. Larsen. 2001. Dynamics of organizations: Computational modeling and
organization theories. Menlo Park, Calif., AAAI Press/MIT Press.
Mahajan, V., E. Muller and Y. Wind. 2000. New-product diffusion models. Boston, Kluwer
Academic.
Murray, J. D. 2002. Mathematical biology. New York, Springer.
Newman, M. E. J. 2002. Spread of epidemic disease on networks. Physical Review E 66(1).
Newman, M. E. J. 2003. Random graphs as models of networks. Handbook of graphs and
networks. H. G. Schuster. Berlin, Wiley-VCH: 35-68.
Newman, M. E. J. 2003. The structure and function of complex networks. Siam Review 45(2):
167-256.
Riley, S., C. Fraser, C. A. Donnelly, A. C. Ghani, L. J. Abu-Raddad, A. J. Hedley, G. M. Leung,
L. M. Ho, T. H. Lam, T. Q. Thach, P. Chau, K. P. Chan, P. Y. Leung, T. Tsang, W. Ho,
K. H. Lee, E. M. C. Lau, N. M. Ferguson and R. M. Anderson. 2003. Transmission
dynamics of the etiological agent of sars in hong kong: Impact of public health
interventions. Science 300(5627): 1961-1966.
Rogers, E. M. 2003. Diffusion of innovations. New York, Free Press.
Samuelson, P. 1939. Interactions between the multiplier analysis and the principle of
acceleration. Review of Economic Statistics 21: 75-79.
Schelling, T. C. 1978. Micromotives and macrobehavior. New York, Norton.
Strang, D. and S. A. Soule. 1998. Diffusion in organizations and social movements: From hybrid
corn to poison pills. Annual Review of Sociology 24: 265-290.
Tesfatsion, L. 2002. Economic agents and markets as emergent phenomena. Proceedings of the
National Academy of Sciences of the United States of America 99: 7191-7192.
Urban, G. L., J. R. Houser and H. Roberts. 1990. Prelaunch forecasting of new automobiles.
Management Science 36(4): 401-421.
Veliov, V. M. 2005. On the effect of population heterogeneity on dynamics of epidemic diseases.
Journal of Mathematical Biology, forthcoming.
Watts, D. J. 1999. Small worlds. Princeton, Princeton University Press.
Watts, D. J. 2004. The "new" science of networks. Annual Review of Sociology 30: 243-270.
Watts, D. J. and S. H. Strogatz. 1998. Collective dynamics of "small-world" networks. Nature
393(4 June): 440-442.
Essay 3- Dynamics of Multiple-release Product Development
Abstract
Product development (PD) is a crucial capability for firms in competitive markets. Building on
case studies of software development at a large firm, this essay explores the interaction among
the different stages of the PD process, the underlying architecture of the product, and the
products in the field. We introduce the concept of the "adaptation trap," where intendedly
functional adaptation of workload can overwhelm the PD organization and force it into
firefighting (Repenning 2001) as a result of the delay in seeing the additional resource need
from the field and underlying code-base. Moreover, the study highlights the importance of
architecture and underlying product-base in multiple-release product development, through their
impact on the quality of new models under development, as well as through resource
requirements for bug-fixing. Finally, this study corroborates the dynamics of tipping into
firefighting that follows quality-productivity tradeoffs under pressure. Put together, these
dynamics elucidate some of the reasons why PD capability is hard to build and why it easily
erodes. Consequently, we offer hypotheses on the characteristics of the PD process that increase
its strategic significance and discuss some practical challenges in the face of these dynamics.
Introduction
Firm performance and heterogeneity is a central topic of interest for researchers and
practitioners alike. According to the resource-based view of strategy, it is important to look
inside the firm for capabilities that distinguish it from its competitors (Wernerfelt 1984; Barney
1991; Peteraf 1993) and enable the firm to gain rents. A capability should be valuable, rare,
inimitable, and un-substitutable (Barney 1991) so that it can contribute to sustained competitive
advantage.
Researchers have suggested several broad explanations for why some capabilities elude
imitation and replication. Barney (1991) offers three main factors that result in inimitability:
history dependence, causal ambiguity, and social complexity. Dierickx and Cool (1989) note that
capabilities are stocks, and consequently discuss time compression diseconomies, asset mass
efficiencies, interconnectedness of stocks, asset erosion time constants, and causal ambiguity as
the main reasons why capabilities are hard to imitate. Finally, Amit and Schoemaker (1993)
suggest uncertainty, complexity, and organizational conflict as the main factors that create
heterogeneity in managerial decision-making and firm performance.
While these general explanations are important starting points, the resource-based
literature has not provided a detailed and grounded understanding of the capability evolution that
it suggests underlies the barriers to imitation (Williamson 1999). Opening the black-box of
capability is required for recognizing a capability independent of performance measures it is
supposed to explain, and for proposing testable hypotheses about the relative importance of
different capabilities. Such understanding would not only strengthen the resource-based view
theoretically; it is also needed to enhance the practical usefulness of the framework.
A few theoretical perspectives elaborate on barriers to imitation by discussing the factors
that hinder formation of successful strategies through adaptive processes. Leonard-Barton
(1992) introduces the concept of core rigidity, where routines underlying a firm's capabilities
resist change when the environment demands adoption of new capabilities -- for example, in the
face of architectural innovations (Henderson and Clark 1990). The organizational learning
literature highlights a similar barrier to learning, competency traps (Levitt and March 1988),
where accumulated experience with old routines reduces their operating cost and makes the new,
potentially superior, practices less rewarding. Finally, the complexity of imitation is elaborated
on through theoretical models of adaptation on rugged fitness landscapes (Levinthal 1997;
Rivkin 2001). This view of adaptation suggests that interdependency and interaction among the
different elements of the firm strategy (Milgrom and Roberts 1990) create multiple local peaks in
the landscape of fit between the organization's strategy and the environmental demands, making
it hard for the firm to achieve new combinations of strategic fit through incremental adaptation.
More recently, a few empirical studies have enriched the understanding of the nature of
capabilities and their dynamics. Henderson and Cockburn (1994) discuss how component and
architectural competence contribute to research and development productivity of pharmaceutical
firms. An intra-firm evolutionary perspective has been developed (Burgelman 1991; Lovas and
Ghoshal 2000) which views the evolution of strategies as processes of random variation and
selection inside a firm, and several case studies have elaborated on this perspective. (See the
special Summer 1996 issue of Strategic Management Journal on evolutionary perspectives on
strategy.) Finally, a few studies explore the nature and dynamics of organizational capabilities in
different industries, including banking, semi-conductors, automobiles, communications,
pharmaceuticals, and fast food (Dosi, Nelson et al. 2000).
These studies highlight that the complexity of capabilities resides in their evolution
through time and the path dependence in these dynamics. In this essay, drawing on two in-depth
case studies of the software development process, we propose a grounded theory of why
successful product development (PD) capability is hard to establish, and easily erodes when
established. Using the case study method (Eisenhardt 1989), combined with dynamic simulation
modeling (Forrester 1961; see Black, Carlile et al. 2004 for a detailed description of the method),
we develop a fine-grained and internally consistent perspective on multiple-release PD
capability: how it operates through time, and why it is hard to sustain. Through this theory-building practice we follow a tradition of using the micro-foundations of behavioral decision
theory to explain a phenomenon of interest in strategy (Zajac and Bazerman 1991; Amit and
Schoemaker 1993): operation and erosion of capabilities. Our results show 1) how the long-term
consequences of quality and productivity adjustment under pressure can deteriorate product
development capability, 2) how the adaptation trap makes it hard for organizations to sustain an
efficient product development process, and 3) how continuity in product base and architecture
reinforces these dynamics.
Context of Study: Multiple-release PD in the Software Industry
Product development is often highlighted as the prime example of a dynamic capability
(Eisenhardt and Martin 2000; Winter 2003). By creating innovative products that fulfill unmet
market needs, firms can create competitive advantage. Moreover, moving into new product
markets through successful product introduction enables firms to change their strategic
orientation; for example, Hewlett-Packard changed from an instruments company to a computer
company through product development (Burgelman 1991). In addition, since dynamic markets
demand continuous introduction of new products, companies in these markets cannot find a
substitute for the product development capability.
The practice of new product development has changed in many ways. While new
products were once developed separately in independent releases, development of product
families in multiple releases is gaining prominence in high-speed competitive markets
(Sanderson and Uzumeri 1997; Cusumano and Nobeoka 1998; Gawer and Cusumano 2002)25.
Multiple-release product development entails building a central set of components around which
new products in a product family are developed. Each new model of the product adds a few new
features to the existing architecture and product base. This process has several benefits, since it
allows the firm to get the first product to the market faster and with fewer resources; it provides
for learning about market and internal processes from one model to another; it allows for other
players in the market to build around the central platform, thereby creating positive externalities
for a product line (Gawer and Cusumano 2002); and it enhances competitiveness by increasing
control over the pacing of product introduction (Eisenhardt and Brown 1998).
For example, Palm currently has three major product families for the PDA market: Zire,
Tungsten, and Treo. Each product family has had different product releases/models shaped
around a specific platform. For example, the Treo platform combines features of cell phone and
classical PDA, and each new model has included new features, e.g., still camera, MP3 player,
and video camera. Most software products have traditionally followed the principles of the
multiple-release development process, even though using a different vocabulary. In the software
industry each new release (= model26) of a product (= product family) builds on the past releases
by adding new features to the current code base (= product base). This multiple release
development strategy operates across most common software products we use, from operating
systems (e.g., Mac OS, Windows, DOS) to different applications (e.g., MS Excel, Matlab,
Stata).
Compared to single-product development, multiple-release product development brings
more continuity across releases of a product, since not only do different releases share design and
development resources, but they also have similar pools of customers, the same architecture, and
a similar product base. These connections among different models highlight the importance of
an integrative view of product development which encompasses the dynamics across multiple
releases of a product family. While there is a rich literature looking at product development
(PD) in general (see Brown and Eisenhardt (1995) and Krishnan and Ulrich (2001) for reviews)
25 Multiple-release product development conceptually has overlap with platform-based PD in definition and its
applications. We use the former term because the main focus of this study is the interaction between different
releases of a project and not the development of a common platform.
26 Throughout the text we use these terms interchangeably. We tend to use the software vocabulary when talking
specifically about the two case studies, and use the general terms when discussing the implications for PD in
general.
and the recent body of work on development of products based on a common platform is
growing (e.g., Sanderson and Uzumeri 1995; Meyer, Tertzakian et al. 1997; Cusumano and
Nobeoka 1998; Muffatto 1999; Krishnan and Gupta 2001), there has been little research on the
dynamics of the multiple-release product development process.
This gap is also observable in the literature on project dynamics. There is a rich set of
studies discussing single-project dynamics (e.g., Cooper 1980; AbdelHamid and Madnick 1991;
Ford and Sterman 1998). These studies have introduced the concept of the rework cycle and
have discussed several different processes, including the effect of pressure on productivity and
quality, morale, overtime, learning, hiring and experience, phase dependency, and coordination,
which endogenously influence the performance of different projects. However, with the
exception of Repenning's work (2000; 2001), little attention has been paid to the dynamics that arise
when successive product models are developed by the same development organization.
Repenning (2000; 2001) looks at multiple-project R&D systems and discusses tipping
dynamics that follow resource allocation between the concept design and development phases of
a PD process. He shows that there potentially exist two equilibria with low and high efficiency
for the PD organization, and a tipping point (a state of the system at which the behavior changes
significantly) between the two. These results, however, rely on the assumption that development
gets higher priority and that concept design adds enough value through avoiding future errors in
development to justify it on efficiency grounds. Moreover, this work does not consider the quality
consequences of product development once the product leaves the PD organization.
In this study we focus on the dynamics of multiple-release product development with an
emphasis on the effects of products in the field and the architecture and product-base. As a result,
we take into account the continuity between different releases of a product. Moreover, we focus
our analysis on dynamics that make it hard to build and sustain the efficient PD capability that is
crucial to the success of a firm in competitive markets.
The software industry is well suited as the setting for this study. In the absence of
production and distribution barriers, product development is the major part of the software
business. On the other hand, the rapid pace of change in the software industry makes it a great
example of a dynamic market. Therefore product development is a crucial dynamic capability for
a software firm. To differentiate the firm's performance from that of its competitors, a software
firm needs a strong PD capability that can produce products with more features, faster, and more
cheaply than the competitors.
Moreover, despite the importance of PD capability in this industry, software development
projects often cost more than budgeted, stretch beyond their deadlines, and fall short of
incorporating all their desired features (for a meta-survey see Molokken and Jorgensen 2003).
Cost and schedule overruns in software projects have persisted for over two decades, despite
significant technological and process improvements, such as object-oriented programming,
iterative and agile development, and product line engineering, among other things. The
challenge of learning to successfully manage software development is therefore non-trivial. Yet
this challenge is not insurmountable: Modest improvements in project measures (see Figure 1)
and development productivity (Putnam 2001) are observable in a long time horizon.
Furthermore, the size and complexity of software development projects are considered medium-range (average size of about 100 person-months [Putnam 2001; Sauer and Cuthbertson 2004]),
allowing for an in-depth look at all the components of the PD process. Finally, there is a rich
literature on the software development process, including some of the pioneering work in
applying simulation models to project dynamics (AbdelHamid and Madnick 1991). We can
therefore build on this literature for the research at hand.
[Figure 1 chart: IT Project Performance, percentage of projects succeeded, challenged, and failed, 1994-2002. Source: Standish Group surveys.]
Figure 1- The performance of IT projects in the U.S. based on the bi-annual survey of the
Standish Group. Data is based on 8,380 projects, mostly software development. The Succeeded group
includes projects that were on time and on budget; the Failed group includes projects that were
never finished; and the Challenged group includes the rest. The data is compiled from different
Standish Group surveys, including their 1995 and 2001 surveys, which are
available at their website: http://www.standishgroup.com/index.php.
Data Collection
This study uses interviews and archival data from two product development teams in a
single software organization (hereafter called "Sigma") to build a dynamic simulation model of
multiple-release product development. Sigma is a large IT company with over 16,000 employees
and several lines of products in hardware and software, which are distributed in multiple
locations around the world. In recent years the company management has decided on a strategic
shift towards the software side of the business and therefore strong emphasis is put on the
success of a few promising software products. The study was supported by the research
organization of the company, which was interested in learning about the factors important to the
success or failure of software projects. We were given complete access to internal studies and
assessments conducted within Sigma. Moreover, one of the research personnel who was familiar
with several software projects in the company helped us, both with connecting with the two cases
we studied and with the data-gathering effort.
We studied closely the development of two products, Alpha and Beta. The two cases
were selected because they had significantly different performance on the quality and schedule
dimensions, despite similarities in size and location. Despite a promising start, Alpha had come
to face many quality problems and delays while Beta's releases were on-time and had high
quality. Comparing and contrasting the processes in effect in two cases provided a rich context
to build a model capturing the common processes across two projects with different outcomes,
therefore helping shed light on how similar structures can underlie different outcomes.
Seventy semi-structured interviews (by phone and in person, ranging from 30 to 90
minutes) and archival data gathered in three months of fieldwork in Sigma inform this study.
Moreover, we participated in a few group meetings in the organization that focused on the
progress of products under study. The study is part of a longer research involvement with the
company; therefore, we continued to gather additional data and corroboration on different
themes after the three months of major fieldwork. These efforts included 20 additional interviews
and a group session which elicited ideas from 26 experienced members of the organization. The
interviews included members of all functional areas involved in development, services, and
sales, specifically architects and system engineers, developers, testers, customer service staff,
sales support personnel, and marketing personnel, as well as managers up to two layers in several
of these areas. Interviewees were given a short description of the project, were asked about their
role in the company and the process of work, and then they discussed their experiences in the
development of their respective products. The details of the data collection process are
documented in the technical appendix 3-1.
Data Analysis
Interviews were recorded and a summary of each interview was created soon after the
interview session. We used this information to build a simple model of how the mechanics of the
development process work. Moreover, in each interview we searched for factors contributing to
quality and delay issues and looked for corroborations or disagreements with old factors. Based
on these factors and the observations on the site, we generated several different hypotheses about
what processes contribute to, or avoid, delay and quality problems in the development of Alpha
and Beta. The complete set of these dynamic hypotheses is documented in the technical
appendix 3-2. We then narrowed down the list of hypotheses into a core set based on what
themes were more salient in the interviews and in the history of the products. This core set of
dynamic hypotheses created a qualitative theory to be formally modeled using the system
dynamics modeling framework (Forrester 1961; Sterman 2000). The modeling process included
first building a detailed technical model (documented in the technical appendix 3-3) and then
extracting the main dynamics in a simple simulation model reported in this essay.
The model allows us to understand the relationship between the process of product
development and the behavior of a PD organization. Using sensitivity analysis and different
simulation experiments, we learned about the dynamic consequences of interaction between the
mechanics of product development and decision-rules used to allocate resources and manage the
system.
In the rest of the essay, we use the simple simulation model to explain why creating,
managing, and sustaining multiple-release product development processes is difficult. More
specifically, we explain why PD capability in these settings can erode through random
environmental shocks and adaptation, and describe the role of product architecture and feedback
from the product in the field in the erosion of PD capability. Next, we provide a brief description
of the software development process in the two cases studied. In the following section, the
simple model of multiple-release product development is introduced and the dynamics of interest
are discussed in detail.
Overview of Development Process for Alpha and Beta
In Sigma, we studied the development of two products closely. Both products are
customer relationship management (CRM) solutions (or part of a CRM solution) and follow
similar general processes of software development, as described below. The development
process described here is for one release of a product; however, usually a new release is well
underway when the current release is launched into the market. Therefore, the different phases of
development overlap with each other for different releases.
The software development process includes three main phases. These phases can be
followed in a serial manner (waterfall development) or iteratively. Sigma's development
organization followed a largely waterfall approach, with some iterative elements. In the concept
design phase, the main features of the product are determined, the product architecture is
designed, and general requirements for developing different pieces of code are created. For
example, a new feature for the next release of a call-center software can be "linking to national
database of households and retrieving customer information as the call is routed to a customer
service agent." Software architects decide the method by which this new capability should be
incorporated into the current code, which new modules should be written, and how the new code
should interact with current modules of code. Subsequently a more detailed outline of the
requirements is developed that describes the inputs, outputs, and performance requirements for
the new modules, as well as modifications of the old code.
The next step of the software development process is developing the code. This is
usually the most salient and the largest share of the work. In this step, software engineers
(developers) develop the new code based on the requirements they have received from the
concept design phase. The quality of the development depends on several factors, including the
skills of individual developers, complexity of the software, adherence to good development
practices (e.g., writing detailed requirements and doing code reviews), quality of the code they
are building on, and quality of the software architecture (MacCormack, Kemerer et al. 2003).
Some preliminary tests are often run in this stage to ensure the quality of the new code before it is
integrated with other pieces and sent to the quality assurance stage.
Quality assurance includes running different tests on the code to find problems that stop
the software from functioning as desired. Often these tests reveal problems in the code that are
referred back to developers for rework. This rework cycle continues until the software passes
most of the tests and the product can be released. It is important to note that not all possible
errors are discovered during testing. Often tests can cover only a fraction of possible
combinations of activities for which the software may be used in the field. Therefore, there are
always some bugs in the released software.
When the software is released, new customers buy and install it and therefore require
service from the company. The service department undertakes answering customer questions
and helping them use the software efficiently, as well as following up on bugs reported by
customers, and referring these bugs to the research and development organization. These bugs
need to be fixed on the customer site through patches and ad hoc fixes if they are critical to
customer satisfaction. Current-Engineering (CE) is the term used for this activity of creating
short-term fixes for bugs on a specific customer site. CE is often undertaken by the same part of
the organization that has developed the code, if not by the same pool of developers. If the bugs
are not critical, they can be fixed in later releases of the software. In this study Bug-Fixing refers
to the activity of fixing problems discovered from old releases in the new release of the product
and thus is different from CE. Consequently bug-fixing usually competes with addition of new
features to the next release.
The development, launch, and service activities discussed above focus on a single release
(version) of the software. However, different releases are interconnected. First, different
versions of software build on each other and hence they carry over the problems in code and
architecture, if these problems are not remedied through bug-fixes or refactoring of the
architecture. (Refactoring is improving the design and architecture of the existing code.)
Moreover, the sharing of resources between development and CE means that the quality of past
releases influences the resources available for the development of the current release.
Products Alpha and Beta share the general processes as discussed above. Product Alpha
is a Customer Relationship Management (CRM) software which has been evolving for over 8
years. The research and development team working on Alpha has fluctuated around 150 full-time employees, who work in five main locations. This product has gone through a few
acquisitions and mergers; its first few releases have led the market; and it has been considered a
promising, strategic product for the company. However, in the past two years, long delays in
delivery and low quality have removed it from its leadership position in the market. In fact, the
recent low sales have cast doubt on its viability.
Product Beta is part of a complicated telecommunication switch that is developed largely
independent of the parent product but is tested, launched, and sold together with the parent
product. Beta includes both software and hardware elements and the majority of its over-80-person R&D resources focus on the software part. Beta's R&D team is also spread across
multiple locations with four main sites, two of which overlap with Alpha product. Beta has had
exceptionally good quality and timely delivery consistently through its life. Nevertheless, in two
cases in its history, Beta had to delay a release to adhere to a synchronized release plan with the
parent product, which was delayed.
Model and Analysis
In this section we build a simple model of the multiple-release product development
process, in which the interactions among quality, productivity, and resource requirements for
products in the field and for maintenance of the old code are analyzed. We build and analyze the
model in three steps. First, a very simple model is developed to look at the productivity and
quality tradeoff in the development phase of the PD process. After analyzing this simple model,
we build into it a simple structure to represent what happens when the product is introduced into
the field and the dynamics that follow. Finally, we add the architecture and underlying code base and analyze the full model.
The model is built based on the observations of PD processes described in the previous
section. Therefore the decision-rules are modeled based on the observation of individuals'
decision-making and action in practice, rather than theoretical or normative assumptions such as
rationality. Consequently the formulations of the model follow the tenets of behavioral decision
theory (Cyert and March 1963) and bounded rationality (Simon 1979) as applied to simulation
models (Morecroft 1983; Morecroft 1985).
Development Sector: Tipping point in productivity and quality tradeoff
Development of a new release of a software product entails adding new features to an
existing release or developing all the new features for the first release. In the same fashion, the
new car from an existing model has some new features added or modified on the existing
platform and a new version of a personal organizer builds on the last version with a few
modifications and additional features. Demand for New Features27 usually comes from market
research, customer focus groups, benchmarking with competitors, and other strategic sources,
with an eye on internal innovative capabilities of the firm. In the case of the software products
studied, new features are proposed by sales, service, and marketing personnel, as well as product
managers who are in charge of finding out about what is offered by the competition and what is
most needed by the potential customers and prioritizing these features to be included in the
27 The names of variables that appear in the model diagrams are italicized when they are first introduced.
future releases. Therefore the demand for new features largely depends on competitive pressures
and factors outside the direct control of the R&D department, even though the product managers
have some flexibility in selecting and prioritizing among hundreds of features that can be
included in a new release (see Figure 2).
The proposed features to be added in future releases accumulate in the stock of Features
Under Development until they are developed, tested, and released in the market to become part
of the Features in Latest Release of the software. Feature Release rate determines how fast the
PD organization is developing and releasing new features and therefore how often new releases
are launched. The quality of the new release depends on the Defects in Latest Release, which
accumulates Defect Introduction Rate as defects in the designed product go into the market.
Figure 2 shows how these variables represent the flow of tasks and the defects in the product.
Formally, the stocks (boxes) represent the integration of different flows (thick valves and
arrows); for example, Features Under Development is the integral of Feature Addition minus
Feature Release. Appendix A details the equations of the model.
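To make the stock-and-flow logic concrete, the following Python sketch integrates a single stock from its inflow and outflow; the function names, the Euler time step, and the numbers are illustrative assumptions, not the formulation documented in Appendix A.

# Minimal sketch: Features Under Development integrates Feature Addition minus Feature Release.
def simulate_stock(feature_addition, feature_release, initial_stock=0.0, dt=0.25, months=24):
    """Euler-integrate one stock from its inflow and outflow rates (features/month)."""
    stock = initial_stock
    for step in range(int(months / dt)):
        t = step * dt
        stock += (feature_addition(t) - feature_release(t)) * dt
    return stock

# Example: constant demand of 6 features/month against a release rate of 5 features/month.
print(simulate_stock(lambda t: 6.0, lambda t: 5.0, initial_stock=12.0))  # backlog grows to 36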
Figure 2- The basic stock and flow structure for multiple-release new product development.
Demand for New Features accumulates in the stock of Features Under Development until they
are developed and transferred into the Features in Latest Release through Feature Release.
Strong product development capability in this setting translates into getting the highest
rate of Feature Release given a fixed level of development resources. Companies that can
sustain such high rates of features release will have a competitive advantage through having
some combination of more features in their new releases, faster release times, and lower cost of
product development.
In its simplest form, Feature Release (FR) depends on the amount of resources available
for development (RD), the productivity of these resources (P), and the quality of their work (i.e.,
fraction of accomplished work that passes the quality criteria). This fraction depends on two
main factors, the quality of work, represented by the fraction of developed features that are
defective (Error Rate: e), and the comprehensiveness of testing and quality assurance practices,
represented by the fraction of errors discovered in testing (frT). Therefore the fraction of work
that is accepted is one minus the fraction of work rejected, or frA = 1 - e*frT. Consequently Feature
Release can be summarized as:
FR = RD * P * (1 - e*frT)     (1)
Here we measure the development resources (RD) as the number of individuals working
on development of new features. Productivity (P) represents how many features a typical
developer can create in one month and depends on several factors, including the skill and
experience of individual developers, the quality of design and requirements for the development,
the complexity of the code to be developed, the development process used (e.g., waterfall vs.
iterative), and the availability of different productivity tools. Similarly, Error Rate (e) depends
on several factors, including the quality of design and detailed requirements, complexity of
architecture, the scope of testing by developers (before the official testing phase), code reviews
and documentation, development process used, and the pressure developers face, among other
things. Finally, the fraction of errors caught in testing (frT) depends both on comprehensiveness
and coverage of test plans as well as on the quality of execution of, and follow-up on, testing.
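Equation (1) is straightforward to express as a function; the parameter values below are hypothetical and serve only to illustrate the accounting.

def feature_release(resources, productivity, error_rate, fraction_caught):
    """Equation (1): FR = RD * P * (1 - e * frT).
    resources       -- developers assigned to new features (people)
    productivity    -- features per developer per month
    error_rate      -- fraction of developed features that are defective
    fraction_caught -- fraction of defects discovered in testing and sent back for rework
    """
    return resources * productivity * (1.0 - error_rate * fraction_caught)

# Hypothetical values: 20 developers, 0.5 features/person-month, e = 0.4, frT = 0.9.
print(feature_release(20, 0.5, 0.4, 0.9))  # 6.4 features released per month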
What creates the dynamics in a product development process is the fact that resources,
productivity, quality, and testing are not constant, but often change through the life of a product
development organization working on multiple releases of a product. While exogenous changes
in these variables (e.g., an economic downturn results in a lay-off and a reduction in available
resources) are important, they add little insight into internal dynamics of the PD processes and
offer little leverage for improvement. Therefore here we focus on those changes which are
endogenous to the PD process, for example, how resource availability changes because of the
quality of past releases. In what follows we expand the model to capture some of the main
endogenous effects (feedbacks) and analyze the results.
A central concept for understanding the capability erosion dynamics is the balance
between resources available and resources required for finishing a release on schedule. This can
be operationalized as Resource Gap (RG), the ratio of typical person-months of work required to
finish the development of pending Features Under Development by a scheduled date, to available
person-months until that date. Therefore the Resource Gap is a function of three variables: how
many features are under development, how much time the PD organization is given to develop
these features, and how many development resources are available to do this job. (See the exact
equations in Appendix A.) In the analysis of the determinants of the resource gap, we can
distinguish between Features Under Development that change endogenously due to model
dynamics, versus the time given for development and the resources available, which are mainly
managerial policy levers. Therefore we keep the resources available and the time to develop the
pending features constant when discussing the model dynamics in the absence of policy
interventions. Resource Gap is a measure of pressure on the development team and therefore we
use the two terms interchangeably.
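The ratio can be computed directly; the exact formulation is in Appendix A, and the variable names and numbers below are only illustrative.

def resource_gap(features_under_development, normal_productivity, months_to_deadline, developers):
    """Resource Gap: person-months required to clear the backlog over person-months available."""
    required = features_under_development / normal_productivity   # person-months of pending work
    available = developers * months_to_deadline                   # person-months until the deadline
    return required / available

# Hypothetical: 36 pending features, 0.5 features/person-month, 6 months, 20 developers.
print(resource_gap(36, 0.5, 6, 20))  # 0.6, i.e., available resources exceed requirements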
The resource gap can be managed through controlling the amount of resources available
or the scheduled finish date. If pressure is consciously controlled, the desired impact of
increasing schedule pressure is to increase the productivity of developers so that the gap between
available resources and desired resources can be closed. In fact, in the case of software
engineering, it is estimated that under pressure people can work as much as 40% faster
(AbdelHamid and Madnick 1991). All else being equal, this balancing loop (the Work Harder loop; see Figure 3) enables development teams to accomplish more work than in the absence of
pressure and potentially meet a challenging schedule. Managers commonly use this lever to
increase the productivity of the development process, as reflected in the comments of a program
manager:
"The thinking that [a higher manager]had, and we think we agree with this, is that if
your target is Xand everybody is marchingfor that target,you fill the timefor that target. Ifyou
are ahead of the game,people areprobably going to relax a little bit: becausewe aim ahead of
the game, we don't need to work so hard,we can take a longer lunch,we can do whatever;and
guess what: Murphy's law saysyou now won't make that target. So [ ]you shootfor Xminus
something,you drivefor X minus somethingso thatyou build in on Murphy's law to a point. "
However, there is a negative side to increasing work pressure. As pressure increases and
developers try to work faster to meet the schedule, they start to make more errors. This happens
because under pressure they take shortcuts, for example, by less detailed requirement
development, little documentation of the code, lack of code review, and poor unit testing (see
MacCormack, Kemerer et al. (2003) for a quantitative analysis of the effects of different
practices on quality and productivity). Moreover, the stress induced by pressure can increase
their error rate and further erode the quality. In the words of a technical manager in Alpha:
"There is
a lot ofpressure put on people to say how can you do this asfast asyou can,
andpeople fall back on the ways that they used to do it, so insteadof going through all this
process stuff [they would say] 'I know what to code, I canjust code it up. "'
Higher Error Rates surface in a few weeks when the code goes into the testing phase,
resulting in the need for more rework. Consequently rework increases the amount of work to be
done and therefore the pressure, closing a potentially vicious, reinforcing cycle (Rework loop; see Figure 3).
The error rate-productivity trade-off discussed above is well documented for a single
project in different domains, from software development to construction (Cooper 1980;
AbdelHamid and Madnick 1991). These effects were also salient in Alpha and Beta; for
example, developers in Alpha found little time to engage in process work that was supposed to
enhance the quality of the product, such as doing code reviews and making detailed requirement
plans. The strength of the trade-off between pressure and quality depends not only on individual
skills and experience, but also on the development process and incentive structure in place at the
PD organization. For example, in Alpha, the quality of development was hardly transparent to
management at the time of development; therefore, under acute pressure, developers had more
incentive to forgo good practices and sacrifice quality to increase the speed of development. A
developer observes:
a".."""".
am"...... "
r m"
are "..
m.
nmr
me
a
a
"Nobody really sees what we do. Youget rewardedwhen your name is next to a dollar
sign
If we wanted to get recognition,we couldput faults into [the software] so we know
exactly where they are, and then wait until a big customerputs a lot of money on it, and thengo
fix it quickly...This is not what we do [but]jumping and saving the customer is important [for
our reviews].Nobody knows what actually the quality is and what actually we do. "
We aggregate the effects of pressure on productivity and quality in two simple functions.
Fp(RG), the effect of Resource Gap (RG) on productivity, changes the productivity around a
normal level, PN, so that P = PN * Fp(RG). Similarly, Fe(RG) represents the effect of Resource Gap
on Error Rate, and therefore Error Rate changes around its normal value eN: e = eN * Fe(RG).
Substituting in Equation (1), we can see how Feature Release depends on Resource Gap:
FR = RD * PN * Fp(RG) * (1 - eN * Fe(RG) * frT)     (2)
The exact behavior of this function depends on the shape of Fp and Fe and therefore on
the strength of the two effects, but in general both functions are upward sloping (increase in
Resource Gap increases both Productivity and Error Rate) and both saturate at some minimum
and maximum levels (Productivity and Error Rate cannot go to infinity or below zero). The
qualitative behavior of the system, however, depends only on the general shape of the Feature
Release with respect to different levels of Resource Gap. An upward sloping function suggests
that the higher the pressure, the higher the output of the development process is (although this
can be at odds with quality if a poor testing system lets the defects go through). A downward
sloping function suggests that the lowest pressure yields the best performance. Finally, a
more plausible shape is an inverse-U, where up to some level of Resource Gap people will
work harder and make more progress, but above that level the increase in pressure is actually
harmful, since people start to take shortcuts that harm quality significantly; developers then
end up reworking their tasks and releasing fewer acceptable features on
average. This shape in fact proves to be dominant for a wide variety of plausible Fe, Fp, and frT
values.
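The claim can be checked numerically with any pair of upward-sloping, saturating effect functions. The logistic forms and parameter values in the sketch below are assumptions chosen only to reproduce the qualitative shapes, not the calibrated functions of the model.

import math

def fp(rg):
    """Assumed effect of Resource Gap on productivity: upward sloping and saturating."""
    return 0.7 + 0.7 / (1.0 + math.exp(-4.0 * (rg - 1.0)))

def fe(rg):
    """Assumed effect of Resource Gap on Error Rate: upward sloping and saturating."""
    return 0.5 + 2.0 / (1.0 + math.exp(-6.0 * (rg - 1.4)))

def feature_release_vs_gap(rg, rd=20, pn=0.5, en=0.4, frt=0.9):
    """Equation (2): FR = RD * PN * Fp(RG) * (1 - eN * Fe(RG) * frT)."""
    return rd * pn * fp(rg) * (1.0 - en * fe(rg) * frt)

for rg in (0.8, 1.0, 1.2, 1.4, 1.6):
    print(f"RG = {rg:.1f}  FR = {feature_release_vs_gap(rg):.2f}")
# Output rises with pressure up to roughly RG = 1 and then falls: the inverse-U shape.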
Figure 3 illustrates the causal relationships between Resource Gap, productivity, Error
Rate, and Feature Release. Two feedback loops of Work Harder and Rework are highlighted in
this picture. The Work Harder loop suggests that an increase in the Features Under
Development increases the demand for resources and therefore the Resource Gap. As a result,
productivity increases and the PD process releases more new features, reducing the stock of
Features Under Development (hence the Work Harder loop is balancing (B)). The Rework loop
highlights the fact that in the presence of more features to be developed and higher pressure,
people take shortcuts and work with lower quality, therefore increasing the Error Rate and
consequently the amount of rework needed to fix those errors. This results in slower Feature
Release and therefore potentially higher Features Under Development (hence the Rework loop is
reinforcing (R)).
Assuming the more plausible inverse-U shape for the outcome of the tradeoff between
these two loops, the total effect of Resource Gap on output of PD can be categorized into two
regions. For lower levels of Resource Gap and pressure, an increase in Features Under
Development (and therefore pressure) has a desirable net gain, namely, the PD organization will
produce more. However, above some threshold of Resource Gap, the relationship reverses, that
is, the PD organization will produce fewer new features as a result of an increase in the pressure.
Figure 3- The basic structure of development process and tradeoffs between effects of Resource
Gap on Error Rate and productivity. The graphs show the shape of these two effects, both
increasing as more pressure is added. The Feature Release vs. Resource Gap graph shows the
aggregate result of pressure on Feature Release for eN=0.4 and frT=0.9. The values on the axes
are not shown since the qualitative shapes of the functions are what matter in the analysis.
To understand the dynamics of this system, consider a PD organization in equilibrium,
i.e., with a fixed, constant demand for new features coming in, which the development team is
able to meet, resulting in equal Feature Release and demand for new features. A perturbation of
the PD organization with a transient increase in workload surfaces the important dynamics. An
example of such a perturbation is a set of unplanned, new features that the higher management
asks the development team to incorporate into the next release, beyond the current workload.
Figure 4 portrays the two possible outcomes of such perturbation. In Figure 4-a (left), the
PD organization is developing six new features each month until month ten, when
demand increases to ten features per month and continues to be so for four months. During this
time, the additional demand results in more Features Under Development and therefore higher
levels of Resource Gap. This additional pressure in fact increases the overall Feature Release
over the equilibrium value, since the increase in the productivity is bigger than the negative
effect of the higher Error Rate. Consequently, after the short-term additional demand is removed
(feature demand goes back to six), the PD organization continues to release more new features
than demanded, and therefore is successful in bringing the stock of Features Under Development
back to its equilibrium value.
Figure 4- The reaction of the PD process under a pulse of increased demand for features. The
pulse lasts for 4 months and increases the feature demand (red); Features Under Development
(blue) represents the features demanded by the market but not yet developed. Feature release
(green) represents the output of the PD process. Figure 4-a, on the left hand side, represents a
case where additional pressure introduced by the pulse is managed through faster work and the
Features Under Development is moved back to its equilibrium level. Figure 4-b shows how the
PD organization fails to accommodate a slightly stronger pulse, resulting in erosion of PD
capability as the Feature Release drops down and stays low.
Now consider the same experiment with a slightly stronger perturbation (e.g., demand
going to 13 rather than 10). This time we observe a dramatic shift in the performance of the
simulated PD organization. In the new state, which, following Repenning (2001), we
call Firefighting, the Feature Release fails to keep up with demand and in fact goes down, the
Features Under Development grow higher and higher since demand continues to exceed release,
and, in short, the PD organization moves into a mode of being excessively under pressure, being
behind the market demand, and developing fewer new features than they used to in the initial
equilibrium. What creates such a dramatic shift in behavior, from addition of just a few features
to the workload?
[Figure 5 chart: Feature Release vs. Resource Gap, with the feature demand line, the initial (stable equilibrium) pressure, the tipping pressure, and the maximum capacity marked.]
Figure 5- Schematic inverse-U shape curve of Feature Release as a function of Resource Gap.
The picture highlights the stable equilibrium (where the feature demand meets the upward
sloping part of the curve), the tipping point (where the feature demand meets the downward
sloping part of the curve), and the maximum capacity (where the curve reaches its maximum).
A graphical representation of dynamics helps illuminate this question better. Figure 5
shows an inverse-U shape curve that relates Resource Gap to Feature Release. The equilibrium
level of feature demand is represented by a horizontal line. Initially the system is in equilibrium
at the point marked by stable equilibrium: the Resource Gap is at a level at which Feature
Release equals feature demand, and therefore the Features Under Development does not change
and the system remains in equilibrium. If the system is perturbed by a small pulse (reduction) in
demand, the Features Under Development increase (decrease). Hence the Resource Gap
increases (decreases) to a value right (left) of the stable equilibrium point, and that results in
values of Feature Release higher (lower) than feature demand. As a result, the outflow from Features Under
Development exceeds (falls short of) the inflow, so the stock goes down (up), reducing
(increasing) the Resource Gap until the system gets back to the stable equilibrium. The arrows
on the curve represent the direction towards which the system moves: for small perturbations in
work or schedule pressure the system adjusts itself back into the stable equilibrium. However,
there is a threshold of Resource Gap, the Tipping Pressure (see Figure 5), beyond which the
system's behavior becomes divergent. An increase in the Resource Gap over the tipping
pressure results in Feature Release values lower than feature demand. Therefore the stock of
Features Under Development keeps growing, exerting further counterproductive pressure; hence
the rework loop dominates in its vicious direction and the PD capacity goes down with no
recovery (for a more comprehensive treatment of similar dynamics in a different setting see
Rudolph and Repenning (2002)). In the first experiment, the initial perturbation brought the
Resource Gap to the right of the stable equilibrium but just short of passing the tipping point.
Therefore the simulated PD organization was able to recover by working faster, producing more,
and reducing the pressure. However, the perturbation in the second experiment pushed the
Resource Gap over the edge and beyond the tipping point. The PD organization was caught in a
cycle of working harder, making more errors, spending more time on rework, and sensing more
pressure under the unrelenting market demand.
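The two regimes can be reproduced with a minimal simulation of the pulse experiment. All functional forms and parameter values below are illustrative assumptions; the calibrated model is documented in the technical appendices.

import math

def fp(rg):  # assumed effect of Resource Gap on productivity
    return 0.7 + 0.7 / (1.0 + math.exp(-4.0 * (rg - 1.0)))

def fe(rg):  # assumed effect of Resource Gap on Error Rate
    return 0.5 + 2.0 / (1.0 + math.exp(-6.0 * (rg - 1.4)))

def simulate(pulse_height, base_demand=7.5, pulse_start=20.0, pulse_length=4.0,
             months=80, dt=0.25, rd=20, pn=0.5, en=0.4, frt=0.9, horizon=6.0):
    """Pulse experiment: a temporary increase in feature demand on top of base_demand."""
    fud = 50.0                                    # Features Under Development, near equilibrium
    for step in range(int(months / dt)):
        t = step * dt
        demand = base_demand + (pulse_height if pulse_start <= t < pulse_start + pulse_length else 0.0)
        rg = (fud / pn) / (rd * horizon)          # required over available person-months
        fr = rd * pn * fp(rg) * (1.0 - en * fe(rg) * frt)   # Equation (2)
        fud += (demand - fr) * dt                 # integrate the backlog of pending features
    return fud, fr

for pulse in (4.0, 8.0):
    backlog, release = simulate(pulse)
    print(f"pulse = +{pulse:.0f}: final backlog = {backlog:7.1f}, final release rate = {release:.2f}")
# The smaller pulse is absorbed: the backlog returns near its equilibrium of about 50.
# The larger pulse pushes the Resource Gap past the tipping point, and the backlog diverges.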
Note that the tipping point is different from the point at which the maximum development
capacity is realized (see Figure 5). The tipping point is where the feature demand and Feature
Release meet on the downward sloping part of the release-pressure curve, while the maximum
Feature Release is achieved at the pressure which maximizes this curve. As a result, the Tipping
Pressure, as well as the stable Equilibrium Pressure, depend on the feature demand. Changing
feature demand shifts the tipping point and stable equilibrium because it shifts the horizontal
demand line on the curve, bringing them closer to (by more demand) or further from (by less
demand) each other and the maximum capacity.
The shift in tipping point and stable equilibrium highlights an important tradeoff between
a PD organization's operating capacity and its robustness. We can design more new features by
accepting a higher demand rate (e.g., through taking on more tasks in the planning for a new
release), but that brings the stable equilibrium and the tipping point closer, and therefore the PD
organization can tip into the firefighting mode more easily. For example, an unexpected demand
by higher management, a technical glitch, or departure of a few key personnel can tip the system
into firefighting mode. In the extreme, we can get the maximum output from the PD
organization by demanding features at the maximum capacity rate. In this case we bring the
stable equilibrium to the tipping point and therefore the system is at the tip of the slippery slope
into firefighting. While highly productive, such a system can easily fall into the firefighting trap
in the face of small unexpected perturbations in the environment of the PD work.
The dynamics discussed in this section look only at the development part of the PD
process. Therefore they do not account for the quality consequences of the features designed,
when they go outside of the PD organization. This is not a good assumption and has unrealistic
conclusions, e.g., the PD organization is better off to reduce its testing goals, since that way it
does not find the defects in the features designed, and consequently does not need to do any
rework, avoiding firefighting and increasing the capacity (in simple terms, ignorance is bliss). In
the next two sections we expand the model to account for two important consequences of the
quality of PD work, and the dynamics that ensue.
Current engineering: shifting capacity and creating an adaptation trap
New features introduced into the market have important effects for the product
development organization. First, there is a demand for customer support and services. The
service organization, which is often different from the R&D unit (as is the case in Sigma), is in
charge of direct interaction with these customers. However, if service people are unable to solve
a customer issue, the customer call is transferred to the R&D department. These transfers (called
escalations in Sigma) can arise because of complexity of the software, but also because some
defect in the code has created an unexpected issue at the client site. When such defects are
found, the development team often needs to decide on their fate. If the problem is interfering
with important needs of the customer, R&D personnel are allocated to fix the problem, usually
through quick, customized patches at the customer site. These activities are referred to as
current engineering (CE) and can take a significant share of R&D resources. In fact, with
product Alpha, 30-50 percent of development resources were allocated to CE work, while
successful management of quality in Beta had kept this number below 10 percent.
Current engineering is not the only connection between the PD organization and the
product in the field. Initial installation of the product also requires PD resources; therefore a
product design that facilitates quick, smooth installation can save the PD organization from
installation challenges. However, improving installability and serviceability of the product gets
lower priority when important new features should be added to keep up with the market demand.
A more subtle feedback of field performance is through sales level and its influence on
resource allocation across a portfolio of products. Delays and quality problems that characterize
a PD organization in trouble often reduce customer satisfaction and sales for a product. Since
resource allocation across different product lines in the portfolio often correlates with the success
of the product in the field, i.e., its sales pattern, poor quality and sales can cost the PD
organization its resources for development of future releases.
This section focuses on modeling and analyzing the effect of current engineering work.
Figure 6 shows the causal structure of the model that includes CE. Here the features flow into
the field as new releases are launched, and Defect Introduction captures the flow of defects into
the field along with these features. These defects create a demand for PD resources to be
allocated for current engineering (Resource Demand for Current Engineering). This demand
further increases the Resource Gap and can aggravate the Error Rate for the current Features
Under Development, since developers must fix customer-specific bugs rather than spend time on
quality work for the next releases (Current Engineering loop in Figure 6). The demand for CE
resources depends not only on the quality of the product in the field (operationalized as Defects
in Latest Release), but also on the sales, fraction of defects that create noticeable problems, and
productivity of doing CE work. Details of formulations appear in Appendix A. The analysis that
follows holds sales and fraction of noticeable problems constant and focuses on the dynamics in
the interaction of the Development and the Product in the Field Sectors.
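A sketch of how this resource demand can be operationalized follows; the exact formulation appears in Appendix A, and the names and numbers below are illustrative assumptions.

def ce_resource_demand(defects_in_latest_release, sales_rate, fraction_noticeable, ce_productivity):
    """Developers tied up in current engineering: each noticeable defect that surfaces at a
    customer site generates patch work (illustrative formulation, not Appendix A's)."""
    escalations_per_month = defects_in_latest_release * sales_rate * fraction_noticeable
    return escalations_per_month / ce_productivity    # person-months per month, i.e., people

# Hypothetical: 200 latent defects, 5 new installations per month, 2% of defects noticed
# per installation, and one developer closing 4 escalations per month.
print(ce_resource_demand(200, 5, 0.02, 4.0))  # 5.0 developers diverted to CE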
[Figure 6 diagram: Development and Product in the Field sectors, including the stocks of Features in Latest Release and Defects in Latest Release.]
Figure 6- The structure of the model with the addition of products in the field. The new
structures are highlighted through black variable names and blue, thick causal links.
Two major dynamic effects follow the introduction of CE into the analysis. First, the
additional resources needed to serve the CE reduce the effective capacity for developing new
features. The magnitude of this effect depends on several factors that determine demand for CE
resources, some of which are endogenous to the dynamics of development and products in the
field. For example, very good quality in the development phase can significantly reduce the CE
resource demand since the product has few noteworthy bugs when introduced into the market.
Another factor is sales, which increases the CE demand, since it increases the number of
customers who may face problems. The strength of the sales effect also depends on the nature of
the product. For customizable products, different defects show up in different customer sites
because each customer uses a different set of features and those combinations reveal different
hidden problems in the design of the product. However, most defects in off-the-shelf products
are common across different customers and therefore, by developing one patch, the PD
organization serves several customers, reducing the burden of CE significantly. Other relevant
factors include the modularity of architecture and complexity of the code, since these determine
how resource-consuming the CE work is; the priority of CE work relative to new development;
the fraction of defects that surface in typical applications; and the power of the customers in
relationship with the focal organization.
In the case of Alpha and Beta, these factors added up to a potentially strong CE effect:
the products are customizable and the CE work gets a high priority because large customers
command special attention in their relationship with Sigma (partly because future sales depend
both on word-of-mouth and direct feedback from current customers). However, Alpha and Beta
faced different levels of pressure with regard to CE: Alpha had to allocate 30 to 50 percent of its
resources (at different times) to CE work, while this figure remained below 10 percent in Beta.
The difference between the two can be explained by the internal dynamics of the two cases.
Alpha had slipped into a firefighting mode, it faced significant quality problems, and the poor
quality of past releases took a toll on the complexity of the code-base and architecture, increasing
the time needed to figure out and fix a problem. On the other hand, Beta had largely succeeded
in avoiding the firefighting trap and therefore, with a good quality, had very few escalations and
little demand for CE work.
The second effect of bringing CE into the picture highlights an adaptation trap for the PD
organization. Imagine the adaptive path that the management can take to maximize the feature
release of the PD organization. Increasing the pressure (through additional demand or tighter
schedules), the management can monitor the performance of the system to see whether
incremental increases in pressure pay off by higher feature release, and stabilize where the
marginal return on additional pressure is zero. Such an adaptation process is analogous to hill-climbing on the Resource Gap-Feature Release curve in Figure 7, and is discussed as one of the
main ways through which organizational routines adapt and create better performance (Nelson
and Winter 1982).
[Figure 7 chart: perceived and sustainable Feature Release vs. Resource Gap (roughly 0.8 to 1.6), with the Perceived Max and Sustainable Max Demand marked.]
Figure 7- The perceived (blue) and sustainable (red) curves of Feature Release vs.
Resource Gap. In the perceived curve, the demand for CE is kept at its initial value despite the
adaptive increase in feature demand. The sustainable curve comes from the same experiment,
where the CE resource demand is set at its equilibrium value for each level of Error Rate and
feature release.
An important distinction is that the demand for CE does not increase at the time that the
pressure (and consequently the output and error rates) is increased. In fact, it takes at least a few
months before the product is out and then some more time before the defects show up and are
referred back to the PD organization. During this period, increases in pressure create unrealistic
boosts in the feature release rate, since the CE consequences are not realized. Therefore the
adaptation is not on the sustainable payoff landscape, which takes into account the future CE
demands; rather, it is on a temporary one with unrealistic returns to additional pressure (see
Figure 7). As a result, the adaptation process points to an optimum feature demand that is in fact
beyond the capacity of the PD organization to supply in the long term, and when the CE work starts to
flow in, the PD organization finds itself on the slippery slope of firefighting. Pushed this far,
not only does CE demand increase the pressure, but the tighter schedule and higher demands
that were justified in light of the adaptive process also keep up their pressure through an increasing
backlog of features which should be incorporated in the new releases.
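The decoupling between the perceived and sustainable payoff curves of Figure 7 can be illustrated with a first-order calculation; the effect functions and the CE effort per escaped defect below are assumptions, and the sketch ignores second-order feedbacks for simplicity.

import math

def fp(rg):  # assumed effect of pressure on productivity
    return 0.7 + 0.7 / (1.0 + math.exp(-4.0 * (rg - 1.0)))

def fe(rg):  # assumed effect of pressure on error rate
    return 0.5 + 2.0 / (1.0 + math.exp(-6.0 * (rg - 1.4)))

def perceived_release(rg, rd=20, pn=0.5, en=0.4, frt=0.9):
    """Short-run release rate: all resources go to new features, CE demand not yet felt."""
    return rd * pn * fp(rg) * (1.0 - en * fe(rg) * frt)

def sustainable_release(rg, rd=20, pn=0.5, en=0.4, frt=0.9, ce_effort_per_defect=8.0):
    """Long-run release rate once escaped defects (gross output * e * (1 - frT)) tie up CE
    resources; a first-order approximation that ignores further knock-on effects."""
    escaped_defects_per_month = rd * pn * fp(rg) * en * fe(rg) * (1.0 - frt)
    ce_people = ce_effort_per_defect * escaped_defects_per_month
    return perceived_release(rg, rd, pn, en, frt) * max(0.0, 1.0 - ce_people / rd)

for rg in (0.8, 1.0, 1.2, 1.4):
    print(f"RG = {rg:.1f}  perceived = {perceived_release(rg):.2f}  sustainable = {sustainable_release(rg):.2f}")
# Hill-climbing on the perceived curve points to a pressure level whose sustainable
# output is well below the feature demand that was used to justify it.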
The strength of the adaptation trap depends on the magnitude of decoupling between the
sustainable adaptation path and the perceived path. Decoupling increases the extent to which
demand on the PD organization could increase beyond the sustainable level. Therefore higher
decoupling increases the magnitude of consequent collapse of the PD process performance and
strengthens the trap. The extent of decoupling depends on two main factors: the length of delay
in observing the CE resource demand, and the speed of adaptation. A longer delay in perceiving
the demand for CE resources, e.g., because of longer release cycle, will result in a longer time
horizon in which the adaptive processes work in the absence of the CE feedback. Therefore the
risk of overshoot in adaptation into the unsustainable region increases. Moreover, faster
adaptation, e.g., through faster learning cycles of pressure adjustment and performance
monitoring, increases the speed by which the PD organization expands the pressure and feature
demand, before observing the CE demand.
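The perception delay itself can be sketched as first-order exponential smoothing of the actual CE burden, a common delay structure in system dynamics models; the delay structure and numbers here are assumptions, since the essay does not specify them.

def perceive_with_delay(actual_series, tau_months, dt=1.0):
    """First-order (exponential) smoothing of a monthly series: the perceived value adjusts
    toward the actual value with time constant tau_months."""
    perceived = actual_series[0]
    trace = []
    for actual in actual_series:
        perceived += (actual - perceived) * dt / tau_months
        trace.append(round(perceived, 1))
    return trace

# Hypothetical: CE demand jumps from 1 to 6 developers at month 12; with a 9-month
# perception delay, management's estimate still lags well behind the true burden months later.
actual = [1.0] * 12 + [6.0] * 24
print(perceive_with_delay(actual, tau_months=9.0)[11:20])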
Architecture and code base: dynamics of bug-fixing and building on error
In this section we expand the simple model of the PD process to include the effects of the
architecture and code base in the dynamics of PD. In this discussion, we focus on two main
features of architecture and code base. First, we include fixing the bugs found in the past
releases when creating a new release, as well as improving the architecture to accommodate
further expansion of the code. Second, we discuss the effect of the problems in architecture and
code base on the error rate for working on the new features.
Releases which are on the market become old when newer products are launched.
Therefore Features in Latest Release move into the Underlying Code category, while the defects
that existed in them are removed from the field (stock of Defects in Latest Release) and become
part of the Defects in Underlying Code (see Figure 8). In this conceptualization we are lumping
together the underlying code base and the architecture of the software. Though these two are not
the same, they have important similarities in their dynamic effects on PD processes. Bug-fixing
is the process of fixing defects that have been found in the field (or found during testing, but
ignored under work pressure) in the underlying code. It is different from current engineering
since CE happens for specific customers who face a common issue, and the resulting fixes are
not integrated with the next release of the code, while bug-fixing incorporates improvements into
the main body of the code and will apply to all future customers. The parallel process for
architecture is refactoring, which means improving the design of existing code.
Quality of underlying architecture and code base both depend on the quality of past
development work. Low quality development not only reduces the quality of code base for
future work, but also impacts architecture, since taking shortcuts in development could spoil the
designed relationships between the different modules of the code. Moreover, although we do not
model the concept design phase explicitly in this model, the quality of architecture design
depends on similar pressures that impact the development phase. Therefore one can expect the
quality of the code base and the architecture to be positively correlated.
Figure 8- The model structure after addition of architecture and code base. New loops are
highlighted in thick blue causal links.
Bug-fixing and refactoring need resources from the PD organization and the demand for
these resources depends on the quality of the code base and architecture. This demand for
resources can further increase the Resource Gap, leading to more quality problems in a
reinforcing cycle we call Bug Fixing (Figure 8). The additional resources needed for bug-fixing
are often not separated from those who work on the development of new features, since bug fixes
are part of new releases. In fact, most software products have a release schedule with minor and
major releases, in which minor releases mainly focus on fixing bugs. Consequently, despite
lower visibility, bug-fixing can consume a considerable share of PD resources. For example, one
of the recent "minor" releases of Alpha included fixing over 1000 bugs, and proved to be as
time- and resource-intensive as any major release.
Including the bug-fixing activity in the picture of the PD process has some similar
consequences to addition of current engineering. First, the bug-fixing resource demand takes
away resources for development of new features, shifting the effective, long-term capacity of the
PD process further down. The magnitude of this shift depends on several factors, including the
quality of the code developed (Did we create a lot of bugs?), the strength of testing (Did we fail
to catch those bugs initially?), the priority given to fixing bugs (Do we care about fixing bugs?),
as well as the relative productivity of bug-fixing (Does it take more resources to fix a feature
than to develop it correctly in the first place?). In Appendix B we derive the mathematical
formula for the sustainable capacity of PD process under different parameter settings.
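A back-of-the-envelope version of that steady-state calculation is sketched below; the accounting follows the logic described in the text, but the effort parameters are hypothetical and this is not the Appendix B derivation itself.

def sustainable_capacity(developers, productivity, error_rate, fraction_caught,
                         effort_per_escaped_defect=4.0):
    """Rough steady-state Feature Release when rework, bug-fixing, and current engineering
    all draw on the same resource pool.
    effort_per_escaped_defect -- person-months eventually spent (bug-fixing plus CE) on
                                 each defect that escapes testing into the field
    """
    fraction_accepted = 1.0 - error_rate * fraction_caught
    dev_effort = 1.0 / (productivity * fraction_accepted)               # person-months per released feature
    escaped = error_rate * (1.0 - fraction_caught) / fraction_accepted  # escaped defects per released feature
    return developers / (dev_effort + escaped * effort_per_escaped_defect)

print(round(sustainable_capacity(20, 0.5, 0.4, 0.9), 2))  # about 5.9 features/month
print(round(sustainable_capacity(20, 0.5, 0.1, 0.9), 2))  # better quality sustains about 8.9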
Second, the adaptation trap is made stronger through bug-fixing feedback. The bug-fixing activities entail an even longer delay between the original development and the
observation of demand for bug-fixing resources. This delay includes the release of new features,
the reporting of the bugs, and the scheduling of these bugs in the work for future releases. The
longer delay suggests that there is even more time during which adaptive processes fail to
observe an overshoot of demand beyond sustainable PD capacity. Moreover, the downward shift
in the sustainable capacity of PD process suggests that the gap between the short-term feature
release and what can be sustained in the long-term is made wider. Consequently, the chances of
adaptation into an unsustainable region increase, and when the feedback of CE and bug-fixing
arrives, the PD organization finds itself further down the pressure-Feature Release curve.
If allocating resources to bug-fixing reduces the PD capacity and increases the chances of
slipping into firefighting mode, why not ignore this activity altogether? The problem is two-fold.
First, doing so will generate further customer dissatisfaction and demand for CE when old
problems surface in the new features. Moreover, ignoring bug fixes results in the accumulation
of defects in the underlying code and architecture, which can significantly increase the chances
of making errors in the current development work and trigger another vicious cycle (the Error
Creates Error loop, Figure 8).
The Error Creates Error loop, when active, is quite detrimental to the product line. On
the one hand, the effect of the quality of underlying code and architecture on the error rate is
beyond the effect of pressure and can drive quality further down. Such lower quality in fact
strengthens the rework, current engineering, and bug-fixing loops in the vicious direction. On
the other hand, in the face of longer delays before the architecture and code base effects surface,
typically firefighting starts with rework and current engineering loops driving the dynamics,
followed by the domination of Error Creates Error that seals the PD organization in firefighting.
Consequently, the activation of the Error Creates Error loop suggests that things have been going
wrong for quite a while; therefore reversing the trend requires a significant investment in fixing
bugs and refactoring the architecture, and in many cases may prove infeasible. In short, it is hard
to save a product line so far down into the firefighting path that it faces significant quality
problems because of the old architecture and code base quality.
An important factor in determining the propensity of a product to face the Error Creates
Error loop is the modularity of its architecture. The modularity of architecture can be
represented by the average number of other modules each module is interacting with. In a highly
modular product, each part is interacting with only a few neighboring parts. In a very integrated
product, however, every piece can be interacting with every other piece. In general, new pieces
can safely be added to the old code, as long as the pieces they are interacting with are not
defective. However, if the interacting pieces are defective, the Error Creates Error feedback
becomes important since all modules connected to the new one can impact its quality.
Therefore, in a highly modular architecture, only the few pieces a new module interacts with need to be free of defects to avoid the feedback; in aggregate, this entails keeping the density of errors
(the percentage of modules that are defective) low. In the case of an integral architecture, however,
things can go wrong if any of the interacting pieces is defective; therefore we need to control the
absolute number of defects in the code. Controlling the density of errors in the code base is much
easier than controlling the absolute number: as long as the density of errors in the old releases is
constant, the density in the underlying code base does not increase, while the absolute number
grows in proportion to the new features released. Therefore, the more we move from a modular
to an integral architecture, the more important the feedback of past errors on error rate becomes,
and the more important it becomes to spend resources on bug-fixing. As a senior researcher in
Sigma puts it: "A good architecture helps control entropy."
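To make the modularity argument concrete, the following back-of-the-envelope sketch (my illustration, not part of the thesis model) computes the chance that a newly added module interacts with at least one defective module, assuming defects are spread uniformly and each new module touches k existing modules:

def risk_of_inheriting_errors(n_modules, n_defective, k_interactions):
    # Chance that at least one of the k interacting modules is defective,
    # assuming defects are spread uniformly across the code base.
    density = n_defective / n_modules
    return 1 - (1 - density) ** k_interactions

# Modular architecture: a new module touches only ~3 neighbors, so the risk
# tracks the defect density and stays flat as the code base grows.
for n in (100, 1000):
    print("modular ", n, round(risk_of_inheriting_errors(n, int(0.05 * n), 3), 3))

# Integral architecture: a new module may touch most of the code base, so the
# risk tracks the absolute number of defects and climbs as the base grows.
for n in (100, 1000):
    print("integral", n, round(risk_of_inheriting_errors(n, int(0.05 * n), n), 3))

With a constant 5% defect density, the modular case stays near a 14% risk regardless of code base size, while the integral case approaches certainty as the absolute number of defects grows, which is the distinction drawn above.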
The dynamics of bug-fixing and error-creates-error also highlight the importance of an
initially robust architecture to reap the benefits of multiple-release PD. Just as these dynamics
can be detrimental when they are activated in the negative direction, a robust, modular
architecture would allow for reliable and high-quality development of several releases of a
product, with far fewer resources than would be needed to develop similar products independently.
Discussion
In the model and analysis part of this essay, we built a simple model of multiple-release
product development that focuses on the interactions among the development of the product and
the product in the field, as well as the architecture and underlying code the new releases are built
on. When looking only at the development process, the tradeoff between quality and
productivity in the face of pressure can result in tipping dynamics. The level of Resource Gap
determines where the stable equilibrium and the tipping point for the system are, and the
additional pressure brings these two points closer together, as well as closer to the maximum
capacity of the PD organization. Consequently, there is a tradeoff between robustness and
capacity of PD process; increasing the demand on the system can be beneficial in terms of
additional output but increases the chances of tipping into firefighting mode as a result of
unexpected changes in the PD work environment. Moreover, beyond the tipping point, an
increase in pressure results in less output, driving the PD organization into increasing
firefighting.
When we expand this framework to include the products in the market, we observe the
current engineering dynamics. These dynamics further highlight the importance of quality since
resources need to be allocated to CE when product quality is low. This consideration reduces the
level of output a PD organization can sustain across multiple releases. Moreover, the delay
between changes in the quality and the observation of their effect on CE workload creates a trap
for the PD organization in which adaptation to increase output of the process can actually take
the system into the firefighting region before the problem is known through feedbacks on the
ground.
Finally, the inclusion of the underlying architecture and code base strengthens the continuity of
multiple-release product development. First, bug-fixing requires resources and further reduces
the effective capacity of the PD process for developing new features. Moreover, the delays are
longer for bug-fixing, which makes the adaptation trap deeper and more significant. Finally, the
effect of the code base and architecture quality on the quality of current development strengthens
rework, current engineering, and bug-fixing dynamics and can seal the system in firefighting
mode by increasing the costs of change.
Multiple-release product development is continuous in nature since different releases of
the product are based on the same platform, share resources for development activity, trade
features from one release to another, and share resources for current engineering and bug-fixing.
Consequently, the firefighting dynamics discussed cannot be contained in a single development
project; rather, they pass on from one release to another in a contagious fashion. Once the PD
organization tips into firefighting, it is increasingly difficult for future releases to avoid these
dynamics. This is why these dynamics can erode the PD capability: they reduce the performance
of the PD organization in multiple consecutive development projects. In fact, once in the
firefighting mode of behavior, new norms can form around low-quality practices, making future
improvements harder. In contrast, if these dynamics are successfully avoided, the PD
organization will be able to stay in an equilibrium with low error rate, improve its processes, and
build a culture of high quality practices. A Beta manager reflects on his experience:
"[T]he error rate drives a lot of what the overall dynamicswould be. [In my group] if
we detect an error, not only do we go andfix the error, but I ask where in the process did we
break down we investigate2 or 3field escalationsa month here, and not many of those result
in a change if this error rate goes high, thenyou are constantlybattling the new investigation
and you can't take the time to step back and say why exactly it was introduced. When my team
goes back and looks at it and says, aha, we probably should have seen this in our code
inspection, we really take that to heart and when we focus on the next round of development,
they would say, "rememberthat thing, we are going to make sure that we are watchingfor that
kind of stuff " So it increases awareness.Again, we can do that because the error rate is low."
In short, the Rework, Current-Engineering, Bug-fixing, and Error Creates Error
dynamics can erode PD capability through sustained firefighting. The PD organization enters
firefighting not only as a result of unexpected demands when the organization is critically
loaded, but also following intendedly functional adaptation to get the maximum utility
out of the PD resources. Moreover, the dynamics pass on from one release of the product to
another, therefore eroding PD capability. These dynamics degrade the architecture and code
base of the product line to the extent that the PD organization has difficulty making high-quality
products any more, sealing the product line in the firefighting mode.
We developed the model based on the case study of two software products. However, the
basic dynamics are based on a few boundary assumptions that can hold in domains beyond the
software industry and therefore can inform understanding of multiple-release product
development processes. The first critical assumption is that there is a level of resource gap
beyond which the average output of the organization per resource unit decreases, rather than
increases. When this assumption is true, we have an inverse U-shape Resource Gap-Feature
Release curve and the rework dynamics exist. In fact, if this assumption is not true, the best
option for a PD organization may be overloading the group completely with an unmanageable
feature demand so that ever-increasing pressure maximizes the output; the practices of successful
PD organizations do not suggest the feasibility, not to mention the optimality, of such a policy.
Another assumption in the model is that the quality problems are not lost, i.e., there is a
cost for a low-quality product that the company will shoulder through some mechanism,
including fixing errors, warranty and lawsuit costs, lost revenues, lower customer loyalty, and
loss of brand name, among other costs. This might be close to a truism across the board, but the
cost of quality problems differs from one case to another. The higher the cost of finding errors in
later stages (e.g., when the product is in the field vs. when the product is under development), the
stronger are the firefighting and contagion dynamics. Moreover, there needs to be a link
between cost of quality and the resources available to develop future releases. This cost can be
direct, as is the case for sharing current-engineering and development resources in software, or
indirect, as implied by resource allocation policies that give more resources to more successful
products in a company's portfolio. In the latter case, future quality costs influence the amount of
resources available to future releases as long as they are accounted for in evaluating the
product's success. The existence of quality costs after launch and the effect of these costs on the
current development resources are enough to generate the Current-engineering and Bug-fixing
dynamics (or their equivalents in other settings), and therefore the contagion of issues from one
release to another.
The other dynamic (Error Creates Error) rests on a third assumption: that the quality of
the current release impacts the quality of work for the next release. This is usually true for the
architectural aspects of the product that have strong impact on the quality of work as well as
future concept design and architecture. In software, this assumption holds also for code quality,
since the documentation of code and its seamless integration with other pieces influence the
quality of future code to be built on top of it. Appendix C discusses the boundary conditions of
the model in more detail and elaborates on conditions under which these dynamics hold and
when they are not important.
Erosion of organizational capabilities is theoretically important. On the one hand
capability erosion can take away the competitive edge of organizations. This effect is especially
salient in the case of the PD capability which is the cornerstone of building other capabilities in
dynamic markets (Teece, Pisano et al. 1997; Eisenhardt and Martin 2000). In other words, it is
not only important to build a capability, but it is also important to be able to keep it.
Moreover, processes that erode organizational capabilities overlap with those that inhibit
imitation and replication of these capabilities. Tipping dynamics and adaptation traps not only
can erode existing capabilities, but also can stand in the way of building a successful PD
capability. These dynamics create multiple ways for things to go wrong along the evolutionary path of capabilities, and therefore reduce the chance that the PD organization can navigate its
way into a high performance arrangement. For example, the tipping dynamics suggest that
fluctuations in the level of workload can potentially tip the PD organization into firefighting. As
a new firm attempts to build its PD capability, there are many instances where an unexpected
surge of workload can tip the PD organization into a degrading mode of working fast and doing
low-quality work. Furthermore, attempting to get the most out of the PD process early on, when
the resource demand from the field is not yet realized, can result in overloading the organization
through the adaptation trap dynamics.
Therefore the tipping dynamics, the adaptation trap, and the contagion of firefighting
dynamics highlighted in this study complement the literature exploring the mechanisms that
make capabilities hard to build and sustain (including core rigidities, competency traps, and
rugged payoff landscape). The sources of capability erosion discussed in this essay add to the
literature on barriers to imitation of capabilities by highlighting the role of time in the operation
of adaptive processes. Curiously, in the case of the adaptation trap, not only does the
organization fail to realize better opportunities (as is the case in landscape complexity, core
rigidity, and competency trap), but also may actually degrade its performance through
adaptation.
By elaborating on some of the challenges in the way of successful PD processes, this
study offers some factors that determine when PD capability is hard to imitate and under what conditions it
is easy to build. If the dynamics discussed here play a visible role in the larger picture of product
development across different industries, one would expect the strength of these dynamics to
partially determine the difficulty of building and maintaining a successful PD organization.
Following the resource-based line of argument, one expects PD to explain a larger fraction of
firm heterogeneity in industries in which PD is hard to build and sustain. Therefore, our study
offers a few hypotheses on the strategic importance of PD across different industries:
H1: All else being equal, PD is strategically more important in industries where multiple-release PD is dominant.
This first hypothesis suggests that the move towards multiple-release product
development increases the interconnection between different products and therefore increases the
significance of firefighting dynamics. As a result PD capability is harder to build and has a
higher competitive value in such industries.
H2: Flexibility of PD processes increases their performance heterogeneity across firms.
If the quality of PD work depends heavily on flexible processes that individuals are expected to follow, the effect of pressure on the error rate will be stronger. The logic behind this assertion is that flexible processes give individuals more latitude to mitigate the pressure by cutting corners, taking shortcuts, and using different practices that lead to lower quality in the long run. As a result, the Resource Gap-Feature Release curve will have a sharper inverse U-shape, and firefighting becomes more salient.
H3: PD capability is more heterogeneous in industries where the quality of the products launched influences the resources available to current development.
The strength of the adaptation trap depends on how much the resources available to a
project are influenced by the quality of products which are launched. The stronger the feedback
of past product quality on PD resources (e.g., through CE and bug-fixing), the stronger are the
capability erosion dynamics and the higher is the competitive value of PD in an industry.
H4: The delay between development and observing the quality feedback on PD resources increases the heterogeneity of PD organizations across an industry.
The decoupling between the perceived and sustainable Resource Gap-Feature Release
curves, and therefore the risk of falling into the adaptation trap, depends on the delay in the effect
of old development quality on the current PD resources. The longer the delay between
development and feedback of quality on resources, the stronger is the adaptation trap and the
higher is the competitive value of PD in the industry.
H5: Quality consequences of architecture and product base increase the heterogeneity of PD organizations' performance in an industry.
In industries where architecture and product base have a strong influence on the quality
of current PD work, the dynamics of Error Creates Error and Bug-fixing are more salient and
make it harder to sustain a successful PD process. Therefore we expect that the stronger the
effect of current development on future architecture and product base, the higher would be the
value of PD capability.
Implications for Practice
This study highlights important practical challenges for companies that rely on their product development capability. First, an important concern for start-up companies is to get their product to market as fast as possible, with as few resources as they can afford. This is both because of the limited availability of resources and because of the importance of early presence in the competitive market. Consequently, the quality of their PD work receives lower priority than time to market, and maximizing the feature release rate becomes the highest priority.
However, this is a recipe for the adaptation trap: at the early stages, when current engineering, bug-fixing, and other multiple-release dynamics are not yet active, the start-up approach yields a very high output that is not sustainable for future releases. During this period, however, the norms of the PD organization are set and routines take shape around potentially unsustainable practices aimed at getting the product out as fast as possible. These routines make
it much harder during later releases to shift the organizing patterns of the PD process to
sustainable arrangements that emphasize better process and higher quality.
The challenge of managing a transition from the norms of quick-and-dirty design to
sustainable quality work is especially salient when companies expand through acquisitions. In
these settings, a large firm enters a new product market by acquiring a start-up in that market. If
the product is multiple-release in nature, e.g., software, the parent company will need to support
and encourage a transition from the start-up mindset to that of a successful multiple-release PD
process. In the absence of such a transition, the new line of product will slip into firefighting as
soon as the current engineering and bug-fixing feedbacks activate. In fact, the challenges that one product faces can then spill over into other lines of development, because resources are shared among different PD groups.
Another important challenge in the light of the dynamics discussed in this study is the
strategy for managing customer demands and expectations. One of the main customer demands
is good product support after the sales, which includes current engineering. CE is the activity
with the least value added, compared to development and bug-fixing, because it only solves the
problems of one (or a few) customer(s), and does not influence the quality of future releases
significantly. However, demand for CE is very salient because of direct customer interaction. Doing CE work solves an urgent problem, so it receives high visibility, and it is easily rewarded because it draws attention from different parts of the company, including the service organization and higher management, which want to see customer complaints addressed quickly. Therefore, in practice, the PD organization often gives higher priority to CE than
warranted based on long-term considerations, thereby compromising the quality of future
releases to fix the current problems in the field. This prioritizing challenge is ironic because a
policy of ignoring customers' requests for firefighting may make them happier in the long run.
A different type of tradeoff with regard to customer demand is in the inclusion of new
features. It is generally assumed that including more features in the future releases of the
product benefits customers. Contrary to this belief, however, beyond the tipping point, trying to
incorporate more features will in fact reduce the output of the PD process. Under these
conditions, canceling features may in fact be a favor to customers, since it helps the PD process
to recover, to develop more features in total, and to get the next release out sooner and with
better quality. Therefore the relationship between customer satisfaction and the efforts to add
new features is contingent on the state of the product development organization: after some
point, pushing harder does harm to the employees as well as to the customers.
Finally, this study highlights the importance of evolution of the architecture and
underlying product base. If the resource gap in early releases compromises quality of
architecture and product base, the platform cannot last for long and will lose viability soon after
Error Creates Error feedback is activated. From a behavioral perspective, this is a noteworthy
risk: the feedback from quality of architecture and product base comes with long delays, and
therefore the immediate payoff to invest in architecture design, bug-fixing, and refactoring is
low. Under resource gap pressure, the firm is more likely to skip these activities than CE, even though in the long run that can prove counterproductive.
By elaborating on the sources of erosion of PD capability, we can gain insight into some
practical ways to avoid the detrimental dynamics and to identify them before it is too late. The
following points detail these practical implications:
Setting the priorities in resource allocation between different activities: The allocation of
resources between different stages becomes critical when these different stages can be used to
achieve similar objectives. In a software setting, one can improve quality by having better
architecture design, better development, better testing, improved current engineering, and
additional bug fixing. However, the costs per unit of error avoided/fixed at each stage are very
different (Boehm 1981) and increase as we go towards later phases. The tipping-point and
adaptation-trap dynamics are reinforced by the mismatch between priorities given to these phases
and their benefits. If priorities were set according to benefits, then under pressure, the most
productive phases (in terms of quality of work and rework in next phases) would face the least
resource shortage, and therefore create the least harm, minimizing the amount of rework and cost
of fixing issues in later phases, and smoothing the inverse U-shaped curve. However, this often is
not the case. For example, initial design often gets lower priority than it deserves according to its
benefit in avoiding problems, and current engineering gets higher attention because of the
customer pressure. Therefore setting the priorities according to their respective effectiveness in
avoiding/solving problems is of high leverage. However, priorities often emerge from a set of
norms, operating procedures, and incentive structures used in an organization. Changing them
therefore requires reshaping some of the norms and procedures in the organization. The
appropriate practices are context-specific, yet a few examples of useful or harmful norms can
help clarify the idea further:
* Taking responsibility for bugs one creates can be very effective in reducing corner-cutting and the taking of shortcuts, thereby increasing the priority given to developing the
software with high quality in the first place. A good example is Microsoft's practice of
having daily builds in which the person whose code breaks the build is responsible for setting
up the build the next day, as well as fixing his bugs immediately (Cusumano and Selby
1995). The practice is further reinforced by some nominal fines for the bugs, and even
stronger social sanctions, such as putting on a special hat if one's code had bugs the day
before. This example shows how a series of elements (in this case "daily builds," "norm of
fixing the bugs immediately," "norm of being responsible for build setup if you broke the
build the day before," "norm of social sanction in favor of quality") should come together to
create a coherent practice that changes the priority given to one activity (in this case initial
development).
* Root-cause analysis of problems referred from the field can tie the problem to development
practices of individuals and thereby increase the priority of developing good code in the first
place. In Beta, they applied this to all important bugs that were not found until the product was in the field, and used the results to determine whether the problem was a random mistake, a failure in the process used by the group, or a failure to use the correct process. The latter had the most negative effect on the individual's review, which sent a clear message to members of the team:
don't skip process. Note that this method is very hard to use for a product which is already in
trouble, because root-cause analysis is time consuming and cannot be applied in a project
team which is under a lot of pressure and is receiving hundreds of bugs from the field every
month.
* Considering customer feedback in evaluating developers increases the priority of customers
and the current engineering activity, and therefore shifts the priority to CE rather than more
efficient parts of the project. The current personnel evaluation practice in Alpha reinforces
this problem: managers from different groups sit together and rank the project personnel. In
this setting only one manager is really aware of the quality of the work done by each
developer. The result is a process of alliance building and competition for attention in which
developers who have helped the services organization, or have positive comments from
customers, have a much better chance of success. In practice, the organization ends up with a
very strong incentive for current engineering and firefighting to save the customers, and with
little incentive to develop high-quality product in the first place, the recipe for falling into
firefighting.
Regulating the pressure: The pressure that results in cutting corners and shifting emphasis
away from quality is central to tipping and adaptation trap dynamics. Therefore, regulating that
pressure is one of the ways to avoid the detrimental dynamics. Pressure is often regulated
through subtle cues in the interactions in the organization. Examples include management
reaction to requests for vacation, work hours (especially those of the managers), conversations
around need for more work and coming deadlines, whether people are eating lunch in their
offices, and so on. What picture of pressure is painted in these interactions, and how this picture
is perceived with respect to importance of quality, are critical to the dynamics we discussed. The
actions of the management have a strong role in creating this picture and their mental model of
how pressure affects the output of the group is critical in shaping their actions. Therefore,
discussing these dynamics explicitly can help increase awareness and modify the management
stance on pressure. Moreover:
* Knowledge of the pressure is often available on the ground, but the information gets filtered out in communication with higher management. People fear being perceived as shirking responsibility, or as lazy and weak, if they complain about pressure. Removing sources of communication breakdown is potentially helpful for streamlining the communication of excessive pressure. Potential tools include anonymous feedback, group sessions with an external facilitator, and role-modeling of pushing back on extra pressure. Note that these tools can only help create trust if the management team does not believe that extra pressure is always helpful.
* Buffer time: Putting an explicit buffer period at the end of the project (or at the end of
milestones in the middle of the project) can help remove the pressure and encourage people
to spend time on quality. The period should not be assigned for doing any task; rather, it
should be there to make sure that people don't feel overwhelmed by upcoming deadlines, that
all problems are satisfactorily fixed, and that unexpected events don't catch the team off
guard.
Portfolio management: The relationship between different releases, as well as different
products, through resource sharing, often makes tipping and adaptation trap dynamics
contagious. Therefore, if problems are at portfolio level, the leverage points would also be at this
level rather than at project level: if pressure persists at the portfolio level, most project-level
efforts to alleviate pressure will be futile. A critical, and common, error at portfolio level is to
overwhelm the PD organization with too many projects. A good sense of the real capacity of the PD organization is hard to build, and it is easier to start a project than to kill one;
accordingly, there is often a bias towards having too many projects. A few ideas can help avoid
or at least alleviate this problem:
* Measurement of the organization's capacity: Historical data can be a good starting point to
measure the overall capacity of the development organization. Such measurement should
take into account not only the initial design and development, but also all the post-design
activities of current engineering, bug fixing, refactoring, etc. In software, measures of work
such as lines of code, number of changes to the code base (delta count), and numbers of new
modules exist. These measures each have their own drawbacks, yet they are better than the
alternative of having no measure of the work. Capacity can then be calculated based on the
number of people used historically to finish some specific amount of work, as well as its
corresponding post-development activities. A growth bias should be carefully accounted for in such measurement: if the organization is growing, the post-development activities are not at their equilibrium level; rather, they lag behind the development. Therefore a simple scaling of the past into the future will downplay the significance of these activities and will surprise the organization when the CE work continues to grow after new development has come to equilibrium (see the sketch after this list).
* Strong gates to start new initiatives: It is much easier and less costly not to initiate projects
than to pull the plug on them later on. Having established gates to initiate new projects is
therefore crucial for successful management of the portfolio of projects. The current and
expected capacity/resource balance should be a strong part of consideration in allowing a
new project to start.
* Having a resource buffer: A resource buffer helps avoid overwhelming the
organization, can allow for timely introduction of critical projects, and can help identify the
movement of the organization towards tipping point. A resource buffer can be best
implemented by adding a safety factor (on the order of 10-20%) to the bottom-up estimates
of finish dates. Besides the pressure-regulation effect of such a buffer (discussed above), it allows the organization to avoid going over the tipping point as a result of a routine crisis in resource availability, customer demands, or technological change. Moreover, the buffer allows strategically important, time-sensitive projects to be accepted before enough resources are released for them by closing other active initiatives or acquiring more resources. Finally, when the project-level buffers show signs of filling up, it is a timely sign that project loads are mounting beyond capacity.
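As a rough numerical sketch of the growth bias noted in the capacity-measurement point above (my simplification, with illustrative numbers rather than data from Alpha or Beta): if feature output grows each month and field-driven work arrives with a lag of several months, the support load observed today reflects the smaller output of the past and understates the steady-state load.

def observed_vs_steady_state(ce_per_feature, monthly_growth, lag_months, current_output):
    # Support work observed today was generated by the output of lag_months ago.
    past_output = current_output / (1 + monthly_growth) ** lag_months
    observed_ce = ce_per_feature * past_output
    # A non-growing organization would already see the load implied by current output.
    steady_state_ce = ce_per_feature * current_output
    return observed_ce, steady_state_ce

obs, ss = observed_vs_steady_state(ce_per_feature=0.5, monthly_growth=0.03,
                                   lag_months=12, current_output=40.0)
print(f"observed: {obs:.1f}, steady state: {ss:.1f} person-equivalents")
# With 3% monthly growth and a one-year lag, naively projecting the observed
# load forward understates steady-state CE and bug-fixing work by about 30%.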
Monitoring the practices on the ground: The tipping dynamics are created because the
practices that help build quality are abandoned in favor of working faster. Therefore, monitoring
the practices used on the ground is required for detecting any potentially detrimental practice
shift and avoiding it before it is too late. Monitoring design and development practices is more
useful than monitoring CE load or other downstream activities, because it is too late to find out
about deterioration of quality when the product is out in the market, and the organization may
already have tipped into firefighting. The following issues need to be considered in the monitoring
process:
* A subtle dimension of the dynamics we discussed is that quality practices may change only a
little due to pressure, but the standards of acceptable practice also change based on the
practices used, and therefore the eroding standards allow for poorer performance in future.
Such standard adjustment amplifies the effect of pressure on quality and makes it harder to
detect. Empirical evidence suggests that this standard adjustment is asymmetrical, i.e., the
quality standards erode much faster than they build up; therefore the practices can degrade
following random, small changes in the pressure (Oliva and Sterman 2001). Consequently,
there is a need for external standards for practices used in the development organization. The
list of important practices is context-specific, yet such a list should be created, and explicit
standards need to be set for each practice based on benchmarking and best practice studies.
* The measurement of these practices is not straightforward. Often people feel unsafe and don't reveal what goes against formally advised practices. Therefore, providing a safe
environment in which to discuss these issues is very important. A helpful consideration in
creating such an environment is to separate measurement and feedback/incentive processes.
To avoid measurement biases, the workers need to be sure that their response does not have
any impact on their evaluation and incentive structure. For example, it is possible to assign
the review of the practices to an outside team, e.g., the research arm of the organization. Of
course, the usefulness of such monitoring depends on how seriously it is used by the
management group in the organization.
This study has several limitations which open up room for further research. First, the
model is built based on two case studies in the software industry and therefore not all potential
processes that matter to dynamics of product development could be observed. For example, the
CE effect is very strong in software, but not so strong in the case of automobile development,
while manufacturability is central in the latter case but not important in software. Consequently,
the generalizability of the conclusions cannot be established until further empirical work with a
larger number of PD organizations is conducted. The hypotheses developed in the discussion
section offer one avenue for such studies.
Another question that merits further attention is the persistence of the firefighting
dynamics. These dynamics persist despite the time for learning, multiple chances to experience
different policies (multiple releases), and high stakes for the PD organization. Some of the
discussions in this essay allude to this persistence question -- for example, the contagion of
dynamics from one release to another -- yet a deep understanding of this question is crucial if a
PD organization wants to enhance its learning from experience.
Finally, beyond generalizations about resource gap and quality, this study did not go into
details of remedies for the firefighting dynamics and what needs to be done to avoid them. For
example, one may expect loosely structured processes for development to increase the effect of
pressure on quality. However, it is possible to combine flexibility and high quality by frequent
synchronization of different development activities and providing quick feedback on quality of
recent work (Cusumano 1997). Recognizing and designing such processes is of great interest to
practitioners who need realistic ways for avoiding the adaptation trap and getting out of
firefighting, rather than a description of how one may fall into these dynamics.
References:
Abdel-Hamid, T. and S. Madnick. 1991. Software project dynamics: An integrated approach.
Englewood Cliffs, NJ, Prentice-Hall.
Amit, R. and P. J. H. Schoemaker. 1993. Strategic assets and organizational rent. Strategic
Management Journal 14(1): 33-46.
Barney, J. 1991. Firm resources and sustained competitive advantage. Journal of Management
17(1): 99-120.
Black, L. J., P. R. Carlile and N. P. Repenning. 2004. A dynamic theory of expertise and
occupational boundaries in new technology implementation: Building on Barley's study of CT scanning. Administrative Science Quarterly 49(4): 572-607.
Boehm, B. W. 1981. Software engineering economics. Englewood Cliffs, N.J., Prentice-Hall.
Brown, S. L. and K. M. Eisenhardt. 1995. Product development - past research, present findings,
and future-directions. Academy of Management Review 20(2): 343-378.
Burgelman, R. A. 1991. Intraorganizational ecology of strategy making and organizational
adaptation: Theory and field research. Organization Science 2(3): 239-262.
Cooper, K. G. 1980. Naval ship production: A claim settled and a framework built. Interfaces
10(6).
Cusumano, M. A. 1997. How Microsoft makes large teams work like small teams. Sloan
Management Review 39(1): 9-20.
Cusumano, M. A. and K. Nobeoka. 1998. Thinking beyond lean: How multi-project management is transforming product development at Toyota and other companies. New York, Free Press.
Cusumano, M. A. and R. W. Selby. 1995. Microsoft secrets: How the world's most powerful software company creates technology, shapes markets, and manages people. New York, Free Press.
Cyert, R. M. and J. G. March. 1963. A behavioral theory of the firm. Englewood Cliffs, N.J., Prentice-Hall.
Dierickx, I. and K. Cool. 1989. Asset stock accumulation and sustainability of competitive
advantage. Management Science 35(12): 1504-1511.
Dosi, G., R. R. Nelson and S. G. Winter. 2000. The nature and dynamics of organizational
capabilities. Oxford; New York, Oxford University Press.
Eisenhardt, K. M. 1989. Building theories from case study research. Academy of Management
Review 14(4): 532-550.
Eisenhardt, K. M. and S. L. Brown. 1998. Time pacing: Competing in markets that won't stand
still. Harvard Business Review 76(2): 59+.
Eisenhardt, K. M. and J. A. Martin. 2000. Dynamic capabilities: What are they? Strategic
Management Journal 21(10-11): 1105-1121.
Ford, D. N. and J. D. Sterman. 1998. Dynamic modeling of product development processes.
System Dynamics Review 14(1): 31-68.
Forrester, J. W. 1961. Industrial dynamics. Cambridge, The M.I.T. Press.
Gawer, A. and M. A. Cusumano. 2002. Platform leadership: How Intel, Microsoft, and Cisco drive industry innovation. Boston, Harvard Business School Press.
Henderson, R. and I. Cockburn. 1994. Measuring competence - exploring firm effects in
pharmaceutical research. Strategic Management Journal 15: 63-84.
Henderson, R. M. and K. B. Clark. 1990. Architectural innovation - the reconfiguration of
existing product technologies and the failure of established firms. Administrative Science
Quarterly 35(1): 9-30.
Krishnan, V. and S. Gupta. 2001. Appropriateness and impact of platform-based product
development. Management Science 47(1): 52-68.
Krishnan, V. and K. T. Ulrich. 2001. Product development decisions: A review of the literature.
Management Science 47(1): 1-21.
Leonard-Barton, D. 1992. Core capabilities and core rigidities - a paradox in managing new
product development. Strategic Management Journal 13: 111-125.
Levinthal, D. A. 1997. Adaptation on rugged landscapes. Management Science 43(7): 934-950.
Levitt, B. and J. G. March. 1988. Organizational learning. Annual Review of Sociology 14: 319-340.
Lovas, B. and S. Ghoshal. 2000. Strategy as guided evolution. Strategic Management Journal
21(9): 875-896.
MacCormack, A., C. F. Kemerer, M. Cusumano and B. Crandall. 2003. Trade-offs between
productivity and quality in selecting software development practices. IEEE Software 20(5): 78+.
Meyer, M. H., P. Tertzakian and J. M. Utterback. 1997. Metrics for managing research and
development in the context of the product family. Management Science 43(1): 88-111.
Milgrom, P. and J. Roberts. 1990. The economics of modern manufacturing - technology,
strategy, and organization. American Economic Review 80(3): 511-528.
Molokken, K. and M. Jorgensen. 2003. A review of surveys on software effort estimation. IEEE
International Symposium on Empirical Software Engineering, Rome, Italy.
Morecroft, J. D. 1983. System dynamics: Portraying bounded rationality. OMEGA 11(2): 131-142.
Morecroft, J. D. W. 1985. Rationality in the analysis of behavioral simulation models.
Management Science 31(7): 900-916.
Muffatto, M. 1999. Introducing a platform strategy in product development. International Journal of Production Economics 60-61: 145-153.
Nelson, R. R. and S. G. Winter. 1982. An evolutionary theory of economic change. Cambridge,
Mass., Belknap Press of Harvard University Press.
Oliva, R. and J. D. Sterman. 2001. Cutting corners and working overtime: Quality erosion in the
service industry. Management Science 47(7): 894-914.
Peteraf, M. A. 1993. The cornerstones of competitive advantage - a resource-based view.
Strategic Management Journal 14(3): 179-191.
Putnam, D. 2001. Productivity statistics buck 15-year trend. QSM articles and papers.
Repenning, N. P. 2000. A dynamic model of resource allocation in multi-project research and
development systems. System Dynamics Review 16(3): 173-212.
Repenning, N. P. 2001. Understanding fire fighting in new product development. The Journal of
Product Innovation Management 18: 285-300.
Rivkin, J. W. 2001. Reproducing knowledge: Replication without imitation at moderate
complexity. Organization Science 12(3): 274-293.
Rudolph, J. W. and N. P. Repenning. 2002. Disaster dynamics: Understanding the role of quantity
in organizational collapse. Administrative Science Quarterly 47: 1-30.
Sanderson, S. and M. Uzumeri. 1995. Managing product families - the case of the Sony Walkman. Research Policy 24(5): 761-782.
Sanderson, S. W. and M. Uzumeri. 1997. The innovation imperative: Strategies for managing product models and families. Chicago, Irwin Professional Pub.
Sauer, C. and C. Cuthbertson. 2004. The state of IT project management in the UK 2002-2003.
Oxford, University of Oxford: 82.
Simon, H. A. 1979. Rational decision making in business organizations. American Economic
Review 69(4): 493-513.
Sterman, J. 2000. Business dynamics: Systems thinking and modeling for a complex world. Irwin,
McGraw-Hill.
Teece, D. J., G. Pisano and A. Shuen. 1997. Dynamic capabilities and strategic management.
Strategic Management Journal 18(7): 509-533.
Wernerfelt, B. 1984. A resource-based view of the firm. Strategic Management Journal 5(2):
171-180.
Williamson, O. E. 1999. Strategy research: Governance and competence perspectives. Strategic
Management Journal 20(12): 1087-1108.
Winter, S. G. 2003. Understanding dynamic capabilities. Strategic Management Journal 24(10):
991-995.
Zajac, E. J. and M. H. Bazerman. 1991. Blind spots in industry and competitor analysis: Implications of interfirm (mis)perceptions for strategic decisions. Academy of Management Review 16(1): 37-56.
Appendix A- Detailed model formulations
The following table lists the equations for the full model. To avoid complexity, all switches and
parameters used to break feedback loops are removed from the equations.
Allocated Bug Fix Resources = Allocated Resources[BugFix] Units: Person
Allocated Resources[Function] = Min(Desired Resources[Function]/SUM(Desired
Resources[Function!])*Total Resources, Desired Resources[Function]) Units: Person
Desired Resources[Development] = Des Dev Resources
Desired Resources[CurrentEng] = Desired Resources for Current Engineering
Desired Resources[BugFix] = Desired Bug Fix Resources
Development Resources Allocated = Allocated Resources[Development]
CE Resources Allocated = Allocated Resources[CurrentEng]
Allocated Bug Fix Resources = Allocated Resources[BugFix]
Average size of a new release = 40 Units: Feature
Bug Fixing = Allocated Bug Fix Resources*Productivity of Bug Fixes Units:
Feature/Month
Des Dev Resources = Features Under Development / Desired Time to Develop / Productivity
/ (1-Normal Error Rate*Frac Error Caught in Test) Units: Person
Desired Bug Fix Resources = Errors in Base Code / Time to Fix Bugs / Productivity of Bug
Fixes Units: Person
Desired Resources for Current Engineering = Sales*(Errors in New Code
Developed+Effective Errors in Code Base)*Fraction Errors Showing Up / Productivity of CE
Units: Person
Desired Time to Develop = 10 Units: Month
Dev Work Pressure = if then else (Resource Pressure Deficit < 0.99, Resource Pressure
Deficit, ZIDZ (Des Dev Resources, Development Resources Allocated)) Units: Dmnl
Eff Architecture and Code Base on Error Rate = (Modularity Coefficient*"T1 Eff Arc & Base on Error"(Fraction Old Features Defective)+(1-Modularity Coefficient)*Max (1, Errors in Base Code)) Units: Dmnl
Eff Pressure on Error Rate = T1 Eff Pressure on Error Rate (Dev Work Pressure) Units:
Dmnl
Effect of Pressure on Productivity = T1 Eff Pressure on Productivity (Dev Work Pressure)
Units: Dmnl
Effective Errors in Code Base = Min (Effective Size of Old Release Portion, Old Features
Developed)*Fraction Old Features Defective Units: Feature
Effective Size of Old Release Portion = 0 Units: Feature
Error Becoming Old = Features Becoming Old*Fraction New Features Defective Units:
Feature/Month
Error Frac in Released Feature = Error Fraction*(1-Frac Error Caught in Test) / (1-Error Fraction*Frac Error Caught in Test) Units: Dmnl
Error Fraction = Min (1, Normal Error Rate*Eff Architecture and Code Base on Error
Rate*Eff Pressure on Error Rate) Units: Dmnl
Error Generation = Feature Release*Error Frac in Released Feature Units: Feature/Month
Errors in Base Code = INTEG(Error Becoming Old-Bug Fixing, Errors in New Code
Developed*Time to Fix Bugs / Desired Time to Develop) Units: Feature
Errors in New Code Developed = INTEG(Error Generation-Error Becoming Old, Normal Error Rate*(1-Frac Error Caught in Test)*New Features Developed / (1-Normal Error Rate*Frac Error Caught in Test)) Units: Feature
External Shock = 0 Units: Feature
Feature Addition = Fixed Feature Addition+PULSE (Pulse Time, Pulse Length)*External
Shock / Pulse Length Units: Feature/Month
Feature Release = Rate of Development*Fraction of Work Accepted Units: Feature/Month
Features Becoming Old = New Features Developed / Time Between Releases Units:
Feature/Month
Features Under Development = INTEG(Feature Addition-Feature Release, Fixed Feature
Addition*Desired Time to Develop) Units: Feature
FINAL TIME = 100 Units: Month
Fixed Feature Addition = 4 Units: Feature/Month
Frac Error Caught in Test = 0.9 Units: Dmnl
Fraction Errors Showing Up = 0.1 Units: Dmnl
Fraction New Features Defective = ZIDZ (Errors in New Code Developed, New Features
Developed) Units: Dmnl
Fraction of Work Accepted = 1-Error Fraction*Frac Error Caught in Test Units: Dmnl
Fraction Old Features Defective = ZIDZ (Errors in Base Code, Old Features Developed)
Units: Dmnl
Modularity Coefficient = 0.5 Units: Dmnl
New Features Developed = INTEG(Feature Release-Features Becoming Old, Average size of
a new release) Units: Feature
Normal Error Rate = 0.4 Units: Dmnl
Old Features Developed = INTEG(Features Becoming Old, Initial Old Features) Units:
Feature
Productivity = 0.1
Units: Feature/(Person*Month)
Productivity of Bug Fixes = Productivity*Relative Productivity of Bug Fixing Units:
Feature/(Person*Month)
Productivity of CE = Productivity*Relative Productivity of CE Units:
Feature/(Person*Month)
Pulse Length = 1 Units: Month
Pulse Time = 10 Units: Month
Rate of Development = Development Resources Allocated*Productivity*Effect of Pressure
on Productivity
Units: Feature/Month
Relative Productivity of Bug Fixing = 0.5 Units: Dmnl
Relative Productivity of CE = 1 Units: Dmnl
Resource Pressure Deficit = SUM (Allocated Resources[Function!]) / Total Resources
Units: Dmnl
Sales = 10 Units: /Month
Time Between Releases = Average size of a new release / Feature Release Units: Month
TIME STEP = 0.125 Units: Month
Time to Fix Bugs = 9 Units: Month [0,20]
"TI Eff Arc & Base on Error" ([(0,0)(1,2)],(0,1),(0.13,1.13),(0.28,1.34),(0.42,1.68),(0.54,1.84),(0.72,1.94),(1,2))
Units: Dmnl
T1 Eff Pressure on Error Rate ([(0.5,0)-(2,2)],(0.5,0.8),(1,1),(1.15,1.11),(1.29,1.33),(1.40,1.54),(1.53,1.74),(1.68,1.89),(1.99,2))
Units: Dmnl
T1 Eff Pressure on Productivity ([(0,0)-(2,1.5)],(0,0.6),(0.26,0.61),(0.45,0.64),(0.64,0.73),(0.83,0.84),(1,1),(1.22,1.2),(1.43,1.28),(1.77,1.32),(2,1.33)) Units: Dmnl
Total Resources = 100 Units: Person
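As an aside, a minimal sketch in Python of the proportional allocation rule listed above (an illustration of the equation, not part of the model code): each function receives resources in proportion to its request whenever total requests exceed Total Resources, and never more than it requested.

def allocate(desired, total_resources):
    # Allocated Resources[f] = Min(Desired[f] / SUM(Desired) * Total Resources, Desired[f])
    total_desired = sum(desired.values())
    if total_desired == 0:
        return {f: 0.0 for f in desired}
    return {f: min(d / total_desired * total_resources, d) for f, d in desired.items()}

desired = {"Development": 80.0, "CurrentEng": 30.0, "BugFix": 20.0}
print(allocate(desired, total_resources=100.0))
# Requests total 130 persons against 100 available, so Development receives
# about 80/130*100 = 61.5 persons; Dev Work Pressure (desired over allocated) is then about 1.3.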
Appendix B- The maximum PD capacity
With a few assumptions, we can derive analytical expressions for the long-term capacity of the product development organization as modeled in the essay. These assumptions include:
- That products remain in the field for a fixed time
- That product modularity is not very high, so we need to keep the absolute level of problems in the code base and architecture low.
The logic for deriving the capacity is simple. We can write the total resources needed for
development, CE, and bug-fixing in terms of Feature Release and Resource Gap. Since in this
model, allocation is proportional to request, so the ratio of resources requested to those allocated is the same across the three functions and equals the resource gap. Therefore we can find the resources
allocated to each activity in terms of resource gap and Feature Release. Consequently, knowing
that when resource gap is above 1, the total resources allocated equal the total resources
available, we can find Feature Release in terms of resource gap, which is the equation we need to
find the optimum capacity.
Formally, we can write resources allocated to each activity:
\[
R_D = \frac{FR}{f_p(RG)\, P_N \bigl(1 - e_N f_e(RG)\, fr_T\bigr)} \qquad (3)
\]
\[
R_{CE} = \frac{FR \cdot e_N f_e(RG)\,(1 - fr_T)\, k}{P_N \bigl(1 - e_N f_e(RG)\, fr_T\bigr) RP_{CE}} \cdot \frac{1}{RG} \qquad (4)
\]
\[
R_{BF} = \frac{FR \cdot e_N f_e(RG)\,(1 - fr_T)}{P_N \bigl(1 - e_N f_e(RG)\, fr_T\bigr) RP_{BF}} \cdot \frac{1}{RG} \qquad (5)
\]
And note the conservation of resources:
\[
R = R_D + R_{CE} + R_{BF} \qquad (6)
\]
Plugging the equations 3-5 into equation 6, and solving for FR, we find the following
relationship that describes Feature Release as a function of resource gap:
\[
FR = \frac{R\, P_N \bigl(1 - e_N f_e(RG)\, fr_T\bigr) RG}{\dfrac{RG}{f_p(RG)} + e_N f_e(RG)\,(1 - fr_T)\left[\dfrac{k}{RP_{CE}} + \dfrac{1}{RP_{BF}}\right]} \qquad (7)
\]
Note that, as expected, in the special case where bug-fixing and current engineering don't need any resources (e.g., by letting RP_CE and RP_BF go to infinity), equation 7 reduces to equation 2, as discussed in the text.
To find the maximum FR rate, one needs to take the derivative of FR with respect to RG (for
which we need specific fe and fp functions) and equate it to zero. Finding the optimum RG
(through numerical or analytical solution), we can plug it back into equation 7 to find the
optimum level of capacity.
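A minimal numerical sketch of this procedure (illustrative only): it evaluates equation 7 on a grid of resource gaps, using assumed linear pressure functions f_p(RG) = 1 + a(RG-1) and f_e(RG) = 1 + b(RG-1) and stand-in parameter values rather than the calibrated model values.

import numpy as np

R, P_N = 100.0, 0.1        # total resources (person), normal productivity (feature/person/month)
e_N, fr_T = 0.4, 0.9       # normal error rate, fraction of errors caught in test
k = 1.0                    # assumed weight of CE work per escaped error
RP_CE, RP_BF = 1.0, 0.5    # relative productivities of CE and bug fixing
a, b = 0.8, 1.0            # assumed slopes of the pressure effects

def feature_release(RG):
    f_p = 1 + a * (RG - 1)
    f_e = 1 + b * (RG - 1)
    accepted = 1 - e_N * f_e * fr_T                       # fraction of work accepted
    denom = RG / f_p + e_N * f_e * (1 - fr_T) * (k / RP_CE + 1 / RP_BF)
    return R * P_N * accepted * RG / denom                # equation (7)

grid = np.linspace(1.0, 2.0, 1001)
fr = np.array([feature_release(rg) for rg in grid])
best = grid[fr.argmax()]
print(f"capacity peaks at RG = {best:.2f} with FR = {fr.max():.2f} features/month")

With these stand-in values the sustainable output peaks at a resource gap slightly above one and falls off on either side, which is the inverse U-shape the tipping-point discussion relies on.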
Appendix C- Boundary conditions of the model
Here we review the main assumptions that different dynamics in the simulation model outlined
above depend on. This allows us to define the boundary conditions within which the conclusions
of the model are robust. We first discuss what assumptions each main dynamic is sensitive to,
and then elaborate on effects of different resource allocation schemes that can be used for
allocating resources between development, current engineering, and bug fixing activities.
The tipping dynamics:
In the tradeoff between productivity and quality, the tipping dynamics exist only if the aggregate
function has an inverse U-shape relationship between pressure and total output. In the case
where there is no current engineering and bug fixing, the inverse U-shape function exists if and
only if the following function has a maximum inside its feasible region (rather than on the
boundaries):
\[
f_p(RG)\,\bigl(1 - e_N f_e(RG)\, fr_T\bigr)
\]
where f_p is the function for the effect of pressure on productivity, e_N is the normal error rate, f_e is the function for the effect of pressure on error rate, and fr_T is the fraction of errors caught in testing.
Since the two functions f_e and f_p are not defined explicitly, there is no general rule one can derive
to specify the boundary conditions for existence of such a maximum. Yet a few general patterns
are observable:
For the general class of functions f_p = 1 + a(RG-1) and f_e = 1 + b(RG-1), the expression simplifies to a second-order polynomial in RG. Finding the maximum of that polynomial, and imposing the boundary condition that the error rate should remain below 1 (e_N(1 + b(RG-1)) < 1), we get the general condition for the existence of the inverse U-shape: e_N(1-b/a) + 1/fr_T < 2. This is a relatively weak assumption, since a < 1 (a = 1 would mean we can completely compensate any resource gap by working harder) and fr_T is often not so small (we find a significant fraction of errors before releasing the product). Therefore, for this general class of functions, the inverse U-shape assumption is realistic for many real-world settings.
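For completeness, a short sketch of the algebra behind this condition (my reconstruction from the stated linear forms; the appendix reports only the result):

\[
g(RG) = f_p(RG)\bigl(1 - e_N f_e(RG)\, fr_T\bigr)
      = \bigl(1 + a(RG-1)\bigr)\bigl(1 - e_N fr_T - e_N fr_T\, b\,(RG-1)\bigr)
\]

is quadratic in RG, with an interior maximum at

\[
RG^{*} = 1 + \frac{a\,(1 - e_N fr_T) - b\, e_N fr_T}{2\, a\, b\, e_N fr_T}.
\]

Requiring the error fraction at this maximum to remain feasible, \(e_N\bigl(1 + b\,(RG^{*}-1)\bigr) < 1\), and simplifying yields the stated condition e_N(1-b/a) + 1/fr_T < 2.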
Higher values of the normal error rate (e_N) and of the fraction of errors caught in testing (fr_T) increase the possibility that the shape of the function is an inverse U. In other words, if the
process has a high error rate, and if we have a good testing capacity, we are more prone
to the tipping dynamics. Note that the latter conclusion does not necessarily hold when
we include the CE dynamics (see below).
The smaller the slope of f_p and the larger the slope of f_e, the higher are the chances of having an inverse U-shaped function. In other words, if the effect of pressure on
productivity is small, and the effect on quality is strong, the tipping dynamics are more
applicable.
Tipping point dynamics are affected by the presence of CE and bug-fixing. Basically, the introduction of CE and bug-fixing increases the costs of low-quality work, so that not only does the cost of rework play a role in the output function (the (1 - e_N f_e(RG) fr_T) term in the feature release equation becomes important), but the cost of fixing the bugs at customer sites or later in the code base also haunts the development organization (the e_N f_e(RG)(1 - fr_T)[k/RP_CE + 1/RP_BF] term in the denominator of feature release; see equation 7 in Appendix B). As a result, the total amount of work that can be managed at higher levels of pressure further decreases. Moreover, this term introduces an effect of poor testing in reinforcing the tipping dynamics; therefore we can no longer easily avoid tipping by having a poor test. In short, the introduction of CE and bug-fixing exacerbates the tipping dynamics and increases the chance that we observe an inverse U-shaped curve between pressure and output.
However, the strength of CE/BF effect on tipping dynamics depends on the effect of
pressure on quality; in other words, if quality is not impacted by pressure at all, the CE
and BF dynamics CANNOT produce a tipping point alone.
The adaptation trap
The adaptation trap is created when the quality feedback of current development comes with
some delay in the form of extra work to be managed later. The result is that an increase in current
development not only requires an initial increase in resources to manage such development, but
also will require further resources to be allocated later (to do current engineering and bug fixing).
This dynamic exists as long as we have a significant CE or bug fixing component in the work.
Moreover, the strength of the adaptation trap depends on the following considerations:
- How fast the pressure can change, compared to how delayed the CE and BF feedback is.
If the change in pressure is faster than the feedback, the adaptation trap is more
important. Specifically, if the CE feedback arrives with significant delays (e.g., long
testing and sales lead times separate initial development and observation of CE
feedback), the adaptation trap is a significant danger to the organization. Alternatively, if
CE/BF feedback comes fast, the management team will have little time to be misled by
apparently productive pressure increases.
- Strength of CE/BF effect influences the adaptation trap in two ways. First, if CE/BF is in
general a large proportion of work, then adaptive increases in pressure, and consequently
in output, bring more significant surprises later, when the CE feedback becomes
apparent. This proportion is captured by the k/RP_CE + 1/RP_BF term in the simple model. The
second effect is that the proportion of CE/BF effect is itself impacted by the quality of
work, i.e., low-quality work takes more CE resources per feature released. Therefore
adaptive increases in pressure not only increase the feature release, but also increase the
error rate and therefore the relative importance of CE/BF feedback. In short, the
significance of the adaptation trap depends both on the magnitude of CE/BF effects and on the strength of the effect of pressure on the error rate. While the nature of work in many industries
limits the former to small levels and therefore reduces the importance of the discussed
dynamics, the processes used to manage the work can impact the latter factor and create
heterogeneity in presence of the adaptation trap within an industry or even an
organization.
Bug fixing and building on errors
The bug-fixing dynamic is at its core similar to the CE effect, in the sense that it is driven by the quality of work done in the past and takes resources away from working on the current
release. Its strength, however, depends on some different factors. The main value in doing the
bug-fixing/architecture refactoring work is to avoid degradation of quality of future features, and
to some extent to avoid old bugs coming out in new releases. Therefore the importance of BF
activities depends on the extent to which such degradation matters:
- If a long life is predicted for a platform, BF dynamics become more important, since the
development organization needs a reliable and productive platform for future releases.
However, if the platform is at the end of its life cycle, for the last couple of releases the
BF effect becomes less of an issue. This consideration, however, does not have an impact
on the "error creates error" dynamic.
- To the extent that CE activity can fix the bugs in the product base, the importance of BF
activities goes down. In other words, if by fixing the problem for one customer, we can
also fix the problem for all future releases and customers, then BF activities remain
limited to work on the overall architecture. However, if CE activity is very customized,
then separate work needs to be done on the product base to avoid similar failures in future
releases of a product, increasing the importance of BF.
- The modularity of the architecture has an important impact on significance of "error
creates error" dynamics (where the quality of new modules is impacted by the quality of
the product base). In a modular architecture, the new pieces have few interactions with
old parts, and therefore their quality is largely independent of those parts. Consequently,
the "error creates error" feedback does not play a significant role unless the density of
error in the product base increases. On the other hand, an integral architecture increases
the chances that a new module's quality is impacted by the quality of any other module in
the product base, and therefore increases the importance of "error creates error"
dynamics.
Resource allocation:
In the model we have assumed a proportional resource allocation scheme between development,
CE, and BF activities. Different allocation strategies have an impact on the results.
If we move towards increasing the priority of the development activity, the effects of CE and BF start to lose importance, and therefore:
- The tipping dynamics weaken (but don't go away; see the discussions above under the tipping
dynamics).
- The adaptation trap weakens and can potentially go away if development gets all its required
resources, before we allocate anything to current engineering or bug fixing.
- The impact of the error creates error dynamics increases, because, by spending fewer resources on bug fixing, we leave problems to accumulate and impact the quality of current development more significantly.
If we move towards increasing the priority of CE:
- The tipping dynamics are aggravated, since a bigger fraction of resources is sent to CE at higher error rates, increasing the downward slope of the inverse U-shaped curve.
- The adaptation trap becomes more significant.
- The error creates error dynamics can be impacted differently, depending on parameter settings: on the one hand, the fixing of bugs in the code base through CE can help alleviate the error creates error feedback. On the other hand, the increased pressure on the development phase as a result of CE priority will actually increase the error rate in the first place, and therefore the magnitude of errors that need to be fixed through BF. Finally, higher CE priority will take resources away from BF activity.
In practice, it is unlikely that BF receives the highest priority, because it is the least salient activity and is not directly productive in either producing new features or keeping customers happy. Nevertheless, under such a practice, the effects on the tipping dynamics and the adaptation trap are similar to those of increasing CE priority, and the effect on the "error creates error" dynamic is positive.
In Sigma, product Alpha was tilting towards giving a higher priority to CE, and Beta was close
to the proportional allocation scheme used in the model. In this essay we used the proportional
formulation, which takes a conservative stance with respect to significance of the main dynamics
discussed in the study.
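As a concrete illustration of the allocation schemes discussed above, the following sketch contrasts a proportional allocation with a development-first priority allocation. It is a minimal illustration, not the thesis model; the activity names, demand figures, and function names are assumptions made for this example.

def allocate_proportional(resources, demands):
    """Each activity gets resources in proportion to its demand."""
    total = sum(demands.values())
    if total == 0:
        return {k: 0.0 for k in demands}
    return {k: resources * d / total for k, d in demands.items()}

def allocate_development_first(resources, demands):
    """Development is fully funded first; CE and BF split the remainder proportionally."""
    dev = min(resources, demands["development"])
    rest = resources - dev
    other = {k: d for k, d in demands.items() if k != "development"}
    out = allocate_proportional(rest, other)
    out["development"] = dev
    return out

demands = {"development": 60.0, "current_engineering": 30.0, "bug_fixing": 20.0}
print(allocate_proportional(100.0, demands))       # ~54.5 / 27.3 / 18.2
print(allocate_development_first(100.0, demands))  # 60 / 24 / 16

Under the development-first scheme, CE and BF absorb all the shortfall, which is why the tipping and adaptation-trap dynamics weaken while the error creates error dynamic strengthens.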
Appendix 1-1- Technical material to accompany essay 1: Effects of Feedback
Delay on Learning
The following supplementary material includes the detailed formulations of the four learning
algorithms discussed in the essay, an alternative method for measuring the speed of internal
dynamics of the models, the statistics for parameters and results of robustness analysis,
formulations for Q-learning algorithm tested for comparison with the four learning models
discussed in the text, as well as instructions for inspecting and running the simulation learning
models discussed in the essay and the Q-learning model.
Mathematics of the models
This appendix includes the formulation details for all four learning algorithms (Regression, Correlation, Myopic Search, and Reinforcement) as well as the exploration procedure. Please see the online material on the author's website for the simulation models and applets that allow you to simulate the models and run different experiments.
Table 1- Reinforcement Learning algorithm

Variable | Formulation
dV_j(t)/dt | I_j(t) - G_j(t)
I_j(t) | P(t)^(Reinforcement Power) · A_j(t) / n
G_j(t) | V_j(t) / Reinforcement Forgetting Time
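The following sketch shows one plausible reading of the reinforcement update in Table 1, integrated with a simple Euler step. The normalization by the number of activities n and the parameter values are assumptions taken from the garbled original and the base values in Table 7; this is not the Vensim implementation used in the essay.

import numpy as np

def reinforcement_step(V, A, P, dt, power=12.0, forgetting_time=10.0):
    """One Euler step: values are reinforced by payoff-weighted allocations and decay."""
    n = len(V)
    inflow = (P ** power) * A / n          # I_j(t): reinforcement received by activity j
    outflow = V / forgetting_time          # G_j(t): forgetting of past values
    return V + dt * (inflow - outflow)

V = np.array([1.0, 1.0, 1.0])              # activity values
A = np.array([0.5, 0.3, 0.2])              # current resource fractions
V = reinforcement_step(V, A, P=1.2, dt=0.125)
print(V)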
Table 2- Myopic Search algorithm

Variable | Formulation
dV_j(t)/dt | ((V_j*(t) - V_j(t)) / Value Adjustment Time Constant) · Direction(t)
Direction(t) | 1 if P(t) > Historical Payoff(t); 0 if P(t) ≤ Historical Payoff(t)
Historical Payoff(t) | Smooth(P(t), Historical Averaging Time Horizon)
V_j*(t) | FractionofActions_j(t) · Σ_i V_i(t), where FractionofActions_j(t) = A_j(t) / Σ_i A_i(t)

Where the smooth function is defined by d(smooth(x, τ))/dt = (x - smooth(x, τ)) / τ, and the initial value of smooth(x, τ) equals the initial value of x unless explicitly mentioned.
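Because the first-order smooth defined above is used throughout these tables, a minimal sketch of it may help. The function and variable names are illustrative; the implementation simply Euler-integrates d(smooth)/dt = (x - smooth)/τ.

def make_smoother(tau, dt):
    state = {"s": None}
    def smooth(x):
        if state["s"] is None:
            state["s"] = x                 # initial value equals the initial input
        state["s"] += dt * (x - state["s"]) / tau
        return state["s"]
    return smooth

historical_payoff = make_smoother(tau=20.0, dt=0.125)
for p in [1.0, 1.1, 1.3, 1.2]:
    print(historical_payoff(p))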
Table 3- Correlation algorithm

Variable | Formulation
dV_j(t)/dt | (V_j*(t) - V_j(t)) / Value Adjustment Time Constant
V_j*(t) | V_j(t) · f(Action_PayoffCorrelation_j(t)), where f(x) = 2·e^(2γx) / (1 + e^(2γx)) and γ is the Sensitivity of Allocations to Correlations
Action_PayoffCorrelation_j(t) | Action_PayoffCovariance_j(t) / (ActionVariance_j(t) · PayoffVariance(t))^(1/2)
Action_PayoffCovariance_j(t) | smooth(ChangeinAction_j(t) · ChangeinPayoff(t), Action Lookup Time Horizon)
ActionVariance_j(t) | smooth(ChangeinAction_j(t)^2, Action Lookup Time Horizon)
PayoffVariance(t) | smooth(ChangeinPayoff(t)^2, Action Lookup Time Horizon)
ChangeinAction_j(t) | A_j(t) - smooth(A_j(t), Action Lookup Time Horizon)
ChangeinPayoff(t) | P(t) - smooth(P(t), Action Lookup Time Horizon)
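The correlation tracking in Table 3 can be sketched as follows: deviations of action and payoff from their smoothed values feed exponentially smoothed (co)variances, whose ratio gives the perceived action-payoff correlation. This is an illustrative reading of the table, not the original model code; the class and parameter names are assumptions.

import math

class CorrelationTracker:
    def __init__(self, horizon, dt):
        self.h, self.dt = horizon, dt
        self.a_bar = self.p_bar = None
        self.cov = self.var_a = self.var_p = 0.0

    def _smooth(self, old, x):
        return old + self.dt * (x - old) / self.h

    def update(self, a, p):
        if self.a_bar is None:
            self.a_bar, self.p_bar = a, p
        da, dp = a - self.a_bar, p - self.p_bar      # ChangeinAction, ChangeinPayoff
        self.cov = self._smooth(self.cov, da * dp)
        self.var_a = self._smooth(self.var_a, da * da)
        self.var_p = self._smooth(self.var_p, dp * dp)
        self.a_bar = self._smooth(self.a_bar, a)
        self.p_bar = self._smooth(self.p_bar, p)
        denom = math.sqrt(self.var_a * self.var_p)
        return self.cov / denom if denom > 0 else 0.0

tracker = CorrelationTracker(horizon=6.0, dt=0.125)
print(tracker.update(30.0, 1.0), tracker.update(31.0, 1.05))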
Table 4- Regression algorithm

Variable | Formulation
dV_j(t)/dt | (V_j*(t) - V_j(t)) / Value Adjustment Time Constant
V_j*(t) | a_j(s), if at least one a_j(s) is non-zero; V_j(t) otherwise
a_j(s) | Max(0, a_j'(s)), where s = E·INT(t/E), E is the evaluation period (the decision maker runs the regression model every E time periods), and the INT function gives the integer part of t/E
a_j'(s) | The estimates of α_j in the OLS regression model log(P(t)) = log(α_0) + Σ_j α_j·log(A_j(t)) + e(t). The regression is run every Evaluation Period (E) periods, using all data between time zero and the current time (every time step is an observation).
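A small sketch of the periodic regression step in Table 4: the decision maker regresses log payoff on log allocations and keeps the non-negative estimated exponents as indicated activity values. The data arrays below are synthetic and stand in for the simulated history; they are not the essay's data.

import numpy as np

def estimate_exponents(A_history, P_history):
    """OLS estimates of alpha_j in log P = log a0 + sum_j alpha_j log A_j + e."""
    X = np.column_stack([np.ones(len(P_history)), np.log(A_history)])
    coeffs, *_ = np.linalg.lstsq(X, np.log(P_history), rcond=None)
    return np.maximum(coeffs[1:], 0.0)          # a_j(s) = Max(0, a_j'(s))

rng = np.random.default_rng(0)
A_hist = rng.uniform(10, 60, size=(200, 3))     # allocations observed each time step
P_hist = A_hist[:, 0]**0.5 * A_hist[:, 1]**0.3333 * A_hist[:, 2]**0.1667
print(estimate_exponents(A_hist, P_hist))       # close to [0.5, 0.333, 0.167] without noise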
Table 5- Exploration procedure

Variable | Formulation
V_j'(t) | V_j(t) + N_j(t)
N_j(t) | Pink-noise exploration term scaled by V_j(t), with standard deviation: Minimum Action Standard Deviation + (Maximum Action Standard Deviation - Minimum Action Standard Deviation) · S.D. Selector(t)
S.D. Selector(t) | Min(1, Max(0, (1 / Slope Inverse for S.D. Selection) · (Recent Improvement(t) / Minimum Improvement - 1)))
Recent Improvement(t) | Smooth(Max(0, P(t) - Historical Payoff(t)), Payoff Improvement Updating Time); Initial Recent Improvement = 1
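The exploration logic in Table 5, as read from the garbled original, can be sketched as follows: recent payoff improvement is normalized and clipped to select an exploration standard deviation between the minimum and maximum levels, and the resulting noise is scaled by the activity values. Parameter values are the Table 7 base cases; the function names are assumptions.

import numpy as np

def exploration_sd(recent_improvement, min_improvement=0.01, slope_inverse=9.0,
                   sd_min=0.01, sd_max=0.1):
    selector = min(1.0, max(0.0, (recent_improvement / min_improvement - 1.0) / slope_inverse))
    return sd_min + (sd_max - sd_min) * selector

rng = np.random.default_rng(1)
V = np.array([1.2, 0.9, 0.6])
sd = exploration_sd(recent_improvement=0.03)
V_noisy = V + rng.normal(0.0, sd, size=V.shape) * V      # noise scaled by the values
print(sd, V_noisy)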
Speed of internal dynamics of the model
In the essay we used the convergence times of different models in the base case as a proxy for
the speed of internal dynamics of the models. An alternative measure which does not directly
depend on our implementation of convergence criteria can be obtained by fitting a first order
negative exponential curve to the base case (no delay) average performances of the algorithms.
Following the practice in control theory, we measure four times the time constant of the fitted curve, which gives the time needed for the model to settle within 2% of its final value. We use this as a sensible measure for the time needed for the organization to settle down into a
policy, and therefore a good measure of the speed of internal dynamics of the model. The
following table shows the values of this measure for different algorithms:
Table 6- The alternative measure for speed of internal dynamics.
These results are consistent with the premise that delays tested in the model are not too long
compared to internal dynamics of the system.
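A sketch of how such a measure can be computed: fit a first-order exponential approach to the averaged no-delay performance and report four time constants as the 2% settling time. The synthetic series below stands in for the averaged simulation output, and scipy's curve_fit is an illustrative substitute for whatever fitting tool was actually used.

import numpy as np
from scipy.optimize import curve_fit

def first_order(t, y0, yf, tau):
    return yf - (yf - y0) * np.exp(-t / tau)

t = np.linspace(0, 300, 601)
y = first_order(t, 40.0, 90.0, 25.0) + np.random.default_rng(2).normal(0, 0.5, t.size)

(p_y0, p_yf, p_tau), _ = curve_fit(first_order, t, y, p0=[y[0], y[-1], 30.0])
print("time constant:", p_tau, "settling time (4*tau):", 4 * p_tau)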
Statistics for robustness analysis parameters and results
The independent parameters in the regressions included all those changed randomly for the
Monte-Carlo simulation analysis, picked from uniform distributions, reported below. For each
parameter we report the low, high, and base case values, as well as the description of the
parameter. In Table 8 we also report the summary statistics for the dependent variables of the regressions.
Table 7- Parameter settings for sensitivity analysis

Parameter | Algorithm | Low | High | Base | Description
Payoff Generation Delay[activity 1] | All | 0.5 | 20 | 0 | How long on average it takes for resources allocated to activity 1 to become effective and influence the payoff
Perceived Payoff Generation Delay[activity 1] | All | 0 | 20 | 0 | The decision maker's estimate of the delay between resources allocated to activity 1 and the payoff
Action Noise Correlation Time | All | 1 | 9 | 3 | The correlation time constant for the pink noise used for model exploration
Value Adjustment Time Constant | Regression, Correlation | 3 | 20 | 10 | How fast the activity value system moves towards the indicated policy
Value Adjustment Time Constant | Myopic | 1 | 8 | 2 | How fast the activity value system moves towards the indicated policy
Maximum Action Standard Deviation | All | 0.05 | 0.2 | 0.1 | The standard deviation of exploration noise when exploration is most vigorous
Minimum Action Standard Deviation | All | 0 | 0.05 | 0.01 | The standard deviation of exploration noise when exploration is at its minimum
Minimum Improvement of Payoff | All | 0.005 | 0.02 | 0.01 | Normalizing factor for changes in payoff; changes lower than this trigger the minimum standard deviation for exploration
Slope Inverse for Standard Deviation | All | 0 | 20 | 9 | Inverse of the slope relating recent improvements to the standard deviation of exploration; higher values give a smoother transition from high to low exploration
Payoff Improvement Updating Time | All | 3 | 20 | 5 | The time constant for updating the recent payoff improvements (which are then normalized to determine the strength of exploration)
Action Lookup Time Horizon | Correlation | 2 | 10 | 6 | The time horizon for calculating correlations between an activity and the payoff
Sensitivity of Allocations to Correlations | Correlation | 0.05 | 0.8 | 0.2 | How aggressively the activity values (and therefore allocations) respond to the perceived correlations between each activity and the payoff
Reinforcement Forgetting Time | Reinforcement | 3 | 20 | 10 | The time constant for forgetting past activity values
Reinforcement Power | Reinforcement | 4 | 20 | 12 | How strongly the differences in payoff are reflected in action value updates
Evaluation Period (E) | Regression | 1 | 9 | 3 | How often a new regression is conducted to recalculate the optimal policy
Table 8- Summary Statistics for Dependent Variables

Variable | Observations | Mean | Std Dev | Median
Achieved Payoff Percentage[Rgr] | 3000 | 87.2 | 17.8 | 93.0
Achieved Payoff Percentage[Crr] | 3000 | 75.6 | 25.4 | 83.3
Achieved Payoff Percentage[Myo] | 3000 | 79.6 | 20.2 | 85.0
Achieved Payoff Percentage[PfR] | 3000 | 81.7 | 18.7 | 86.7
Distance Traveled[Rgr] | 3000 | 18.1 | 11.5 | 15.1
Distance Traveled[Crr] | 3000 | 17.5 | 11.0 | 14.8
Distance Traveled[Myo] | 3000 | 18.4 | 12.2 | 15.4
Distance Traveled[PfR] | 3000 | 18.0 | 12.3 | 14.7
Convergence Probability[Rgr] | 3000 | 0.236 | 0.424 | 0
Convergence Probability[Crr] | 3000 | 0.122 | 0.328 | 0
Convergence Probability[Myo] | 3000 | 0.178 | 0.383 | 0
Convergence Probability[PfR] | 3000 | 0.200 | 0.400 | 0
A Q-learning model for resource allocation learning task
Among the different learning algorithms we considered in the original design of this study, a Q-learning algorithm taken from the machine learning literature had some interesting properties that made it a potential candidate for inclusion in the original essay. However, it was excluded from the final draft because of space constraints, its relative inefficiency compared with the other models for a comparable number of trials, and complexity considerations. Here we briefly
discuss this algorithm because it gives a better insight into optimization-oriented learning
algorithms available in machine learning and artificial intelligence literatures, it has been
introduced into organizational learning literature (Denrell, Fang et al. 2004), and it helps build a
more concrete mathematical framework to relate complexity of learning to action-payoff delays.
In comparison with other reported models, this algorithm has a relatively high level of
information processing and rationality; however, it does not use any cognitive search, i.e., it
assumes no knowledge of the payoff landscape. Consequently the algorithm is quite general, but
is not optimal for the payoff landscape at hand.
In the development of this model we follow the general formulation of Q-learning
(Watkins 1989) as discussed by Sutton and Barto (Sutton and Barto 1998). Typical
reinforcement learning algorithms in machine learning (the general category to which Q-learning
algorithm belongs) work based on a discrete state-space and use discrete time. Therefore, to
adapt this algorithm to our task, which has both continuous time and action space, we break
state-action space of our problem into discrete pieces and use a discrete time in the simulations
reported here. Also, here we only report the development of the Q-learning model for the case
with no delays. As the analysis reported here and the discussion of extending the model to delayed feedback show, such an extension is possible but increases the complexity of the model exponentially with the size of the delays considered; the number of experimental data points required for successful performance under that condition therefore goes well beyond what is feasible in an organizational setting.
The state-space in our setting depicts what resource allocation the organization is using at the current time, i.e., the resources allocated to each of the 3 activities, A_j (j = 1..3); however, since the sum of these resources equals a constant total resource (R = 100) at each time, we have a 2-dimensional space, A_1 and A_2, each changing between 0 and 100, with the constraint that A_1 + A_2 ≤ 100. We break down each dimension into N possible discrete values. We choose N = 12 in the following analysis, since that allows for observation of the peak of the Cobb-Douglas payoff function used in our analysis (P = A_1^0.5 · A_2^0.3333 · A_3^0.1667); consequently, each A_j can take one of the values i·R/N (i = 0..N), provided that the sum of the A_j equals R.
To define the action space, we assume that at each period the organization can only explore one of the neighboring states, i.e., shift 1/N of the resources from one activity to another, or stay at the last state. Consequently the action space includes 7 possible actions (no change, A1 to A2, A2 to A1, A1 to A3, A3 to A1, A2 to A3, A3 to A2). However, at the boundaries of the state space (where A_j = 0 for any of the j's) the feasible action set is reduced to exclude the possibility of exiting the feasible state space. Consequently our action-state space, (S, a), includes 559 possibilities, where each S consists of an (A1, A2) pair and "a" can take one of its feasible values.
At each period the organization takes an action (goes from state S to state S', through
action "a"), observes a payoff (P) from that move, and updates the value of the action-states
Q(S,a) according to the following updating rule:
Q(S,a) ← (1 - α)·Q(S,a) + α·(P + γ·max_a' Q(S',a'))
Parameter α is used to avoid too fast an adjustment of values in the case of stochastic payoff functions. Since here the payoff is deterministic, we use a very high adjustment rate, α = 0.9, in the analysis below. Central to the contribution of the Q-learning algorithm is that it not only takes into
account the reward received from the last action-state, but also gives a reward for how good a
state that action has brought us to (maximum value available from S' through any action a').
Therefore this algorithm allows for taking into account the contributions of different states in
leading us towards better regions of payoff landscape and creates a potential avenue through
which one can capture the effects of time-delays more efficiently, i.e., by rewarding not only
what payoff we get at this step, but also what new state the action takes us to, a forward-looking
perspective. Parameter y determines the strength of that feedback on model performance.
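A minimal tabular sketch of the update rule above; the state indexing, the reward, and the value of γ used here are illustrative assumptions rather than the thesis implementation (which was built in Anylogic).

import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.9, gamma=0.5):
    """Q(S,a) <- (1 - alpha) Q(S,a) + alpha (P + gamma * max_a' Q(S',a'))."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (reward + gamma * Q[s_next].max())
    return Q

n_states, n_actions = 91, 7        # 91 feasible (A1, A2) grid points, 7 local moves
Q = np.zeros((n_states, n_actions))
Q = q_update(Q, s=10, a=3, reward=0.8, s_next=11)
print(Q[10, 3])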
The action taken at each period depends on the last state, the value of different actions
originating from that state, and the tendency of the organization to explore different possibilities.
Here we can have two types of exploration: one includes exploring actions which we have never
taken from the current state (exploring new (S,a)'s); the other reflects the organization's
propensity to try an action which has a lower value than maximum, even though it has been tried
before (exploring (S,a)'s which have already been explored). Both these types of exploration are
needed to lower the chances that the organization gets stuck exploiting a low-payoff location on
the state-space. Starting from a value of 0 for all Q(S,a)s and taking into account that all actions
in this landscape, except for those on the borders, give a positive payoff, we use the following
equations to define the probability of taking action "a" from state S, P(S, a):
P(S,a) = υ / (υ·Z + (1 - υ)·K)                                           if Q(S,a) = 0
P(S,a) = [(1 - υ)·K / (υ·Z + (1 - υ)·K)] · Q(S,a)^ω / Σ_a' Q(S,a')^ω     if Q(S,a) > 0
where Z is the number of actions from S with Q(S,a) = 0 and K is the number of actions from S with Q(S,a) > 0.
Here υ represents the attention given to exploring new actions and ω represents the priority given to exploring already-tried state-action pairs. In the following analysis we use the values of 0.3 for υ and 4 for ω, which maximized the speed of learning for this model on the given task within a 3 by 3 ([0.1, 0.3, 0.5] x [1, 4, 7]) grid of values for υ and ω.28
Since this model follows a different logic both in defining the space and time states and
in exploration and value adjustment algorithms, we have to find the appropriate simulation
length to make a comparison between this model and those reported in the essay meaningful.
This is important in light of the fact that, given enough time and experimental data, the Q-learning algorithm is guaranteed to find the true values (what they pay off under an optimal policy) of the different state-action pairs, and therefore to make optimal allocations. The important question is then whether such convergence happens faster or slower than with the other learning models, and what the impact of introducing delays is on the time/trial requirements of this model.
The basic idea behind such a comparison is the availability of a similar amount of data about the payoff landscape.29 Since such data arrive through the learning agent's exploration of the
28 To
find these parameter values, we ran linear regressions with dependent variable of average
payoff over 10 simulations of 100 periods and independent variable of TIME, to find the largest
learning rate (the largest regression coefficient on TIME) within the given parameter settings.
29 Note that because of the continuous-time nature of the four original algorithms, the autocorrelation of the exploration term, and the independence of their behavior from the time step of the simulation (for dt < 0.2), it is not realistic to count the total number of time steps in 300 periods of continuous simulation and do a direct comparison with the same number of periods for the Q-learning method. Such logic would give an increasing number of trial periods to Q-learning as we reduce the dt for the continuous model, which is obviously wrong.
landscape, gathering comparable amounts of data requires similar speeds for different algorithms
when walking on the state-space in a non-converging mode. That is, the distance traveled on the
state-space should be similar through the length of simulation for Q-learning and the other four
algorithms.
In the case of the other four models, the total distance traveled remains below 20 for most of the cases30 (see Table 4 in essay 1), where most simulations have not converged.
Therefore a reasonable comparison allows the Q-learning algorithm enough time to walk on the
state-space for a similar distance.
In the case of the Q-learning algorithm, where each activity dimension is divided into N discrete values, each step to a new position moves the organization on the state-space by a distance d:
d = sqrt((1/N)^2 + (1/N)^2) = sqrt(2)/N
Taking into account the probability that the organization stays where it is in a random walk (which equals 91/559), the expected distance traveled per step is d' ≈ 1/N. Consequently M, the total number of periods the Q-learning model needs to run to travel a distance of 20, is:
M = 20·N = 240
Therefore a reasonable comparison between the Q-learning and other models can be
made when the Q-learning model is simulated for about 240 periods.
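A quick numerical check of the arithmetic above, under the stated assumptions (N = 12, 91 states, 559 state-action pairs):

import math

N = 12
d = math.sqrt((1 / N) ** 2 + (1 / N) ** 2)     # sqrt(2)/N per actual move
p_stay = 91 / 559                              # probability of the no-change action
d_expected = (1 - p_stay) * d                  # roughly 1/N
M = 20 * N
print(d, d_expected, M)                        # ~0.118, ~0.099, 240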
Using the parameter values reported above, we run the Q-learning algorithm for as long
as 600 periods in order to make sure that we observe the behavior of the learning algorithm
outside the range of data-availability allowed for other models. We average the results over 400
simulations for higher confidence.
30 The distance traveled on the state-space was calculated in the four original models by summing the Euclidean distance between the points the algorithm visits every period on the 3-dimensional space of fractions of resources allocated. We follow the same logic here.
[Figure: Learning Performance. Average performance over time for the Correlation, Myopic, Reinforcement, Regression, and Q-Learning algorithms; horizontal axis: Period (Q-Learning Scale).]
Figure 9- The behavior of Q-learning algorithm (average of 400 simulations) in the resource
allocation task, no-delay condition, as compared to the other four algorithms (as reported in the
essay). Note that the Time scale is adjusted for the other algorithms to represent the equivalence
of data availability. Also note that by making the state-space discrete, the expected payoff of a
random allocation has decreased (mainly because of the 0 payoffs in the boundaries); thus the
lower start-point for the Q-learning algorithm.
As the graph shows, the Q-learning model does not find the optimum allocation even
after receiving over two times the data used by other models. The behavior of the Q-learning
algorithm under these conditions suggests that in order to find good policies this algorithm
requires far more data than the other four models, even under no-delay condition. This should
come as no great surprise, since the discrete state-action space of this algorithm includes many points, each of which requires a few trials before a useful mental model of the space is built. Moreover, this algorithm does not use any information about the slope of the payoff landscape in moving to better allocation policies. Regardless, the amount of data needed by this algorithm puts it at a disadvantage in terms of applicability to typical organizational learning settings, where few data points are available. In fact, this is the main reason we did not include this algorithm in the analysis reported in the essay.
Extension of this model to capture action-payoff delays through an enhanced representation of the state-space is possible; however, it is very costly in terms of the model's data requirements. In the simplest case, where delays between action and payoff can only be one period long, the state-space needs to include not only the current location on the payoff landscape (whose number of members is O(N^2)), but also the last action. This is necessary because both the current state and the state from the last period (which can be deduced from the last action) contribute to the payoff for the current period. This additional complexity expands the state-space, and the corresponding data requirement, to O(N^2·A), where A is the size of the potential action space (7 in our setting).31 Consequently, to be able to learn about processes with delays of length K, the algorithm needs a state-space of size O(N^2·A^K), which grows exponentially with the length of the maximum delay, resulting in exponential growth of the data required to learn about such a payoff function.
In the very general case one can consider a maximum delay length of K and arbitrarily
distributed delay structure, an arbitrary m-dimensional state space where each dimension can
take one of the N possible values, where exploratory actions are possible, and under an arbitrary
payoff function. The "NK" type models (Kauffman 1993), widely used in organizational
learning literature, are a special case of this general model (where N=2 in our terminology and
no delay is present). For our general case the complexity of the state-action space is of the order O(N^m·(N^m)^K), where the first term, N^m, represents, in Levinthal's (2000) terms, the spatial
31 In fact our model limits search to local exploration of neighboring states, and therefore the size of A is limited to 7 (boundary states have smaller action sets; for an m-dimensional allocation problem, the size of A becomes m·(m-1)+1; here m=3). If long-distance exploratory moves were possible, as most models of organizational learning assume, the set A would expand to all the states and would itself be of size O(N^m), which increases the speed of complexity growth as a function of delays even further.
complexity and the latter term, (N^m)^K, represents the temporal complexity (see footnote 31 for the derivation). This clearly shows how temporal complexity quickly dominates spatial complexity for longer delays (K > 1 period).
The exponential growth of learning complexity with size of delay further reinforces the
main argument of this essay, that delays have an important impact on the complexity of learning,
and therefore should be considered explicitly and in more detail as they relate to organizational
learning.
Simulating the Q-learning model
Two versions of the Q-learning model are posted with the online material on the author's website. Both are developed in the Anylogic™ software. One is a stand-alone Java applet with accompanying files; it runs in any Java-enabled browser with no need to install additional software. The second includes the source code and equations of the model; after installing the Anylogic software it can be used both for detailed inspection of the model implementation and for running the model.
To open the stand-alone applet, unzip the "Q-Learning_Applet_Files.zip" into one folder (all
three files need be in the same folder). You can then open the "Q-Learning-Visual Applet.html"
file in any web browser (e.g. Internet Explorer). You can change different model parameters and
extend the simulation time as you desire.
To open the detailed Anylogic model, you need to first download and install the Anylogic software (you can get the 15-day free trial version from http://www.xjtek.com/download/). You
can then open "Q-Learning-Visual.alp" in Anylogic. Different objects and modules of code are visible in the left-hand column; you can navigate through them and inspect different elements of the model. You can run the model by clicking the Run button (or F5), which compiles the model and brings up a page similar to the attached Java applet. You can inspect the behavior of the model through the interface accompanying the model and applet, or by browsing different variables and graphing them while in run mode in Anylogic. For the latter, go to the "root" tab in run-time mode, where you can see all model variables and inspect their runtime behavior as well as their final values.
The implemented interface allows you to observe payoff performance and the path of the organization on the state-space, as well as to change the following aspects of the model within a simulation (just click pause, make the desired parameter changes, and click run to continue the simulation):
1- Change the simulation time to allow more information for the learning algorithm.
2- Change the exploration parameters, υ and ω, to alter the propensity of the model to explore new action-states or revisit those already tried.
3- Change the value adjustment parameters: the discount rate and the strength of the feedback to states that become stepping stones for better payoffs (parameter γ in the algorithm above).
4- Change the shape of the payoff function by changing the exponents of the different activities' effects on payoff. Note that the Cobb-Douglas function requires the exponents to add up to 1.
You can reload the model by pressing the refresh button of your browser. You can also change
the speed of the simulation through the simulation control slider on the top left corner of the
applet.
Simulating the four other learning models
The simulation model used to examine the other four learning algorithms is also posted with the online material. To open the model, download and install the Vensim Model Reader from http://vensim.com/freedownload.html. Then open "LearningModelPresent.vmf" from the Vensim File menu. Navigate through the different views of the model using the "Page Up" and "Page Down" buttons, the tab for selecting views at the bottom left of the screen, or the navigation buttons enabled on most of the views. View the equation for each variable by selecting that variable and clicking the "Doc" button in the left toolbar.
Note that this model includes more functionalities than discussed in the essay. It allows you to
introduce noise in payoff and perception delays, and change the payoff function to linear.
Moreover, you can change any of the model parameters and explore different scenarios with this
model.
Simulating the model:
- First choose a name for your simulation and enter it in the field for the simulation name in the middle of the top toolbar.
- Click on the SET button to the left of this name.
- Now change the parameters of the model as desired. The best place to do so is the "control panel" view (press Page Down until you arrive at the second view), where you can change all the main parameters. The current value of each parameter is shown if you click on it.
- Simulate the model by clicking the Run button in the top toolbar.
Examining the behavior:
- The control panel includes graphs of the payoff, distance traveled in the space, convergence time, and climbing time (the time during which an algorithm improves its performance), based on your latest simulation.
- You can also use the tools in the left toolbar to see the behavior of different variables. Select a variable by clicking on it and then click on the desired tool.
References:
Denrell, J., C. Fang and D. A. Levinthal. 2004. From T-mazes to labyrinths: Learning from model-based feedback. Management Science 50(10): 1366-1378.
Kauffman, S. A. 1993. The Origins of Order: Self-Organization and Selection in Evolution. New York, Oxford University Press.
Levinthal, D. 2000. Organizational capabilities in complex worlds. In The Nature and Dynamics of Organizational Capabilities. S. Winter (ed.). New York, Oxford University Press.
Sutton, R. S. and A. G. Barto. 1998. Reinforcement Learning: An Introduction. Cambridge, The MIT Press.
Watkins, C. 1989. Learning from Delayed Rewards. Ph.D. Thesis. Cambridge, UK, King's College.
Appendix 2-1- Technical material to accompany essay 2: Heterogeneity and
Network Structure in the Dynamics of Diffusion: Comparing Agent-Based
and Differential Equation Models
This supplement documents the DE and AB models, the construction of the different networks for the AB simulations, the distribution of final sizes, the details of the sensitivity analysis on final size, and the instructions for running the model. The model, implemented in the Anylogic simulation environment, is available on the author's website.
Formulation for the Agent-Based SEIR Model
We develop the agent-based SEIR model and derive the classical differential equation model from it. Figure 1 shows the state transition chart for individual j. Individuals progress from Susceptible to Exposed (asymptomatic infectious) to Infectious (symptomatic) to Recovered (immune to reinfection). (In the classical epidemiology literature the recovered state is often termed "Removed" to denote both recovery and cumulative mortality.)
Figure 1. State Transitions for the AB SEIR model (individual j moves S → E → I → R through the infection, emergence, and recovery rates).
The states S, E, I, and R are mutually exclusive; in the AB model the current state has value one and all others are zero. We first derive the individual infection rate f[j], then the emergence and recovery rates e[j] and r[j].
Infection Rate:
Infectious and Exposed individuals come into contact with others at some rate, and some of these contacts result in transmission (infection). Disease transmission (the transition from S to E) for individual j occurs if the individual has an infectious contact with an individual in the E or I state. Contacts for an E or I individual are independent and therefore the time between two contacts is distributed as Exponential(L), where:
L = c_j·H[j]    (1)
where c_j, the base contact rate, equals c_ES or c_IS depending on whether individual j is in the E or I state, and H[j] is the overall heterogeneity factor for individual j (defined below).
The other side of each contact of individual j is chosen among the N-1 other individuals based on the following probability:
Probability(j contacting l | j has a contact) = P[j,l] = (λ[l]·NW[l,j]/K[l]^τ) / (Σ_i λ[i]·NW[i,j]/K[i]^τ)    (2)
Where the variables are defined as follows:
λ[j]: The individual heterogeneity factor. It equals 1 in the homogeneous scenarios and is distributed as Uniform[0.25, 1.75] in the heterogeneous case.
K[j]: The number of links individual j has in his network.
NW[i,j]: 1 if there is a link between individuals i and j, and 0 if there is none or if i = j. I is the set of all individuals in the population.
τ: Captures the time constraint on contacts. In homogeneous scenarios it is set to 1, so that more links for an individual result in fewer chances of contact through each link, which compensates for the heterogeneity created by the degree distributions of the different networks. For heterogeneous scenarios τ is set to 0, so that the probability of contact through each link does not depend on how busy the people on the two sides of the link are.
The basic idea behind this probability formulation is that contacts of E or I individuals are
distributed between other population members depending on their heterogeneity factor and their
connectedness.
The overall heterogeneity factor for individual j, H[j], is then defined to ensure that: 1- individuals have contacts directly proportional to their individual heterogeneities (λ[j]); 2- individual contacts depend on their number of links in the heterogeneous condition, and are independent of it in the homogeneous condition; and 3- the total number of contacts in the population remains, in expectation, the same as in the simple DE model (see the section on deriving the DE model from the AB one). Consequently:
H[j] = λ[j]·W[j]·N / (K[j]^τ·WN)    (3)
Where the variables are defined as follows:
W[j]: The weighted sum of links to individual j: W[j] = Σ_i λ[i]·NW[i,j]/K[i]^τ    (4)
WN: The population-level weighted sum of the W[j], a normalizing factor: WN = Σ_i λ[i]·W[i]/K[i]^τ    (5)
Finally, a contact between individuals j and l only results in the infection of l with probability i_j, which equals i_ES or i_IS depending on the state individual j is in (E or I).
Combining these three components (the contact rate of the individual, the contact probability with different individuals, and the infectivity), we can summarize the rate of infectious contacts between individuals j and k as:
Z[j,k] = Limit((Probability of an infectious contact between j and k between t and t+dt)/dt, dt → 0)
Z[j,k] = c_j·i_j·N · (NW[j,k]·λ[j]·λ[k]/(K[j]·K[k])^τ) / (Σ_{i,l} NW[i,l]·λ[i]·λ[l]/(K[i]·K[l])^τ)    (6)
And since these contact rates are independent, for a susceptible individual k the rate of infection, f[k], is given by:
f[k] = Σ_{j∈E∪I} Z[j,k]    (7)
Emergence and Recovery: The formulations for emergence e[j] and recovery r[j] are analogous. Since we model the residence times in each state as exponential, in parallel with the DE model, these transitions are given as events that take A and B time units to happen, where A and B are distributed as:
A = Exponential(1/ε)    (8)
B = Exponential(1/δ)    (9)
and consequently e[j] = 1/ε and r[j] = 1/δ.    (10)
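The individual infectious-contact rates of equation (6) can be sketched as follows for a small deterministic ring network; the contact rate and infectivity are single illustrative values rather than the full E/I state dependence of the model.

import numpy as np

def contact_rates(NW, lam, tau, c=1.0, infectivity=0.05):
    """Z[j, k]: rate of infectious contacts from j to k for the given network."""
    N = NW.shape[0]
    K = NW.sum(axis=1)                                  # number of links per individual
    weights = NW * np.outer(lam, lam) / np.outer(K, K) ** tau
    return c * infectivity * N * weights / weights.sum()

N = 6
NW = np.zeros((N, N))
for i in range(N):                                      # ring: each node linked to its 2 neighbors
    NW[i, (i + 1) % N] = NW[(i + 1) % N, i] = 1.0
lam = np.ones(N)                                        # homogeneous case
Z = contact_rates(NW, lam, tau=1.0)
print(Z.sum())                                          # total infectious-contact rate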
Deriving the mean-field differential equation model:
Figure 2 shows the structure of the mean-field DE model.
[Stock-and-flow diagram with labels: Disease Duration; Emergence Time; Total Contacts of Asymptomatic Infectious; Infectivity of Exposed i_ES; Contagion; Contact Frequency for Exposed c_ES; Infectivity of Infectious i_IS; Contact Frequency for Infectious c_IS; Total Contacts of Symptomatic Infectious.]
Figure 2. The structure of the SEIR differential equation model.
To derive the DE formulation for the infection rate, f_DE, sum the individual infection rates f[k]:
f_DE = Σ_{k∈S} Σ_{j∈E∪I} Z[j,k]    (11)
In the DE model, homogeneity implies all individuals have the same number of links, K[j] = K[k] = K, and the same propensity to use their links, λ[j] = λ[k] = 1. Therefore:
Z[j,k] = c_j·i_j·N·NW[j,k] / Σ_{k,j} NW[j,k]    (12)
NW[j,k] depends on the network structure; however, in the DE model individuals are assumed to be well-mixed. Therefore:
NW[j,k] / Σ_{k,j} NW[j,k] = 1/N^2    (13)
and
Z[j,k] = c_j·i_j / N    (14)
Substituting into the equation for the infection rate we have:
f_DE = Σ_{k∈S} Σ_{j∈E∪I} c_j·i_j/N = (1/N)·Σ_{k∈S} (Σ_{j∈E} c_ES·i_ES + Σ_{j∈I} c_IS·i_IS)    (15)
which simplifies into the DE infection rate discussed earlier in the essay:
f_DE = (c_ES·i_ES·E + c_IS·i_IS·I)·(S/N)    (16)
The emergence/recovery rate in the DE model is the sum of the individual emergence/recovery rates:
e_DE = Σ_{j∈E} e[j] = E/ε and r_DE = Σ_{j∈I} r[j] = I/δ    (17)
which is the formulation for a first-order exponential delay with time constants ε and δ. The full SEIR model is thus:
dS/dt = -f,  dE/dt = f - e,  dI/dt = e - r    (18)
f = (c_ES·i_ES·E + c_IS·i_IS·I)·(S/N)    (19)
e = E/ε    (20)
r = I/δ    (21)
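A minimal Euler-integration sketch of equations (18)-(21); the parameter values are illustrative and are not the base case or calibrated values reported in this appendix.

def simulate_seir(c_es=4.0, i_es=0.05, c_is=2.0, i_is=0.06,
                  eps=15.0, delta=10.0, N=200, E0=2, dt=0.25, T=300):
    S, E, I, R = N - E0, E0, 0.0, 0.0
    path = []
    for step in range(int(T / dt)):
        f = (c_es * i_es * E + c_is * i_is * I) * (S / N)   # infection rate (19)
        e = E / eps                                         # emergence rate (20)
        r = I / delta                                       # recovery rate (21)
        S, E, I, R = S - dt * f, E + dt * (f - e), I + dt * (e - r), R + dt * r
        path.append((step * dt, S, E, I, R))
    return path

final = simulate_seir()[-1]
print("final recovered fraction:", final[4] / 200)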
Network Structure:
Connected: every node is connected to every other node.
Random: The probability of a connection between nodes i and j is fixed, p = k/(N-1), where k
is the average number of links per person (10 in our simulations, 6 and 18 in our sensitivity
analysis to population size) and N is the total size of population (200 in our base simulations, 50
and 800 in sensitivity analysis).
Scale-free: Barabasi and Albert (1999) outline an algorithm to grow scale-free networks. The algorithm starts with m_0 initial nodes and adds new nodes one at a time. The new node is connected to the previous nodes through m new links, where the probability that each existing node j receives one of the new links is m·K[j]/Σ_i K[i], where K[j] is the number of links of node j. The probability that a node j has connectivity smaller than k after t time steps is then:
P(K[j]_t < k) = 1 - m^2·t / (k^2·(t + m_0))    (22)
We use the same procedure with m = m_0 = k (the average number of links per person). Following the original paper,32 we start with m_0 individuals connected to each other in a ring, and then add the rest of the N - m_0 individuals to the structure. The model, available on the author's website, includes the Java code used to implement all networks.
Small World: Following Watts and Strogatz (1998), we build the small-world network by ordering all individuals into a one-dimensional ring lattice in which each individual is connected to the k/2 closest neighbors on each side. Then we rewire these links to a randomly chosen member of the population with probability
p = 0.05    (23)
Ring Lattice: In the ring lattice each node is linked to the k/2 nearest neighbors on each side
of the node (equivalent to the small world case with zero probability of long-range links.)
32 Albert (2005). Personal communication.
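For readers who want a quick parallel outside Anylogic, the following sketch builds analogous network types with the networkx generators; the thesis model constructs these networks in Java inside Anylogic, so this is only an illustrative stand-in, not the original code.

import networkx as nx

N, k = 200, 10
fully_connected = nx.complete_graph(N)
random_net = nx.gnp_random_graph(N, p=k / (N - 1), seed=1)
scale_free = nx.barabasi_albert_graph(N, m=k, seed=1)
small_world = nx.watts_strogatz_graph(N, k=k, p=0.05, seed=1)     # ring lattice plus rewiring
ring_lattice = nx.watts_strogatz_graph(N, k=k, p=0.0)             # no long-range links

for name, g in [("connected", fully_connected), ("random", random_net),
                ("scale-free", scale_free), ("small world", small_world),
                ("lattice", ring_lattice)]:
    print(name, sum(dict(g.degree()).values()) / N)               # report average degree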
Histogram of Final Size for different network and heterogeneity conditions
The histograms of final sizes for the different network and heterogeneity conditions are shown below. The heterogeneous cases are graphed in blue and the homogeneous cases in red. Note that the X axis is truncated in the middle for the Connected, Random, and Scale-free networks (which share the same scales), since for the majority of middle points no data existed. The small-world and ring lattice networks include the complete range. The final size for the DE model (0.983) is shown in each picture with a line and arrowhead.
[Histograms of Final Size for the Connected Network, Random Network, Scale-free Network, Small-World Network, and Ring Lattice Network; horizontal axis: Final Size.]
The Fitted DE Model
We calibrate the DE model to the trajectory of the infectious population in 200 simulations of each AB scenario. The best-fit parameters, the infectivities (i_ES and i_IS) and the emergence duration (ε), are found by minimizing the sum of squared errors between the mean AB behavior of the infectious population, I_AB, and the infectious population in the DE model, I_DE, through time T, subject to the DE model structure:
min_{i_ES, i_IS, ε} Σ_{t=0}^{T} [I_DE(t) - I_AB(t)]^2    (24)
We use the Vensim™ software optimization engine for the estimation.
Diffusion is slower in the small world and lattice networks, so the model is calibrated on data for
T =300 days (Connected, Random, and Scale-free networks), and T = 500 days (Small world and
Lattice) to include the full lifecycle of the epidemic. The estimated parameters are constrained
such that 0 < iES,iS and 0 < E < 30 days.
Below you can find the complete results of calibrated parameters (Medians and Standard
Deviations) as well as the goodness of fit between different metrics in the fitted DE and the AB
simulations. The fitted simulations fall within the 90/95 percent confidence intervals of AB
simulations for over 75/89 percent of cases in all network and heterogeneity conditions and for
all metrics.
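A sketch of the calibration in equation (24) using scipy in place of the Vensim optimization engine used for the actual estimation: choose i_ES, i_IS, and ε to minimize the squared gap between the DE infectious trajectory and a target trajectory. The target below is synthetic rather than the mean AB trajectory, and the fixed parameters are illustrative.

import numpy as np
from scipy.optimize import minimize

def de_infectious(params, c_es=4.0, c_is=2.0, delta=10.0, N=200, E0=2, dt=0.25, T=300):
    i_es, i_is, eps = params
    S, E, I = N - E0, E0, 0.0
    out = []
    for _ in range(int(T / dt)):
        f = (c_es * i_es * E + c_is * i_is * I) * (S / N)
        e, r = E / eps, I / delta
        S, E, I = S - dt * f, E + dt * (f - e), I + dt * (e - r)
        out.append(I)
    return np.array(out)

target = de_infectious([0.05, 0.06, 15.0])          # stand-in for the mean AB trajectory
sse = lambda p: np.sum((de_infectious(p) - target) ** 2)
fit = minimize(sse, x0=[0.03, 0.03, 10.0], bounds=[(0, 1), (0, 1), (1e-3, 30)],
               method="L-BFGS-B")
print(fit.x)                                        # should approach [0.05, 0.06, 15.0]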
Table 1. Median and standard deviation, σ, of estimated parameters for the DE model fitted to the symptomatic population. Reports results of 200 randomly selected runs of the AB model for each cell of the experimental design. Compare to the base case parameters in Table 1. The median and standard deviation of the basic reproduction number, R0 = c_ES·i_ES·ε + c_IS·i_IS·δ, computed from the estimated parameters, is also shown.
Parameter
Infectivity
Connected
of
H#
H=
H*
H=
H=
H*
H=
Ho
0.044
0.049
0.053
0.059
0.025
0.026
0.025
0.040
0.021
0.088
0.023
0.171
0.124
0.252
0.055
0.135
0.280
0.310
0.100
0.025
0.047
0.072
0.046
0.075
0.054
0.011
0.023
0.015
a
0.115
0.067
0.068
0.058
0.068
0.053
0.076
0.060
0.037
0.032
Median
14.47
10.87
12.14
6.788
9.827
3.998
21.05
16.30
6.607
3.562
a
4.567
5.139
4.881
5.323
5.798
4.836
7.813
8.527
13.05
10.70
Median
4.189
2.962
3.090
2.546
2.900
2.425
3.326
2.524
1.561
1.350
Infectious is
Incubation
Time s
Implied Ro =
CES*iES*|
R
H,
a
1.712
0.622
0.576
0.524
0.666
0.611
0.880
0.658
0.812
0.550
Median
0.999
0.999
0.999
0.999
0.999
0.999
0.998
0.998
0.985
0.987
a
0.025
0.049
0.017
0.050
0.042
0.071
0.040
0.059
0.056
0.043
+CIS*ilS*
z
Lattice
0.053
Median
Average
Small World
H=
a
of
Scale-free
0.045
Median
Exposed iES
Infectivity
Random
Table 2- The percentage of fitted DE simulations which fall inside the 95% and 90% confidence intervals defined by the ensemble of AB simulations, for each of the three metrics of Final Size, Peak Time, and Peak Prevalence, under each network and heterogeneity condition (H= homogeneous, H≠ heterogeneous).

Metric           | Conf. Bound | Connected (H=, H≠) | Random (H=, H≠) | Scale-free (H=, H≠) | Small World (H=, H≠) | Lattice (H=, H≠)
Final Size       | 95%         | 99.5, 94           | 96, 95.5        | 94.5, 92            | 97.5, 91.5           | 96, 95
Final Size       | 90%         | 95.5, 89.5         | 88.5, 89        | 89, 87.5            | 89.5, 87.5           | 90, 89.5
Peak Time, Tp    | 95%         | 97, 93.5           | 95.5, 91.5      | 89, 93.5            | 96, 93.5             | 89.5, 93
Peak Time, Tp    | 90%         | 95, 88.5           | 90, 86.5        | 84.5, 85            | 94.5, 87.5           | 85.5, 79.5
Peak Prev, Imax  | 95%         | 97.5, 96.5         | 97.5, 95.5      | 92.5, 96.5          | 97, 98               | 94, 95.5
Peak Prev, Imax  | 90%         | 92.5, 93.5         | 89.5, 83.5      | 86.5, 94            | 92, 91.5             | 75.5, 85
Sensitivity analysis on population size:
The sensitivity analysis repeats the AB simulations for populations of 50 and 800 individuals,
with 1000 simulations for each network and heterogeneity scenario. We do not repeat the
calibration operations for this analysis. The following tables report the statistics of the four metrics as well as their comparison with the base case DE model for each population size. Values of the metrics for the base case DE model are reported with the metric name (left column). The fraction inside the envelope is calculated for the time in which the final recovered population settles to within 2% of its final value in the base DE, which is 76 days for N=800 and 57 days for N=50.
*/**/*** indicates the DE simulation falls outside the 90/95/99% confidence bound defined by
the ensemble of AB simulations.
Population
Connect
N=800
Random
ed
Final Size
(0.983)
Homogeneous
Heterogeneous
Peak Time,
Tp (57.2)
Homogeneous
Heterogeneous
Homogeneous
Peak
Prev., Imax
(27%)
Heterogeneous
Fraction
Inside
Homogeneous
Heterogeneous
%F<0.1
AB Mean
AB a
%F<0.1
AB Mean
AB a
AB Mean
AB a
AB Mean
AB a
AB Mean
AB a
AB Mean
AB a
Base DE
Base DE
Lattice
Scale
Small
Free
World
2.1
0.953
0.139
3.9
0.888***
0.179
99.6**
22.4
87.6*
23.4
18.8***
3.4
18.2***
4.1
0.355
0.421
2.2
0.820
0.237
4.8
0.640***
0.270
153.8
123.7
130.0
101.9
5.4***
1.6
5.0***
1.7
0.303
0.316
Lattice
3.1
0.952
0.170
3.9
0.902***
0.181
58.2
12.4
51.9
12.4
27.0
5.0
25.9
5.4
1
1
2.4
0.942**
0.148
3.5
0.885***
0.168
61.4
13.1
55.6
12.9
26.0
4.3
24.6
4.9
1
1
2.2
0.954
0.143
4.6
0.846***
0.186
62.7
12.8
45.9
12.6
26.3
4.2
24.2
5.5
1
0.816
Connect
Envelope
Population N=50
Final Size
Homogeneous
(0.983)
Heterogeneous
Peak Time,
Tp (37.8)
Homogeneous
Heterogeneous
Peak
Prev., Imax
Homogeneous
Random
Scale
Small
%F<0.1
ed
2.1
3.2
Free
3.4
World
2.8
AB Mean
AB a
%F<0.1
AB Mean
AB a
AB Mean
AB a
AB Mean
AB a
AB Mean
AB a
0.959
0.144
3.5
0.905
0.170
39.92
12.43
37.89
13.17
33.2
7.29
0.890*
0.166
5.5
0.814**
0.209
42.99
16.13
40.36
16.98
29.3
7.47
0.922
0.172
3.8
0.881**
0.176
39.8
14.9
37.2
14.1
31.7
7.80
0.803
0.251
4.7
0.698**
0.266
51.26
26.71
47.99
27.20
22.1
7.85
3
0.705
0.277
3.8
0.602**
0.267
48.52
28.68
46.86
30.05
18.9
7.11
33 The peak prevalence depends on the initial susceptible fraction, in addition to the infectivities, contact rates, and delays, and therefore differs slightly between the two population sizes, where the same initial conditions create different susceptible fractions.
Running the model
Two versions of the model are posted on the web. Both are developed in the Anylogic™ software, which is used for mixed agent-based and differential equation models.34 One is a stand-alone Java applet with accompanying files; it runs in any Java-enabled browser with no need to install additional software. The second includes the source code and equations of the model; after installing the Anylogic software it can be used both for detailed inspection of the model implementation and for running the model.
To open the stand-alone applet, unzip the "ABDE-Contagion-applet.zip" into one folder (all
three files need be in the same folder). You can then open the "Dynamics of
Contagion_AprOS_Paper Applet.html" file in any web browser (e.g. Internet Explorer). The
interface is self-explanatory and has a short help file which can be opened by clicking the [?]
button.
To open the detailed Anylogic model, you need to first download and install the Anylogic
software (you can get the 15-day free trial version from http://www.xjtek.com/download/) and
then unzip the attached file "ABDE-Contagion-code.zip" in one folder. You can then open
"Dynamics of Contagion_AprOS_Paper.alp" in Anylogic. Different objects and modules of code
are observable on the left hand column and you can navigate through them and inspect different
elements of the model. You can run the model by clicking on run button (or F5) which compiles
the model and brings up a similar page as the attached Java applet. You can inspect the behavior
of the model both through the interface accompanying the model and applet, or by browsing
different variables and graphing them as long as you are in the run mode in Anylogic. For the
second purpose, go to "root" tab in the run time mode, where you can see all model variables and
can inspect their runtime behavior as well as final value.
34 We thank the XJ Technologies development team for building the first version of the Anylogic model based on earlier drafts of this paper. The model was later revised and refined by the authors to precisely reflect the theoretical setting described in the paper.
The implemented interface has the following capabilities that you can use by either way of
accessing the model:
1- Run the model for different scenarios in single or multiple runs.
2- Inspect the behavior of the AB model through the total number of people in different states, as
well as the state transition rates and different confidence intervals for behavior of the infected
population in AB model (you can change the size of confidence intervals as well).
3- Visually inspect the spread of disease in the population. Different nodes and links between
them are shown and you can choose between 4 different representations of the network structure.
4- Observe the metrics of peak value, peak time, and final size.
5- Change all important model parameters, including contact rates (note that contact rates of
exposed and infected are described in terms of fraction of contact rate of healthy) and
infectivities, duration of different stages, size of the population, network type, heterogeneity,
number of links per person, and probability of long-range links in small-world network.
6- Save the results for the populations in different states at different times into a text file (this feature may have problems under some browsers; however, it works fine if you install the model through the second method and open it from inside Anylogic).
Appendix 3-1- Reflections on the data collection and modeling process
In this document I briefly discuss the process of data gathering, model building, and model analysis that I employed in the research for the essay Dynamics of Multiple-Release Product Development. I focus mainly on a few tools and activities that I found useful and had not seen discussed in detail in the literature. I therefore leave the comprehensive description of the qualitative research process and of model building to the relevant books and articles (Glaser and Strauss 1967; Eisenhardt 1989; Strauss and Corbin 1998; Sterman 2000) and keep this document short and operational. I make no claim about the efficiency of the processes I used; rather, I simply describe my experience and highlight the process insights from it. The intended audience of this appendix is other researchers who want to build theory
based on case studies using simulation, as well as model builders who are looking for ways of
doing their testing and analysis faster and more efficiently.
This research started in June 2004 and has continued in multiple steps, outlined chronologically
below:
- June-August 2004: Interviews, archival data gathering, and hypothesis generation
- July 2004: Building a qualitative model
- August 2004: Building a detailed simulation model and preliminary analysis
- September-October 2004: Model analysis, building a simple insight model
- November 2004: First write-up
- December 2004-June 2005: Further data collection (both interviews and quantitative data), and refinement of the analysis and text
- March 2005 on: Definition of other research questions at the research site and data-gathering for these questions
Four main activities are evident in this process: data gathering, model building, model analysis,
and model distillation/insight generation. The process of moving between these activities was
iterative, with the first iteration being the most intense.
Data gathering and qualitative model building
In the data gathering phase, the main source of information was interviews with individuals
involved with the development of two software products, Alpha and Beta. I entered the site with
some general impression of the problem I wanted to study: the quality, cost, and delay problems
faced by many new product development projects, when multiple projects are connected to each
other. These were very real challenges for product Alpha, and product Beta was distinguishable
for avoiding several of these problems3 5. Therefore the contrast between the two was promising
for making sense of the guiding questions.
I started the interviews with a set of questions (usually tailored to the specific person); however,
in general the interviews were open-ended and explored different themes relevant to the problem
at hand, as well as the mechanics of the development process and the relationships between
different groups involved. Interviews were often recorded and summarized at the end of the day
in daily report files. In essence, these summaries were shortened transcripts of the interviews
which I wrote down while listening to the audio files at the end of each day. Marking each topic
of conversation with a header and a time stamp (the corresponding minute and second of the
interview) helped accessibility of these files for future reference, while keeping their size small
and therefore their review more feasible. An example of these transcripts is reproduced below:
"13:50 Translatingfeatures
Product manager comes up with features, gives to system engineers, which for them have been
fine in Siebel, since the sameperson has been doing system engineeringacross multiplereleases
and was a well-understoodarea. So we got most of requirementswell. We stillfind problemson
customerexpectationnot being the same, but that is usuallytheproblem of VPs thinkingthat we
need a desktopapplication([PersonA] and [PersonB]) andpushing this, while this ended up
appearnot to be central at all
17:00 Feature additionsand drops
Weflipflop a little bit. QQQ it was driven mostly by thinking thatwe can sell it with higher
price, but customers didn 't want it. We have so much problem getting more than one application
with [complementary software x] on the screen that is problematic. "
35 In fact I started with product Alpha and added product Beta to my study when, in mid-July 2004, I learned about its significantly better practices.
As the example shows, each paragraph tracks 2-3 minutes of conversation in a couple of crude
sentences; moreover, I did a quick coding for action items, data sources mentioned, good
quotations, questions to ask, etc. while doing the transcription, by leaving marks such as QQQ in
the text, where relevant. These were easy to automatically code in the Atlas.ti software that I
used for qualitative analysis and allowed me to quickly extract important action items, people to
interview, etc., at the end of the week. This transcription and coding process took about 1.5 times
the original interview length. In retrospect, coding for action items, quotations, and data sources
is helpful, but too much coding of content at this early stage is not so helpful.
Along with the data-gathering process, I built a qualitative model (in causal loops) of dynamics
that can impact multiple-release product development. The data-gathering and model building
activities were intimately inter-related: interview data provided new hypotheses to be added to
the loop-set, while the loop-set guided the questions to be asked for the confirmation of existing
dynamic hypotheses and for the exploration of the new relevant themes. Moreover, often the
loops played a valuable role as boundary objects to discuss the research with interviewees and to
quickly elicit their comments on other dynamics relevant to the problem at hand. Appendix 3-2
outlines the main dynamic hypotheses generated from this part of the process through causal
loop diagrams.
The loops capture the theoretical story behind the case. A good complement to this theoretical
view is the case history which allows the researcher to connect the theoretical model to empirical
facts in a coherent fashion. Moreover, the case history reveals the issues not yet covered in the
data-gathering process and helps orient further data collection. Finally, it can be used later on for
assembling the datasets needed for model calibration and validation. I used an Excel database to
build a case history for different releases of the Alpha project. Basically, a case history should
follow the evolution of different dimensions of a project through time. A spreadsheet provides a
good structure to do so: the rows of this spreadsheet represented the different dimensions of the
project which were relevant to my research interest (e.g., resources, schedules, costs, scope of
work, amount of work done, sales, etc.) and the columns included different months in the project
history. While going through archival data or interview notes, I transferred every interesting
piece of information to the relevant cell in this Excel file. This allowed for an easy and
comprehensive way of capturing a lot of data from different files into a simple project history.
Below you can see a picture of the structure used to capture the case history.
A few tips are helpful in constructing such a structure:
- Include any piece of relevant information and expand the categories (left-hand side) to capture it. Time pressure will not allow you to iterate this process many times, so it is better to capture all relevant information on the first run.
- For each item entered into a cell, record which file or interview the data point originated from. This will facilitate future recovery, elaboration, and replication of the work.
- When you see tables or graphs with lots of useful information that are hard to capture in a single cell, copy the bitmap image of the table or graph (e.g., using the PrintScreen key) and paste it into the Excel sheet. You can then shrink the picture to fit a cell and put a description of its content inside the cell the picture covers (a few such tables appear in the example above).
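A minimal sketch of such a layout, written here with Python's csv module and using the project dimensions mentioned above, is shown below; the file name and the number of months are placeholders rather than the actual values used.
# case_history.py - skeleton sheet: one row per project dimension, one column per month (sketch)
import csv

dimensions = ["Resources", "Schedules", "Costs", "Scope of work", "Work done", "Sales"]
months = [f"Month {m}" for m in range(1, 25)]  # placeholder: two years of project history

with open("alpha_case_history.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Dimension"] + months)                 # header row of months
    for dimension in dimensions:
        writer.writerow([dimension] + [""] * len(months))   # cells to be filled from documents and interviews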
Finally, organizing the data during the data collection phase is crucial to using them effectively. A few useful organizing schemes I used for this phase follow:
- A file/table tracked every document I received. I recorded the document name upon receipt, along with details on its source. Later, when I studied the document in detail, I added notes on which points in the document might be useful for my research (besides extracting the data relevant to the case files).
- Another file/table tracked all the interviews. The main fields included: whom I interviewed, when, where, for how long, whether I recorded the interview, whether I had transcribed it, and whether I had coded it in detail.
- A separate file/table recorded the people I met in the field, their roles, their contact information, and any details I wanted to remember about them (e.g., their expertise and interests).
- Most of the numerical data were transferred into relational databases. This allows efficient and quick query generation and comes in handy for model calibration and validation work. Moreover, most organizations keep their data in such databases, so you can try to integrate the already available data into your own databases.
These files helped me keep the work organized in the presence of the hundreds of documents, interview files, and individuals I needed to manage. Moreover, they improve the traceability and reproducibility of the research process.
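As an illustration of the kind of tracking tables and queries involved, the following Python/SQLite sketch sets up an interview log and pulls out the interviews still awaiting transcription; the table and column names are illustrative assumptions, not the ones I actually used.
# tracking.py - minimal interview-tracking table and a status query (sketch)
import sqlite3

conn = sqlite3.connect("fieldwork.db")
conn.execute("""CREATE TABLE IF NOT EXISTS interviews (
    interviewee TEXT, interview_date TEXT, location TEXT, minutes INTEGER,
    recorded INTEGER,      -- 1 if an audio file exists
    transcribed INTEGER,   -- 1 once the rough transcript is written
    coded INTEGER          -- 1 once coded in detail
)""")
conn.execute("INSERT INTO interviews VALUES (?, ?, ?, ?, ?, ?, ?)",
             ("Person A", "2004-07-15", "on site", 60, 1, 0, 0))
conn.commit()

pending = conn.execute("SELECT interviewee, interview_date FROM interviews "
                       "WHERE recorded = 1 AND transcribed = 0").fetchall()
print("Awaiting transcription:", pending)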
Building the simulation model
The qualitative model discussed above informed the formulation of a simulation model for
multiple release product development. I started with the core mechanics of product development
for a single release, added some of the more salient feedback processes from the causal loop
diagrams, and then expanded the model to include multiple releases and the interactions between
these releases. Future refinements added more feedback loops and hypotheses to this model. The detailed model formulations are listed in Appendix 3-3.
The simulation model resulting from this process was detailed (over 1500 variables and parameters) and complex. It therefore required strict discipline to ensure the integrity and consistency of the model after frequent additions of new pieces of structure. This need was further pronounced by the tight timeline of model building on which I was working (about one month to build the detailed model from scratch). A set of automated tests facilitated such robustness analysis. I describe these tests below. This testing process is quite similar in nature to the "Reality Check" feature of Vensim; however, I found the implementation of the latter relatively confusing and therefore followed a custom procedure.
In short, I created a set of test cases, built them into a command script that executed the relevant
simulations consecutively, and ran the command script upon each modification of the model. I
then inspected a few relevant metrics in the control panel view of the model to see if the model
had broken down as a result of the last modification.
Design of the test cases - The correct behavior of the model under a set of extreme conditions and theoretical scenarios is often predictable. For example, in the absence of development resources, no task will get done. Such cases create the basis for automated testing. Below are a few of the cases I used to test my model:
- When there is no work to do, nothing should happen.
- When there are no resources to work with, nothing should happen.
- When we have unlimited resources for testing and development, the only bottleneck is design and testing time; therefore each stage should take as much as design time plus a small amount for first-order finishing of tasks in the development and testing stages and the circling of tasks between rework and testing.
- When we install in many sites, we find most of the errors very fast. If, then, the resources for CE are unlimited and all errors are considered important, all the errors are quickly fixed and no errors remain.
- When we start all releases at the same time (0), nothing strange should happen with the scheduled times to finish, etc.
- When we take all the problematic feedback loops out of the picture, things should go smoothly.
Each test case needs a condition that can be translated into parameter changes, and an expected
behavior which is easily inspected in the simulation output.
Building the command script - The DSS version of the Vensim software has the capacity to run command scripts, pieces of code that control the software according to pre-defined schemes. I therefore wrote command scripts that set up the model for each test-case scenario and simulated it under a specific run name. See the example below (the first line contains the test description).
! When we take all the problematic feedback loops out of the picture, things should go smoothly
SIMULATE>SETVAL|SW Eff Archtcr Complexity on Productivity = 0
SIMULATE>SETVAL|SW Eff Old Arch on Design Problems = 0
SIMULATE>SETVAL|SW Eff Old Code and Archtctr on Error = 0
SIMULATE>SETVAL|SW Eff Pressure on Error = 0
SIMULATE>SETVAL|SW Eff Pressure on Productivity = 0
SIMULATE>SETVAL|SW Eff Bug Fix on Scope = 0
SIMULATE>SETVAL|SW Effect Replan on Replan Treshold = 0
SIMULATE>SETVAL|SW Fixing Bugs = 0
SIMULATE>SETVAL|SW Endogenous Sales = 0
SIMULATE>RUNNAME|TL-NoFeedbackActive.vdf
MENU>RUN|O
Running the command scripts and inspecting the model behavior- Testing often leads to finding
problems in the model that need to be fixed. To avoid problems of version control (having
multiple versions of the model which each have fixed some aspects of the problem), I copied the
latest version of the model into a testing directory (which includes all the relevant command
files), ran the tests on that version of the model, and edited the model according to the problems
found. When finished with this round of testing, I returned this edited version of the model to the
main directory and saved it under a new version name. I could then build on this model for future
work.
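A small script can make this copy-test-rename cycle less error prone. The sketch below copies the latest model into the testing directory and later saves the edited file back under the next version number; the directory names and the version-numbering pattern are assumptions, not the ones used in the project.
# versioning.py - copy the newest model into the test directory and save it back as a new version (sketch)
import re, shutil
from pathlib import Path

MAIN, TEST = Path("model"), Path("model_testing")

def latest_model():
    """Return (version number, path) of the highest-numbered file named like MRPD_v12.mdl."""
    candidates = []
    for path in MAIN.glob("*.mdl"):
        match = re.search(r"_v(\d+)\.mdl$", path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    return max(candidates)

def check_out():
    version, path = latest_model()
    shutil.copy(path, TEST / path.name)        # run the automated tests and edit this copy
    return version, path.name

def check_in(name, old_version):
    new_name = re.sub(r"_v\d+\.mdl$", f"_v{old_version + 1}.mdl", name)
    shutil.copy(TEST / name, MAIN / new_name)  # this becomes the base for future work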
I used a control panel which included several of the critical variables of the model to track the
behavior of the model in automated testing. Moreover, a few variables (such as the "minimum of
all physical stocks") were added to enable the faster inspection of the test results. Meaningful run
names help one quickly inspect each test run for potential flaws and go to the next in absence of
any flaws.
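If a test run is exported to a tab-delimited file (Vensim can export runs in that form), even this inspection step can be partly automated. The sketch below is one possible way of doing so; the file layout it assumes (one row per variable, with its values over time in the remaining columns), the export file name, and the expectations checked are all assumptions to be adapted to the actual model.
# check_run.py - flag obvious failures in an exported test run (sketch; file layout is an assumption)
import csv

def load_run(tab_file):
    """Read a tab-delimited export: first column is the variable name, the rest are its values over time."""
    data = {}
    with open(tab_file, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            try:
                data[row[0].strip().strip('"')] = [float(v) for v in row[1:] if v != ""]
            except ValueError:
                continue  # skip header or text rows
    return data

run = load_run("TL-NoResources.tab")  # hypothetical export of the "no resources" test case
assert min(run["Minimum of all Physical Stocks"]) >= 0, "a physical stock went negative"
assert max(run["Total Development[R1]"]) == 0, "work got done without any resources"
print("No-resources test passed.")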
The concept of extreme condition testing is well established in the SD literature (see Chapter 21 in Sterman 2000). However, time constraints often cut down on the extensiveness of its application. Automated testing runs a set of pre-defined tests on the model very quickly and therefore allows the modeler to do so regularly and to ensure the robustness of the model to extreme conditions and different scenarios with only a little up-front time investment. In fact, for a
relatively complicated model, the establishment of the test cases and the command script took
me 2-3 hours, and afterwards each round of testing was very fast (a few minutes).
Another useful practice in the modeling phase is the color coding of different variables in the model. Color coding helps the modeler navigate different views faster, run different tests more easily, and make sense of different structures more quickly. The basic idea is to have distinct colors for different types of variables (e.g., parameter, data source, switch, table function, etc.). The following color codes were developed by MIT Ph.D. students in summer 2004 and account for the visibility of parameters in the sensitivity analysis modes of the Vensim software. Other modelers are encouraged to use a similar scheme to improve compatibility and avoid cognitive switching costs. The short prefixes mentioned below can also help organize the variables (e.g., SW for switch). For parameters and table functions, it is possible to write a simple script to automatically color code the text version of the model; I encourage anybody interested to write such a script and share it with others.
Green: Exogenous parameters; Data: data series; parameters for table functions
Blue: TL: Table functions
Pink: SW: Switches
Purple: TST: Test parameters / structural parameters / exogenous
Orange: Output variables; OPT: Payoff variables
Brown: IN: Initial conditions
Red: AF: Attention flags
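A first step toward such a script is simply a mapping from the prefixes above to the intended colors. The sketch below only scans variable names on the left-hand side of equations in a text version of the model and reports the color each one should get, leaving the actual recoloring to be done inside Vensim; the file name and the deliberately crude parsing are assumptions about the text format.
# color_audit.py - suggest a color for each variable based on its name prefix (sketch)
PREFIX_COLORS = {"TL": "Blue", "SW": "Pink", "TST": "Purple",
                 "OPT": "Orange", "IN": "Brown", "AF": "Red"}

def suggest_colors(model_text):
    """Map each variable appearing to the left of an '=' to a suggested color."""
    suggestions = {}
    for line in model_text.splitlines():
        if "=" not in line:
            continue
        name = line.split("=")[0].strip().strip('"')
        if not name or not name[0].isalpha():
            continue
        prefix = name.split(" ")[0].rstrip(".:")
        suggestions[name] = PREFIX_COLORS.get(prefix, "Green/Orange (decide by hand)")
    return suggestions

with open("MRPD_detailed.mdl") as f:          # hypothetical file name for the text version of the model
    for variable, color in sorted(suggest_colors(f.read()).items()):
        print(f"{color:<8} {variable}")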
Model analysis and simplification
Much has been written about the different facets of model analysis. Here I focus on a quick derivative of behavioral loop dominance analysis (Ford 1999) that can be automated within the available capabilities of the Vensim software. While this method is not as rigorous as eigenvalue analysis and
pathway participation methods (Forrester 1982; Eberlein 1989; Mojtahedzadeh, Andersen et al.
2004), it has the benefit of being already operationally available for large models, with some
fruitful results. Moreover, the method is intuitive and most modelers already use some version
of it in their work; therefore the cognitive costs of adopting it are low. I used this method to
create a quick ranking of more important feedback loops from a larger set of hypotheses in the
detailed simulation model.
The basic idea is to measure the sensitivity of a few basic metrics to the existence of different feedback loops in the model. Ranking the loops by their impact on these metrics determines which ones we want to focus on for further analysis. To do so we need to:
36 Thanks to Jeroen Struben and Gokhan Dogan for their contributions to this development.
- Define a set of metrics about which we care (for example, the cost and budget overrun) and implement these in the model as explicit variables.
- Run a sensitivity analysis switching off all the important loops in the hypothesis set, one at a time.
- Observe the sensitivity of the different metrics to these loops and rank-order the loops based on these sensitivities to find the more important hypotheses.
Vensim allows us to perform the above-mentioned sensitivity analysis very fast (in seconds). For this purpose you need a switch parameter defined for each loop in the hypothesis set, which makes that loop inactive in the simulation (see Ford 1999). It is helpful to name and parameterize these switch variables consistently, i.e., all of them should start with the same prefix and loops should be cut when their switch is 1 (or 0). With these settings in place, you can use the optimization option of Vensim, include your metrics in the payoff function, turn the optimizer off, set sensitivity to "All Constants=1," and simulate the model. Such a simulation automatically changes each parameter of the model, one at a time, to 1% above and below its base value, and reports the change in the payoff from the base case. The results are recorded in two .tab files, sensitiv.tab and sortsens.tab, one sorted alphabetically and the other sorted by the extent of the sensitivity of the payoff to the parameters of the model.
Since the switches in your model are set at 0, they will be changed to 1 and -1, the former of which is equivalent to cutting the feedback loop. Therefore the analysis automatically cuts every feedback loop (that has a switch variable) and observes the impact on the metric of interest. You can then extract the subset of results related to loops by copying those records from the alphabetically sorted sensitiv.tab (remember that the same prefix was used for all the switch variables) and analyze them in a spreadsheet. This process can be further automated with a command script that changes the payoff function to investigate the sensitivity of all the relevant metrics and extracts the results into a spreadsheet.
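For example, a short script along the following lines can pull the switch-related records out of the sensitivity output and rank them; the two-column, tab-delimited layout assumed here for sensitiv.tab should be checked against the actual file produced by Vensim.
# rank_loops.py - rank feedback-loop switches by their impact on the payoff (sketch)
import csv

def switch_impacts(tab_file="sensitiv.tab", prefix="SW "):
    """Return (switch name, payoff change) pairs, largest absolute impact first."""
    impacts = []
    with open(tab_file, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 2 or not row[0].strip().strip('"').startswith(prefix):
                continue
            try:
                impacts.append((row[0].strip().strip('"'), float(row[1])))
            except ValueError:
                continue  # skip malformed rows
    return sorted(impacts, key=lambda item: abs(item[1]), reverse=True)

for name, change in switch_impacts():
    print(f"{change:12.4f}  {name}")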
The main goal of this method is to simplify the process of selecting the more important hypotheses to include in smaller models, in write-ups of the study, or in communication with the client team. This was a salient problem in my study, where I wanted to distill the most critical dynamic hypotheses from a list of over a dozen loops captured in the detailed model discussed above. The strength of the impact of different loops on the behavior of interest is an important consideration for such choices, and this method allows that consideration to be included more systematically. The graph below shows an example of the final results that can come from this analysis.
[Bar chart of the impact of cutting individual feedback loops on the metric of interest; loop labels include Adding Late Features, Cutting Release Quality Criteria, More Time to Releases Being Ready, Finding More to Replan, Higher Test Coverage, Finding Problems in Rework, Starting New Releases, and Doing Rework Faster.]
Note that this procedure is not without problems. First, the set of feedback loops you choose is often not comprehensive. There are algorithms for selecting an independent and comprehensive loop set (Oliva 2004) that can relieve this shortcoming; nevertheless, the main goal of the above method is not to find the dominant loop in the structure, but to find the relative importance of a few different hypotheses (loops). Second, this analysis does not investigate the interactions between loops because it cuts loops one at a time. It is possible to include interactions by doing a full-grid sensitivity analysis of the switch parameters with their values set at 0 or 1. Such an analysis, however, increases the computational burden from N to 2^N simulations and requires further data reduction methods (e.g., regression) to be applied to the outcome data to allow a ranking of loops from such multi-dimensional results. Finally, this method requires the definition of relevant metrics, which can be a complicated task in some settings (for example, when we care about oscillatory behavior). In short, this method is a quick and dirty way of getting an impression of the relative importance of the main hypotheses in their impact on a few metrics of interest. It is neither comprehensive nor flawless; however, in real-world applications, where limited time often impacts the quality and thoroughness of the analysis, it can prove useful.
Another helpful practice in the analysis phase is to examine the results of a full sensitivity analysis on all model parameters (as described above) one by one: guess the impact on each of the metrics, then observe the actual impact, and try to explain the discrepancies between intuition and observation. Doing this for the top (most important) 20% of parameters often yields a very good understanding of the model behavior and its origins.
Model distillation
Case study and qualitative hypothesis generation often push the modeler to build relatively
complex models which include hundreds, if not thousands of variables, and are hard to maintain,
understand, and communicate. Above, I discussed a few tools I used to facilitate the
maintenance, testing, and understanding of one such model. A critical next step of this process is
to distill the critical dynamics of the model into a simple model which can be used to
communicate the results, write papers, and do more comprehensive simulation analysis or
analytical work on the model. The model documented in the essay ("Dynamics of Multiple-release Product Development") is in fact a distillation of the detailed model discussed above, which reduced the number of variables more than twenty-fold while keeping most of the central dynamics.
Contrary to intuition, in my experience it was probably harder to formulate the latter type of model (the small insight model) directly from the detailed data provided through a case study.
Taking the intermediate step of constructing a larger and more detailed model allowed me to
bridge the gap between the nuanced qualitative data, and the sweeping aggregations and
abstractions required for the insight model. Besides, the detailed model can play other roles in
calibration, management flight simulator development, and getting buy-in from the client. The
practices discussed above allowed me to speed up the process and make it feasible to do both
types of models in a relatively short time span (about 4-5 months of full-time work to go through
one round of this process); therefore I hope they can prove helpful for some other researchers.
References:
Eberlein, R. L. 1989. Simplification and understanding of models. System Dynamics Review
5(1).
Eisenhardt, K. M. 1989. Building theories from case study research. Academy of Management
Review 14(4): 532-550.
Ford, D. N. 1999. A behavioral approach to feedback loop dominance analysis. System Dynamics
Review 15(1): 3-36.
Forrester, N. 1982. A dynamic synthesis of basic macroeconomic theory: Implications for
stabilization policy analysis. Sloan School of Management. Cambridge, MA: MIT.
Glaser, B. G. and A. L. Strauss. 1967. The discovery of grounded theory; strategies for
qualitative research. Chicago: Aldine Pub. Co.
Mojtahedzadeh, M., D. Andersen and G. P. Richardson. 2004. Using digest to implement the
pathway participation method for detecting influential system structure. System Dynamics
Review 20(1): 1-20.
Oliva, R. 2004. Model structure analysis through graph theory: Partition heuristics and feedback
structure decomposition. System Dynamics Review 20(4): 313-336.
Sterman, J. 2000. Business dynamics: Systems thinking and modeling for a complex world. Irwin,
McGraw-Hill.
Strauss, A. L. and J. M. Corbin. 1998. Basics of qualitative research: Techniques and
procedures for developing grounded theory. Thousand Oaks, Sage Publications.
Appendix 3-2- Feedback loops for multiple-release product development
The following set of feedback loops was developed as part of the research leading to the essay "Dynamics of Multiple-release Product Development." The role of the loops in the research process is briefly discussed in Appendix 3-1.
These loops include several different hypotheses that came out of discussions with members of the product organization. Only a subset of these loops is captured in the detailed simulation model (Appendix 3-3), and a smaller fraction, the most important ones, in the original essay.
Each page depicts a set of loops relating to a distinct area of product development and sales, along with short descriptions of the loops.
[Causal loop diagrams with short descriptions, one page per area of product development and sales.]
Appendix 3-3- Documentation for detailed multiple-release PD model
In this appendix the model formulations for the detailed simulation model (discussed in
Appendix 3-1) are documented. This model was produced as part of the research project leading
to the essay "Dynamics of Multiple-release Product Development." The model, however, was
too complex to be discussed in the essay. Appendix 3-1 elaborates on the role of this model in
the research process and discusses some of the testing and analysis tools developed for working
with this model.
The model is broken into different views, each focusing on a different aspect of the development
process and other related dynamics. Each view is followed by the equations of the variables
introduced in that view, sorted alphabetically. At the end of the document, all the variables of the
model are sorted alphabetically and their group/view is listed. Therefore you can find where to
look up the equations of any variable by looking up the group name at the end of the document
and going to the equations for the relevant group. This should prove most useful for tracking the
shadow variables in each view. The color codes used in these views are explained in Appendix
3-1.
The model includes design, development, testing, defects in the field, bug fixing, underlying
code base quality, scheduling, and sales and financial metrics of 6 consecutive releases of a
software product. Several major feedback structures relevant to quality and delays in software
projects are captured in the model. The main ones include: resource sharing between different
releases and current engineering; effects of pressure on productivity and error rate; code base and
architecture quality on error rate and design quality; architecture complexity on productivity; bug
fixing on scope; re-planning on re-plan threshold; quality, delay, and features on sales; profit on
resources; and endogenous start of different releases and endogenous sales.
At least three pieces of formulation in this model offer relatively new and potentially useful
options for modeling some common problems and therefore may merit closer attention. These
pieces are briefly mentioned below and the references to the relevant model sector are provided.
Coflow of tasks and errors - Historically, project dynamics' formulations of tasks and errors
either divide tasks into correct/flawed categories, or use a coflow of "task quality" (changing
between 0 and 1) or "changes" (required changes accompanying the flow of tasks) that goes with
tasks. In the current model I have used an alternative formulation that tracks the number of flaws
in the tasks developed, and uses the density of flaws in a consistent formulation to determine
chances of discovery of problems in quality assurance phases. This formulation is most useful to
follow the logic of many projects in which problems are conceptually different from tasks. For
example, in the software development process a module of code (a task) may have several flaws
(bugs) that can be discovered (become an MR: Modification Request) or not, and the density of
these problems determines the chances of their discovery in different testing phases. Details of
this formulation can be seen in "Development" and "Test" views of the model.
Tracking the problems in the field - As products are sold to customers, the bugs inside the
products also flow into different customer sites. These bugs can be independently found by
different customers, reported to the company, fixed, and patched for all, or part, of the existing
customer set. The coflow structure to capture these possibilities is non-trivial because each error
is flowing in all of the sites, but can be found in any of them. Therefore the enclosed formulation
may prove useful for other researchers who tackle similar modeling problems. "Defects in the
Field" view of the model elaborates on this formulation.
Resource anticipation between different releases with different priorities - In allocating
resources between different overlapping releases of a software, the project managers for each
release should perceive what the priority of their release is (depending on where in the queue of
projects it is located) and plan their schedules and expected resources accordingly. While in logic
this is simple, the implementation of the formulation in SD software is not without challenges
because as different releases start and finish, their priority changes, and their perceived priority
should adjust according to their current position in project priority queue. The formulations on
the "Scheduling" view of the model offer one solution to this modeling challenge.
This model includes 448 symbols, 17 lookups, 220 levels, 1151 auxiliaries, and 147 constants
(total of 1519 variables).
**** **************************
Concept and Design
********************************
Ave New Problem per Cancellation[Releasel = ZIDZ ( Problems Per Feature Interaction *
Features Designed[Release] , Current Scope of Release[Release] )
Units: Problem/feature
This equation assumes that cancellations have detrimental effects to the extent that they interact
with other features in the code. Therefore the problems they create also correlates with fraction
of features inside the code designed. However, for the part of code not designed, the
cancellations create no new issues, since those pieces can be designed with knowledge of
cancellation.
Ave Prblm in Designed Feature[Releasel = ZIDZ ( Problems in Designed Features[ Release],
Features Designed[Release] )
Units: Problem/feature
Average number of design problems in designed features.
Ave Problem in Features to Design[Release] = ZIDZ ( Problems in Approved Features[
Release] , Features to Design[Release] )
Units: Problem/feature
Average number of (coherence) problems in features to be designed.
Chng Start Date[R1] = 0
Chng Start Date[Later Releases] = if then else ( Endogenous Release Start Date[Later Releases] = FINAL TIME, if then else ( VECTOR ELM MAP ( Fraction code Perceived Developed[R1], Later Releases - 2) > Threshold for New Project Start, 1, 0) * ( Time + 1 - Endogenous Release Start Date[Later Releases] ) / TIME STEP, 0)
Units: Dmnl
This equation puts the release start date for later projects (not the first release) at Time+1, as soon as the project before them is completed to a threshold level in the development phase.
Current Scope of Release[Releasel = Features Designed[Release] + Features to Design[
Release]
Units: feature
Defect Intro in Coherence of Feature Set[Releasel = Error Rate in Original Feature Selection[
Release] * Features Approved[Release]
Units: Problem/Month
Des Time Change[Release] = if then else ( "Release Started?"[Release] = 1 :AND: Sanctioned
Design Complete[Release] = 0, Allocated Design Finish Date[ Release] / TIME STEP, "Release
Started?"[Release] * (Allocated Design Finish Date[ Release] - Sanctioned Design
Complete[Release] ) / Time to Replan )
Units: Dmnl
We set new desired times for finishing the design based on what we get allocated.
Design[Release] = Features to Design[Release] / Design Time Left[Release]
Units: feature/Month
The development of requirements is enforced to finish by the time that designs are scheduled.
Design Time Left[Release] = Max ( Min Des Time, Sanctioned Design Complete[ Release] -
Time )
Units: Month
Design time left depends on the scheduled time for design and a minimum time needed to make
any design changes.
Eff Bug Fixing on Scope of New Release[Release] = T1 Eff Bug Fix on Scope ( Relative Size of Expected Bug Fixing[Release] ) * SW Eff Bug Fix on Scope + 1 - SW Eff Bug Fix on Scope
Units: Dmnl
The effect of bug fixing on the scope of the new release depends on how much bug fixing there is compared to what we would plan to accomplish in the next release if there were no bugs.
Eff Old Archtctr on Design Error = T1 Eff Old Arch on Design Problems ( Density of
Problems in Architecture ) * SW Eff Old Arch on Design Problems + 1 - SW Eff Old Arch on
Design Problems
Units: Dmnl
The effect of quality of old design and architecture on how good a job we do in design of new
pieces.
Endogenous Release Start Date[R1] = INTEG( Chng Start Date[R1] , 0)
Endogenous Release Start Date[Later Releases] = INTEG( Chng Start Date[Later Releases ],
FINAL TIME)
Units: Month
This release start date is set to start a release as soon as the former one passes a threshold in % of
code developed.
Error Rate in Feature Design[Release] = Normal Error Rate in Feature Design[
Release] * Eff Old Archtctr on Design Error
Units: Problem/feature
The error rate in designing features. Includes low quality requirements and unclear specifications
etc. Depends on the quality of old code and architecture..
Error Rate in Late Features[Releasel = Error Rate in Original Feature Selection[
Release] * Late Feature Error Coefficient
Units: Problem/feature
The error rate in the design of late features.
Error Rate in Original Feature Selection[Release] = 0.1
Units: Problem/feature
Rate of problems in coherence of combinations chosen for new features.
Exogenous Release Start Date[Release]
= 0, 6, 12, 18, 24, 30
Units: Month
Release Start Dates, as an exogenous parameter
Feature Cancellation[Releasel = Code Cancellation[Release] / Modules for a feature
Units: feature/Month
The speed of cancellation of features already designed, under time pressure.
Feature Requests = SUM ( Features Approved[Release!] )
Units: feature/Month
The rate of request for new features, mainly from customer needs and market pressure. For the
equilibrium, it can be put at the features approved level so that there is no net change in the total
number of features under consideration (assuming no cancellation)
Features Approved[Release] = Features Under Consideration * Scope of New Release[
Release] / TIME STEP * "Ok to Plan?"[Release]
Units: feature/Month
At the time of OK to Plan, a fraction of total features under consideration are finalized to be
designed.
Features Designed[Release] = INTEG( Design[Release] - Feature Cancellation[ Release], 0)
Units: feature
Number of features designed and ready to be coded
Features to Design[Release] = INTEG( - Design[Release] + Features Approved[ Release] +
Late Feature Introduction[Release], 0)
Units: feature
Number of features to be designed before being coded.
Features Under Consideration = INTEG( Feature Requests - SUM ( Features Approved[
Release!] - Feature Cancellation[Release!] ), Initial Features Under Consideration )
Units: feature
The number of features under consideration for future releases.
Fraction code Perceived Developed[Release] = 1 - XIDZ ( Perceived Estimated Code to Write[
Release] , Perceived Total Code[Release], 1)
Units: Dmnl
The fraction of code for each release that is perceived to be left for development.
Initial Features Under Consideration = 2000
Units: feature [0,12000] The constant for the initial number of features under
consideration for each release.
Late Feature Error Coefficient = 3
Units: Dmnl
How many times of the normal feature additions do we encounter problems with the design of
late features.
Late Feature Introduction[Release] = if then else ( Late Intro Time[Release ] = Time, Late
Features[Release] / TIME STEP, 0)
Units: feature/Month
The rate of introduction of late features. Controlled as an exogenous shock here.
Late Feature Problems[Release] = Error Rate in Late Features[Release] * Late Feature
Introduction[ Release]
Units: Problem/Month
Rate of introduction of problems into approved features through introduction of late features.
Late Features[Release] = 0
Units: feature
The number of new features introduced at some point in the process
Late Intro Time[Releasel = 15
Units: Month
Time at which the late features are introduced
Min Des Time = 0.25
Units: Month
Minimum time needed to finish the designs, even if after schedule.
Normal Error Rate in Feature Design[ReleaseJ = 0.1
Units: Problem/feature
Normal Error Rate in Feature Design
Normal Scope of New Release[Release] = 0.2
Units: Dmnl
The normal scope of the projects, as a fraction of all the features under consideration, if there
was no bug fixing.
"Ok to Plan?" [Release] = if then else ( Time = Release Start Date[Release], , 0)
Units: Dmnl
A switch which equals 1 at the time step when we decide to start a new release.
Perceived Total Code[Release] = DELAY1 ( Total Code[Release] , Time to Perceive Current
Status )
Units: module
The perceived total code for the release.
Problem Cancellation[Release] = Feature Cancellation[Release] * Ave Prblm in Designed
Feature[ Release]
Units: Problem/Month
The problems are removed along with cancelled features.
Problems from Cancellation[Release] = Feature Cancellation[Release] * Ave New Problem
per Cancellation[ Release]
Units: Problem/Month
The feature cancellation can result in addition of design problems for the rest of the code, since
pieces are removed from otherwise coherent architecture.
Problems in Approved Features[Releasel = INTEG( Defect Intro in Coherence of Feature Set[
Release] - Problems Moved to Design[Release] + Late Feature Problems[ Release] , Features to
Design[Release] * Error Rate in Original Feature Selection[ Release] )
Units: Problem
Problems in Designed Features[Releasel = INTEG( Problems Moved to Design[Release ] +
Problems from Cancellation[Release] - Problem Cancellation[Release ] + Problems Introduced
in Design[Release] , Features Designed[ Release] * Error Rate in Original Feature
Selection[Release])
Units: Problem
Problems Introduced in Design[Release] = Design[Release] * Error Rate in Feature Design[
Release]
Units: Problem/Month
The rate of introduction of new problems depends on the design speed as well as the quality of
design.
Problems Moved to Design[Releasel = Design[Release] * Ave Problem in Features to Design[
Release]
Units: Problem/Month
Coherence problems carry forward into designed features.
Problems Per Feature Interaction = 0
Units: Problem/feature
Number of problems arising as a result of each feature interaction.
Relative Size of Expected Bug Fixing[Release] = XIDZ ( Size of Next Bug Fix, Features
Under Consideration * Normal Scope of New Release[Release ] * Modules for a feature, 3)
Units: Dmnl
This tells how big will the bug fixes be in the upcoming release, compared to the desired number
of features.
Release Start Date[Release] = SW Endogenous Start * Endogenous Release Start Date[Release] + ( 1 - SW Endogenous Start ) * Exogenous Release Start Date[Release]
Units: Month
Release start date can be exogenous or endogenous.
Sanctioned Design Complete[Release] = INTEG( Des Time Change[Release] , 0)
Units: Month
This is the variable which drives time pressure to design stage.
Scope of New Release[Releasel = Normal Scope of New Release[Release] * Eff Bug Fixing on
Scope of New Release[ Release]
Units: Dmnl
The fraction of total features under consideration that we decide to include in this release.
SW Eff Bug Fix on Scope = 1
Units: Dmnl
Put 1 to adjust the scope of next release according to the level of bug fixing planned for it. Put 0
not to have this effect.
SW Eff Old Arch on Design Problems = 1
Units: Dmnl
Put 1 to have the effect of old architectural and design problems on the quality of current design
active.
SW Endogenous Start = 1
Units: Dmnl
Put 1 to have the starts of release endogenously determined, put 0 for exogenous starts.
Threshold for New Project Start = 0.6
Units: Dmnl
Threshold for starting a new release. As soon as the current release passes this threshold, the new
release will be started.
Time to Replan = 1
Units: Month
This is the time constant for adjustment of expectations about release date, code complete etc, to
the data showing that it is going up or down.
T1 Eff Bug Fix on Scope ( [(0,0)-(4,1)],(0,1),(0.452599,0.692982),(1,0.4),(1.3945,0.232456),(2,0.1),(2.77676,0.0263158),(4,0))
Units: Dmnl
Effect of bug fixing scope on the scope of the next release
T1 Eff Old Arch on Design Problems ( [(0,0)-(1,2)],(0,1),(0.269113,1.16667),(0.40367,1.32456),(0.562691,1.60526),(0.691131,1.80702),(0.82263,1.9386),(1,2))
Units: Dmnl
Table for effect of architecture and code base quality on the quality of design
Development
Allocated Design Time Left[Releasel = Estmtd Design Time Left[Release] * Prcnt Time
Allocated[ Release]
Units: Month
allocated design time left.
Ave Dfct Dnsty in Accptd Code[Release] = ZIDZ ( Defects in Accepted Code[Release] , Code
Accepted[Release] )
Units: Defect/module
Defect density in the accepted code is an important quality measure for us.
Ave Dfct in Rwrk Code[Releasel = ZIDZ ( Defects in MRs[Release] , Code Waiting Rework[
Release] )
Units: Defect/module
The number of defects that can be fixed by fixing the MR.
Ave Dfct in Tstd Code[Releasel = Code Covered by Test * Dfct Dnsty in Untstd Code[
Release]
Units: Defect/Test
The average number of defects that one can expect to find in a test, given the average density of
defects.
Bug Fixes Designed[Releasel = Code to Fix Bugs[Release]
Units: module/Month
The bug fixes included for the upcoming release.
Code Accepted[Releasel = INTEG( Test Acceptance[Release] , 0)
Units: module
Code Cancellation[Release] = Code to Develop[Release] * Frctn Code Cancellation[ Release]
Units: module/Month
Under pressure we can cancel some of the codes that are designed to be written. This can be both
intentionally, or unintentional.
Code Covered by Test = 1
Units: module/Test
This should be a function of test coverage.
Code Developed[Release] = INTEG( Development[Release] + Rework[Release] - Test
Acceptance[ Release] - Test Rejection[Release] , 0)
Units: module
The amount of code developed, but yet not successfully tested.
Code Passing Test[Releasel = Totl Code to Tst[Release] * Frctn Tst Running[ Release]
Units: module/Month
The rate of code testing is assumed to be proportional to the amount of test that is performed out
of total tests to be performed. This assumption is OK for products which run only the related
fraction of the tests.
Code to Develop[Releasel = INTEG( Design Rate[Release] - Development[Release ] - Code
Cancellation[Release] + Bug Fixes Designed[Release ], 0)
Units: module
We start from no new code to develop, since no requirement is written at the beginning.
Code Waiting Rework[Releasel = INTEG( Test Rejection[Release] - Rework[Release ], 0)
Units: module
The total amount of code waiting rework.
Current Dev Schedule Pressure[Releasel = ZIDZ ( Estmtd Dev Time Left[Release ,Current],
Development Time Left[Release] )
Units: Dmnl
The schedule pressure depends on how long we have left for the development phase vs. how
long we estimate this phase to take. Note that this schedule pressure is based on observed
productivity, rather than the normal/base productivity. Therefore it is anchored on what is
culturally considered right productivity in the organization not an absolute level of productivity.
Therefore it should not be used for determining productivity levels.
Defects Accepted[Releasel = ( Tstd Code Accptd[Release] * Dfct Dnsty in Accepted Code[
Release] + ( Test Acceptance[Release] - Tstd Code Accptd[Release ] ) * Dfct Dnsty in Untstd
Code[Release])
Units: Defect/Month
The defects can be either in the part tested, or in the part not tested. The density of defect in the
untested part is same as average untested code, while the density in the tested part is calculated
based on Poisson distribution of defects.
Defects Fixed[Releasel = Rework[Release] * Ave Dfct in Rwrk Code[Release] * Frac Errors
Found in Rwrk
Units: Defect/Month
It is assumed that a fraction of the defects in the reworked code are removed, and a fraction
remain undiscovered.
Defects found[Release] = Test Rejection[Release] * Dfct Dnsty in Rejctd Code[ Release]
Units: Defect/Month
Defects are carried with the code, the higher the rejection rate and the density of defects in the
code rejected, the higher will be the defects that are found.
Defects from Dev[Releasel = Development[Release] * Error Rate New Dev[Release ]
Units: Defect/Month
The defects that come from development depend on the error rate and the rate of development we do.
we do.
Defects Generated[Releasel = Defects from Dev[Release] + Dfct from Rwrk[Release ]
Units: Defect/Month
The total defect generated is a function of both defects in the new code as well as in the old code.
Defects in Accepted Code[Releasel = INTEG( Defects Accepted[Release] , 0)
Units: Defect
The number of defects in the accepted code.
Defects in Code Developed[Releasel = INTEG( Defects Generated[Release] - Defects
Accepted[ Release] - Defects found[Release] + Dfcts Left in RWrk[Release ], 0)
Units: Defect
The defects from the code of last release are carried over to the current release.
Defects in MRs[Release] = INTEG( Defects found[Release] - Defects Fixed[Release ] - Dfcts
Left in RWrk[Release] , 0)
Units: Defect
We find defects through testing and deplete them by either submitting reworked code that has
defects or by fixing them.
Design Rate[Release] = Design[Release] * Modules for a feature
Units: module/Month
The total number of modules to be developed depend on the modules in one feature, and the total
number of features to be developed.
Desired Development Rate[Release] = Code to Develop[Release] / Development Time Left[
Release]
Units: module/Month
The development rate needed to finish the coding on time for code complete
Dev Reschedule[Releasel = if then else (Allocated Dev Finish Date[Release] > Time :AND:
Sanctioned Code Complete[Release] = 0, Allocated Dev Finish Date[ Release] / TIME STEP, if
then else (Allocated Dev Finish Date[ Release] > Time, 1, 0) * ( Allocated Dev Finish
Date[Release ] - Sanctioned Code Complete[Release] ) / Time to Replan)
Units: Dmnl
The (re)scheduling of the code complete date.
Dev Resources to New Dev[Releasel = Min ( ZIDZ ( Dsrd New Dev Resources[Release ],
Dsrd Dev Resources[Release] ) * Development Resources to Current Release[ Release] , Dsrd
New Dev Resources[Release] )
Units: Person
The resources between rework and new development are distributed based on how much need
there is for each, proportionately.
Dev Resources to Rework[Release] = Min ( ZIDZ ( Dsrd Rwrk Resources[Release ] , Dsrd Dev
Resources[Release] ) * Development Resources to Current Release[ Release] , Dsrd Rwrk
Resources[Release] )
Units: Person
The resources between rework and new development are distributed based on how much need
there is for each, proportionately.
Development[Releasel = Feasible New Dev Rate[Release]
Units: module/Month
The development rate is based on resources allocated for development.
Development Time Left[Releasel = Max ( Min Dev Time, Sanctioned Code Complete[
Release] - Allocated Design Time Left[Release] * ( 1 - "Frctn Des-Dev Overlap" ) - Time)
Units: Month
If we run over the code scheduled date, we can't push for faster than a minimum finish time for
development.
Dfct Dnsty in Accepted Code[Release] = EXP ( - Tst Effectiveness * Ave Dfct in Tstd Code[Release] ) * Ave Dfct in Tstd Code[Release] * ( 1 - Tst Effectiveness )
Units: Defect/module
The defect density in accepted code is based on Poisson distribution of errors across code,
leading to a chance of not discovering any defects for each test that has n errors, therefore we can
calculate the expected error per accepted test.
Dfct Dnsty in Rejctd Code[Releasel = Max ( 0, ZIDZ ( Totl Code Tstd[Release] * ( Dfct
Dnsty in Untstd Code[Release] - Dfct Dnsty in Accepted Code[ Release] ) + Test
Rejection[Release] * Dfct Dnsty in Accepted Code[ Release] , Test Rejection[Release] ) )
Units: Defect/module
We can't change the density of error in the untested code based by simply testing it. Therefore
the density of defect in the exiting code (accepted or rejected) should be the same as that in the
stock of code. This equation is based on the free mixing of errors in the stocks of code.
Dfct Dnsty in Untstd Code[Releasel = ZIDZ ( Defects in Code Developed[Release ], Code
Developed[Release])
Units: Defect/module
The density of Defects in the code developed but not tested.
Dfct from Rwrk[Releasel = Rework[Release] * New Error Rate in Rwrk[Release]
Units: Defect/Month
the rate of errors created by rework depend on the error rate and the amount of rework we do.
Dfcts Left in RWrk[Release] = Rework[Release] * Ave Dfct in Rwrk Code[Release] * ( 1 - Frac Errors Found in Rwrk )
Units: Defect/Month
A fraction of errors are not corrected in each iteration of rework, since not all defects are caught
by testing at the first place.
Dsrd Dev Resources[Release] = Dsrd New Dev Resources[Release] + Dsrd Rwrk Resources[
Release]
Units: Person
Desired development resources for new code and rework.
Dsrd New Dev Resources[Releasel = Max ( Desired Development Rate[Release] / Normal
Productivity[Release] , 0)
Units: Person
Desired new development rate depends on the productivity in normal situations and the rate of
development desired.
Dsrd Rwrk Rate[Release] = Code Waiting Rework[Release] / Dsrd Time to Do Rwrk
Units: module/Month
The desired rework rate depends on the amount of rework in backlog and the how fast we want
to do that rework.
Dsrd Rwrk Resources[Releasel = Max ( 0, Dsrd Rwrk Rate[Release] / Normal Productivity[
Release] )
Units: Person
The desired resources for rework depend on the normal productivity of people and the desired
rework rate.
Dsrd Time to Do Rwrk = 0.25
Units: Month
We want to do our rework in about a week.
Eff Archtcr Complexity on Productivity = Max ( 1, Accmltd Code Accepted / Minimum Code Size With an Effect ) ^ ( Strength of Complexity on Productivity Effect * Density of Problems in
Architecture )
Units: Dmnl
The effect of code complexity on productivity not only depends on size of the code base, but also
on the quality of the architecture. A power law is assumed for the shape of function.
"Eff Old Code & Archtctr on Error" [Release] = "TI Eff Old Code & Archtctr on Error"
( Weighted Error Density in Architecture and Code Base ) * SW Eff Old Code and Archtctr on
Error + 1 - SW Eff Old Code and Archtctr on Error
Units: Dmnl
Effect of old code and architecture on the quality of development in the current release depends
on the overall quality of old code base and architecture.
Eff Pressure on Error Rate[Release] = T1 Eff Pressure on Error Rate ( Natural Dev Schedule Pressure[Release] ) * SW Eff Pressure on Error + 1 - SW Eff Pressure on Error
Units: Dmnl
Effect of work pressure on error rate is created because under work pressure people take short
cuts, skip unit testing, don't study the requirements well enough, don't do code reviews, don't
develop as good a design based on given requirements etc. This effect is not only on quality of
coding, but also exacerbates the problems with quality that result from poor design and
architectural issues.
Eff Pressure on Productivity[Release] = T1 Eff Pressure on Productivity ( Natural Dev Schedule Pressure[Release] ) * SW Eff Pressure on Productivity + 1 - SW Eff Pressure on Productivity
Units: Dmnl
The effect of schedule pressure on productivity is captured by a table function.
Error per Design Problem in Features = 10
Units: Defect/Problem
How many coding defects each design problem can create.
Error Rate New Dev[Release] = Errors from Coding[Release] + Errors from Design and
Architecture[ Release]
Units: Defect/module
The number of defects found in a module, in average.
Errors from Coding[Releasel = Normal Errors from Coding * Eff Pressure on Error Rate[
Release] * "Eff Old Code & Archtctr on Error"[Release]
Units: Defect/module
The error rate depends on the inherent quality of the workforce (Normal errors) as well as the
work pressure they are working under and the quality of the code and architecture from old
releases.
Errors from Design and Architecture[Release] = Ave Prblm in Designed Feature[
Release] * Error per Design Problem in Features / Modules for a feature * Eff Pressure on Error
Rate[Release]
Units: Defect/module
Errors from design and architecture have a base rate which depend on quality of architecture, but
can get worse depending on the work pressure.
Feasible New Dev Rate[Releasel = Dev Resources to New Dev[Release] * Productivity[
Release]
Units: module/Month
The development rate that is feasible based on availability of resources for development of new
code.
Feasible Rwrk Rate[Release] = Dev Resources to Rework[Release] * Productivity[ Release]
Units: module/Month
Feasible rework rate from resources depends on the availability of resources as well as their
productivity.
Frac Errors Found in Rwrk = 0.7
Units: Dmnl
The fraction of defects in reworked code that are found and fixed in each rework pass.
Frctn Code Cancellation[Release] = T1 Eff Schdul Prssr on Cancellation ( Current Dev Schedule Pressure[Release] ) * Maximum Cancellation Fraction
Units: 1/Month
The cancellation fraction is a function of schedule pressure and cannot exceed some maximum level.
"Frctn Des-Dev Overlap" = 0.5
Units: Dmnl
The fraction of overlap between design and development phases is assumed constant, while it
really is not. The impact is negligible due to short design phase.
Maximum Cancellation Fraction = 0.5
Units: 1/Month
There is a limit on the cancellation of features feasible.
Min Dev Time = 0.25
Units: Month
The minimum time for development of the remaining code, used when we go beyond schedule.
Minimum Code Size With an Effect = 10000
Units: module
The minimum size of code where effect of complexity is observable.
Modules for a feature = 10
Units: module/feature
Number of modules needed for a new feature.
Natural Dev Schedule Pressure[Release] = ZIDZ ( Estmtd Dev Time Left[Release ,Normal],
Development Time Left[Release] )
Units: Dmnl
The natural schedule pressure is felt based on what one sees to be the pressure relative to natural
productivity.
New Error Rate in Rwrk[Release] = Normal Error Rate in Rwrk * Eff Pressure on Error Rate[
Release]
Units: Defect/module
The error rate for creating new errors in the reworked code. It depends on different factors
including the work pressure.
Normal Error Rate in Rwrk = 0.5
Units: Defect/module
The normal error rate in rework phase.
Normal Errors from Coding = 0.5
Units: Defect/module
Average number of defects in coding that are created in each module, if the schedule pressure is
very low.
Normal Productivity[Release] = 20
Units: module/(Month*Person) [10,30] Normal productivity represents the work rate of
an average worker, if s/he follows good processes and does not take any shortcuts.
Productivity[Release] = Normal Productivity[Release] * Eff Pressure on Productivity[Release] * ( 1 / Eff Archtcr Complexity on Productivity * SW Eff Archtcr Complexity on Productivity + 1 - SW Eff Archtcr Complexity on Productivity )
Units: module/(Month*Person)
The productivity depends on the work pressure and code complexity effects.
Rework[Releasel = Feasible Rwrk Rate[Release]
Units: module/Month
Rework rate depends on feasible rate given by the current resource availability.
Sanctioned Code Complete[Releasel = INTEG( Dev Reschedule[Release], 0)
Units: Month
This is the time people think is OK to have the code complete, based on the changing pressures
in the field. It is not clear if the code complete is so well set.
Strength of Complexity on Productivity Effect = 1
Units: feature/Problem
The exponent for the power law governing the effect of complexity on productivity
SW Eff Archtcr Complexity on Productivity = 0
Units: Dmnl
Put 1 to get the effect of code complexity and architecture quality on productivity, put 0 to avoid
that.
SW Eff Old Code and Archtctr on Error = 1
Units: Dmnl
Put 1 to have the effect of old code and architecture on error rate on, put zero to remove that
effect.
SW Eff Pressure on Error = 1
Units: Dmnl
Put 1 to get the feedback from schedule pressure on error rate. Put 0 to get no such feedback.
SW Eff Pressure on Productivity = 1
Units: Dmnl
Put 1 to have the effect of work pressure on productivity on, put 0 to switch that effect off.
Test Acceptance[Releasel = Code Passing Test[Release] - Test Rejection[Release ]
Units: module/Month
The code accepted in testing is what is not rejected, which in turn depends on the coverage as
well as total test done.
Test Rejection[Releasel = Totl Code Tstd[Release] * Tst Rjctn Frac[Release]
Units: module/Month
Testing rejects in a fraction of cases, and therefore creates proportional amount of rework.
"TI Eff Old Code & Archtctr on Error" ( [(0,0)-(1,3)],(0,1),(0.16208,1.17105)
,(0.333333,1.68421),(0.480122,2.31579),(0.663609,2.75),(0.844037,2.92105) ,(1,3))
Units: Dmnl
The shape of this table funciton might be not S-shape since the worse the old architecture is, the
more problematic new coding will be, yet there should be some saturation level, hence the
current formulation.\!\!
T1 Eff Pressure on Error Rate ( [(0,0)-(4,4)],(0,1),(1,1),(1.27217,1.22807),(1.48012,1.5614),(1.71254,2),(2.01835,2.36842),(2.42202,2.72368),(3,3),(10,4))
Units: Dmnl
Table for effect of time pressure on error rate.
T1 Eff Pressure on Productivity ( [(0,0)-(3,2)],(0,0.8),(0.623853,0.885965),(1,1),(1.27523,1.16667),(1.54128,1.38596),(1.85321,1.54386),(2.37615,1.65789),(3,1.7),(10,2))
Units: Dmnl
Table for the effect of time pressure on productivity.
T1 Eff Schdul Prssr on Cancellation ( [(0,0)-(10,1)],(0,0),(1,0),(2,0),(10,0) )
Units: Dmnl
This table represents the reaction of the development group by cancelling features to the
schedule pressure they perceive.
Total Development[Releasel = Development[Release] + Rework[Release]
Units: module/Month
Total development consists of new development and the rework for the current release.
Totl Code to Tst[Release] = Code Developed[Release] + Code to Develop[Release ] + Code
Waiting Rework[Release]
Units: module
The total code that needs testing include those pieces developed, those under rework, and the
parts that need to be developed.
Totl Code Tstd[Release] = Code Covered by Test * Test Rate[Release]
Units: module/Month
The total amount of code that is actually tested depends on the number of tests and the coverage
of each test.
Tst Effectiveness = 0.9
Units: Dmnl
The effectiveness of tests in finding bugs, if the bugs are in the area that is tested.
Tst Rjctn Frac[Release] = 1 - EXP ( - Tst Effectiveness * Ave Dfct in Tstd Code[Release] )
Units: Dmnl
The fraction of tests that reject is calculated based on the assumption that defects are randomly
distributed in the whole code base. Therefore the number of defects per test follows a Poisson
distribution with mean 'Ave Dfct in Tstd Code'. Assuming an independent chance of 'Tst
Effectiveness' of finding each of those errors, we can find the chance of rejection by summing
that chance over all possible numbers of errors (0 to infinity).
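As a check on the closed form above: with defects per tested piece of code distributed Poisson with mean $\lambda$ (= 'Ave Dfct in Tstd Code') and an independent detection probability $\epsilon$ (= 'Tst Effectiveness') per defect,

$P(\text{test accepts}) = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!}\,(1-\epsilon)^{k} = e^{-\lambda}\,e^{\lambda(1-\epsilon)} = e^{-\epsilon\lambda},$

so the rejection fraction is $1 - e^{-\epsilon\lambda}$, which is the expression used for 'Tst Rjctn Frac'.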
Tstd Code Accptd[Release] = Test Rate[Release] * Code Covered by Test * ( 1 - Tst Rjctn
Frac[Release])
Units: module/Month
The portion of tested code that gets accepted, depends on test coverage and the fraction of
acceptance.
********************************
Test
********************************
Demand for Test Resources[Release] = Tst Rate Feasible[Release] / Test Resource
Productivity
Units: Person
The total demand for test resources across ongoing releases.
Frac Total Tst Feasible[Release] = Tl Tst Dependency on Dev[Release] ( Fraction Code
Written[Release] )
Units: Dmnl
This is the fraction of total tests that are ready to be executed. We can't run tests without having
them ready, because of interdependencies etc. Blocking issues are included in this table function.
Fraction Code Accepted[Release] = ZIDZ ( Code Accepted[Release] , Total Code[ Release] )
Units: Dmnl
The fraction of the total code that is done and tested.
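ZIDZ and XIDZ, used throughout these equations, are Vensim's protected-division functions. For readers following the formulas outside Vensim, a minimal Python rendering of their behavior (function names follow the Vensim built-ins):

def zidz(a, b):
    # "Zero If Divide by Zero": 0 when the denominator is zero, a / b otherwise
    return 0.0 if b == 0 else a / b

def xidz(a, b, x):
    # "X If Divide by Zero": the fallback x when the denominator is zero
    return x if b == 0 else a / b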
Fraction Code Written[Release] = ZIDZ ( Code Accepted[Release] + Code Developed[
Release] , Total Code[Release])
Units: Dmnl
Fraction of code that is written and either tested or ready to test. This determines the fraction of
tests one can do. Note that the inputs to this include no delays, which assumes that the testers'
knowledge of where the code stands lags only slightly.
Frctn Tst Running[Release] = ZIDZ ( Test Rate[Release] , Tests to Run[Release] )
Units: /Month
The fraction of total tests that is being run at this time.
Min Tst Time = 0.1
Units: Month
The minimum time needed to run a test.
Retest Required[Release] = Test Rate[Release] * Tst Rjctn Frac[Release]
Units: Test/Month
The tests that are rejected need to be redone, after the code is revisited.
Test Coverage = 0.5
Units: Dmnl
The fraction of the code that the designed test set covers.
Test Design[Release] = ( Design[Release] * Modules for a feature + Bug Fixes Designed[
Release] ) / Code Covered by Test * Test Coverage
Units: Test/Month
The tests are assumed to be designed as we design the features or bug fixes. So test design
depends on how fast we develop the requirements, what is our test coverage, how big is each
requirement, and how much code each test covers.
Test Feasible[Release] = Max ( 0, Frac Total Tst Feasible[Release] * Total Tests to Cover[
Release] - ( Total Tests to Cover[Release] - Tests to Run[Release ] ) )
Units: Test
The feasible tests are those that are not already done and are feasible based on the precedence
relationships.
Test Ran[Release] = INTEG( Test Rate[Release] - Retest Required[Release] , 0)
Units: Test
The number of tests that are already done and have passed. In the coflow with code, it parallels
the code accepted stock.
Test Rate[Release] = Tst Rate from Resources[Release]
Units: Test/Month
The test rate depends on the availability of resources, which are requested based on the feasibility
of tests given the current development level.
Test Resource Productivity = 50
Units: Test/(Month*Person)
The productivity of test personnel for doing tests.
Test Resources Allocated[Release] = Min ( Demand for Test Resources[Release] , ZIDZ (
Demand for Test Resources[Release] * Testing Resources, SUM ( Demand for Test
Resources[Release!] ) ) )
Units: Person
The test resources are given to the release based on availability of those resources and
proportional to requests.
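A minimal sketch, outside the model, of the allocation rule above: each release receives its proportional share of the testing pool, capped at its own demand (the function name and list form are mine):

def allocate_test_resources(demands, total_resources):
    total_demand = sum(demands)
    if total_demand == 0:
        return [0.0 for _ in demands]
    # proportional share, but never more than the release asked for
    return [min(d, d * total_resources / total_demand) for d in demands]

print(allocate_test_resources([10, 30, 20], 20))  # -> [3.33..., 10.0, 6.66...]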
Testing Resources = 20
Units: Person
The total number of testing resources available. Assumed to be fixed here.
Tests to Run[Release] = INTEG( Retest Required[Release] + Test Design[Release ] - Test
Rate[Release],
0)
Units: Test
Note that in the coflow of tests with code, this stock parallels all the code that is not yet tested,
i.e., code to develop, developed, or waiting rework.
Tl Tst Dependency on Dev[Release] ( [(0,0)-(1,1)],(0,0),(0.9,0),(0.95,0.1),(1,1),(1.1,1))
Units: Dmnl
The shape of the curve suggests what fraction of our tests we can do based on the current level of
development.
Total Code[Release] = Code Accepted[Release] + Totl Code to Tst[Release]
Units: module
The total code in the release.
Total Tests to Cover[Release] = Test Ran[Release] + Tests to Run[Release]
Units: Test
Total number of tests that are designed and can be run.
Tst Rate Feasible[Release] = Test Feasible[Release] / Min Tst Time
Units: Test/Month
The maximum feasible testing rate depends on the amount of tests we can potentially administer
and how long each test takes at minimum.
Tst Rate from Resources[Release] = Test Resources Allocated[Release] * Test Resource
Productivity
Units: Test/Month
The resources for testing can put an upper bound on the amount of testing we can do.
********************************
Defects in the Field
********************************
Ave Def with Patch[Release] = ZIDZ ( Defects with Patch[Release] , Sites Installed[Release] )
Units: Defect
The average number of defects in sites for which we have already developed a patch.
Ave Known Def in Site[Release] = ZIDZ ( Known Defects in Field[Release] , Sites Installed[
Release] )
Units: Defect
The number of defects known in a typical site.
Ave Unknown Def in Site[Release] = ZIDZ ( Unknown Defects in Field[Release] , Sites
Installed[Release])
Units: Defect
The average number of unknown defects present in a typical site.
Average Defects Patched[Release] = ZIDZ ( Defects Patched[Release] , Sites Installed[
Release] )
Units: Defect
The average number of defects patched in a typical site.
Average Outstanding Defects = Total Ave Defect - SUM ( Average Defects Patched[ Release!]
)
Units: Defect
This is the number of defects outstanding in the sites, an overall measure of quality.
Average Time in the Field = 30
Units: Month
The product remains in the field for about 30 months.
Calls from Defects With Patch[Release] = Defects with Patch[Release] / Time to Show up for
Def with Patch
Units: Defect*Site/Month
Number of calls coming from defects for which a patch exists.
Calls from Known Defects[Release] = Known Defects in Field[Release] / Time to Show Up for
Defects
Units: Defect*Site/Month
The number of defects that show up again as customer calls, even though we know about them.
Calls from Unknown Defects[Release] = Unknown Defects in Field[Release] / Time to Show
Up for Defects
Units: Defect*Site/Month
The number of MRs created by real, unknown problems in the customer site.
Defect Covering by Patch[Release] = Patch Development[Release] * Defect per Patch
Units: Defect/Month
The total number of defects that are covered by these patches
Defect not Patching[Release] = Defect Covering by Patch[Release] * Sites Installed[ Release] *
( 1 - Fraction Sites Installing )
Units: Defect*Site/Month
The number of defects for which we have developed patches, yet people have not installed those
patches.
Defect Patching[Release] = Defect Covering by Patch[Release] * Sites Installed[ Release] *
Fraction Sites Installing
Units: Defect*Site/Month
The number of defects that are patched across the different installations of the software in the field.
Defect Show up[Release] = Calls from Unknown Defects[Release] * Sites Installed[Release] /
Sites Reporting One Defect[Release]
Units: Defect*Site/Month
Every defect that is found becomes known and therefore is known for all the sites installed.
Defects Patched[Release] = INTEG( Defect Patching[Release] + Installing Known Patches[
Release] - Patches Retired[Release] + Patched Defects Sold[ Release] , 0)
Units: Defect*Site
I assume there are no patches at the beginning of the simulation; therefore there are no defects
covered by patches.
Defects Retired[Release] = Products Retired[Release] * Ave Unknown Def in Site[ Release]
Units: Defect*Site/Month
As products are retired, some of the defects that have remained unknown also retire with the
solution.
Defects with Patch[Release] = INTEG( Defect not Patching[Release] - Defects with Patch
Retired[ Release] - Installing Known Patches[Release] + Unpatched Defects Sold with Patch[
Release],
0)
Units: Defect*Site
I assume there are no patches at the beginning of the simulation; therefore there are no defects
covered by patches.
Defects with Patch Retired[Release] = Products Retired[Release] * Ave Def with Patch[
Release]
Units: Defect*Site/Month
Defects with developed patches are retired as installations are retired
Fraction Defect Unknown[Release] = XIDZ ( Unknown Defects in Field[Release] , Total
Defects in Field[Release],
1)
Units: Dmnl
Fraction of defects which are unknown.
Fraction of Defects Important = 0.15
Units: Dmnl
The fraction of total defects in the code shipped that can lead to customer calls and quality
issues.
Fraction of sites Installing All patches for a Call = 0.1
Units: /Defect
Each defect that escalates results in the site installing the available patches or not. This number
indicates what fraction of sites (or patches) are installed following each call.
Fraction Patches Installed At Sales[Release] = 0.8
Units: Dmnl
What fraction of the already known, patched problems are fixed at installation time.
Fraction Sites Installing = 0.5
Units: Dmnl
What fraction of sites we can persuade to install the patches at the time they come out.
Installing Known Patches[Release] = Total Customer Calls from Defects[Release
] * Fraction of sites Installing All patches for a Call * Ave Def with Patch[Release]
Units: Defect*Site/Month
Any customer call that escalates results in installing the available patches on the system,
therefore removing the possible issues created by those patches
Known Defects in Field[Release] = INTEG( Defect Show up[Release] - Defect not Patching[
Release] - Defect Patching[Release] - Known Defects Retired[ Release] + Known Defects
Sold[Release] , Defect Show up[Release ] * Time to Show Up for Defects)
Units: Defect*Site
The total number of known defects, coming from unknown defects discovered or known defects
sold.
Known Defects Retired[Release] = Products Retired[Release] * Ave Known Def in Site[
Release]
Units: Defect*Site/Month
Known defects are retired as we retire the systems installed.
Known Defects Sold[Release] = Total Defects Introduced[Release] * XIDZ ( Known Defects in
Field[ Release], Total Defects in Field[Release], 0)
Units: Defect*Site/Month
The total number of known defects introduced, since we know some defects exist when we
introduce the product.
Patched Defects Sold[Release] = Total Defects Introduced[Release] * ZIDZ ( Defects
Patched[Release] + Defects with Patch[Release], Total Defects in Field[ Release] ) * Fraction
Patches Installed At Sales[Release]
Units: Defect*Site/Month
Of total defects for which we have already made a patch, a fraction will be sold with the patches
installed.
Patches Retired[Release] = Products Retired[Release] * Average Defects Patched[Release]
Units: Defect*Site/Month
Defects we have fixed through patches also retire with retired installations.
Products Retired[Release] = Sites Installed[Release] / Average Time in the Field
Units: Site/Month
The rate of retiring the systems depends on the number of systems we have installed and how
long they last in the field.
Sites Installed[Release] = INTEG( Products Sold[Release] - Products Retired[Release],
Products Sold[Release] * Average Time in the Field)
Units: Site
The total number of sites with this version of the product installed.
Sites Reporting One Defect[Release] = 1
Units: Site
The number of sites that report a single bug.
Time to Show up for Def with Patch = 10
Units: Month
The defects for which patches are developed but not installed usually take longer to trigger any
problem.
Time to Show Up for Defects = 50
Units: Month
The time it takes for an average defect to show up
Total Ave Defect = SUM ( Ave Def with Patch[Release!] + Ave Known Def in Site[ Release!] +
Ave Unknown Def in Site[Release!] + Average Defects Patched[ Release!] )
Units: Defect
Total number of average defects, distributed between different stocks.
Total Defects in Field[Release] = Defects Patched[Release] + Defects with Patch[ Release] +
Known Defects in Field[Release] + Unknown Defects in Field[ Release]
Units: Defect*Site
The total number of patched or unpatched defects in the field
Total Defects Introduced[Release] = Products Sold[Release] * ( Defects in Accepted Code[
Release] + Defects in Code Developed[Release] + Defects in MRs[Release ] ) * Fraction of
Defects Important
Units: Defect*Site/Month
The observable defects introduced depend on the rate of sales, the average number of defects in
an installation, and the fraction of those defects that can create observable issues for the
customer.
Unknown Defects in Field[Release] = INTEG( Unknown Defects Introduced[Release] - Defect
Show up[Release] - Defects Retired[Release], 0)
Units: Defect*Site
The total number of important defects over all different sites.
Unknown Defects Introduced[Release] = Total Defects Introduced[Release] * Fraction Defect
Unknown[Release]
Units: Defect*Site/Month
Unknown defects are introduced with sales, since several of the defects in the code are not yet
known.
Unpatched Defects Sold with Patch[Release] = Total Defects Introduced[Release
] * ZIDZ ( ( Defects with Patch[Release] + Defects Patched[Release] ) * ( 1 - Fraction Patches
Installed At Sales[Release] ) , Total Defects in Field[Release])
Units: Defect*Site/Month
The fraction of already-patched defects that are sold to new sites without the patch installed.
********************************
Patches and Current Engineering
********************************
Accumulated Customer Calls = INTEG( Inc Customer Calls, 0)
Units: Defect*Site
The accumulated number of customer calls from all the releases until this point in time.
Accumulated Total Work = INTEG( Development and CE Rate, 0)
Units: module
The total accumulated work done on the product, both on development and patching.
Allctd Ttl Dev Resource = Min ( Ttl Des Resource to CR, XIDZ ( Ttl Des Resource to CR *
Total Dev Resources, Total Desired Resources, Total Dev Resources ) )
Units: Person
We allocate the resources between current engineering and new release proportionately
Alloctd Ttl CE Resource = Min ( Ttl Des Resource to CE , ZIDZ ( Ttl Des Resource to CE *
Total Dev Resources, Total Desired Resources ) )
Units: Person
Resources are allocated proportional to request, unless we have more resource than needed.
CR Dev Resources Left After Allocation[R1] = Max ( Allctd Ttl Dev Resource - Dsrd Dev
Resources[R1],
0)
CR Dev Resources Left After Allocation[Later Releases] = Max ( 0, VECTOR ELM MAP ( CR
Dev Resources Left After Allocation[R1] , Later Releases - 2) - Dsrd Dev Resources[Later
Releases] )
Units: Person
The amount of new-development resources left for each future release after allocating resources
to all the releases before it. This is based on the allocation scheme in which the most mature
release gets all the resources it requests before the next one is considered.
Desired Patch Rate[Release] = Patch to Develop[Release] / Time to Patch
Units: Patch/Month
Desired speed of patching.
Desired Resources to Current Engineering[Release] = Desired Patch Rate[Release
] / Productivity in Writing Patch[Release]
Units: Person
Desired resources for current engineering.
Desired Total Dev Resources = 50
Units: Person
This is the total maximum resources desired to be given to this product for development and
current engineering. In fact it should be tied to the amount of work expected for the product, not
a stand-alone concept.
Development and CE Rate = SUM (Total Development[Release!] + Patch Development[
Release!] * Modules Equivalent to Patch )
Units: module/Month
The rate of development work done on the product in total (over all releases)
Development Resource to Current Engineering[Release] = Desired Resources to Current
Engineering[
Release] * Frac CE Allctd
Units: Person
The development resources allocated to developing patches across different releases are
proportional to their requests.
Development Resources to Current Release[Later Releases] = Dsrd Dev Resources[
Later Releases] * Frac CR Allctd * ( 1 - SW Higher Priority to Current) + SW Higher Priority
to Current * if then else ( CR Dev Resources Left After Allocation[ Later Releases] = 0,
VECTOR ELM MAP ( CR Dev Resources Left After Allocation[R1] , Later Releases - 2),
Dsrd Dev Resources[Later Releases] )
Development Resources to Current Release[R1] = Dsrd Dev Resources[R1] * Frac CR Allctd * (
1 - SW Higher Priority to Current ) + SW Higher Priority to Current * if then else ( CR Dev
Resources Left After Allocation[R1] = 0, Allctd Ttl Dev Resource, Dsrd Dev Resources[R1] )
Units: Person
We allocate the resources to each release in proportion to its request. Alternatively, one can
allocate resources to the more imminent release at the expense of later ones. Both allocation
schemes are available here and can be selected with the switch.
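A minimal sketch, outside the model, of the two schemes the switch selects between; the function name and the assumption that requests are ordered from the most imminent release onward are mine:

def allocate_dev_resources(desired, pool, priority_to_current=True):
    if priority_to_current:
        # SW Higher Priority to Current = 1: fill each release completely
        # before any resources reach the next, less imminent one.
        allocation, left = [], pool
        for d in desired:
            given = min(d, left)
            allocation.append(given)
            left -= given
        return allocation
    # SW Higher Priority to Current = 0: allocate proportionally to requests.
    total = sum(desired)
    return [d * pool / total if total else 0.0 for d in desired]

print(allocate_dev_resources([40, 30, 20], 50))         # -> [40, 10, 0]
print(allocate_dev_resources([40, 30, 20], 50, False))  # -> [22.2..., 16.6..., 11.1...]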
Eff Profitability on Resources Allocated = ( Tl Eff Profit on Resources ( Perceived
Profitability) * Min (Time Since Release Ready[R1] / "Honey-Moon Period", 1 ) + 1 - Min (
Time Since Release Ready[R1] / "Honey-Moon Period", 1)) * SW Eff Proft on Resources + 1 -
SW Eff Proft on Resources
Units: Dmnl
We gradually strengthen the tie between profitability and resources allocated, as the project gets
more mature (passes through honeymoon period).
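A minimal sketch of the gradual phase-in described above (the function name and the illustrative table value are mine; the real multiplier comes from Tl Eff Profit on Resources):

def profitability_effect(table_value, months_since_first_release,
                         honeymoon=20.0, switch=1):
    # During the honeymoon period the multiplier stays close to neutral (1);
    # once the period has passed, the table value applies in full.
    w = min(months_since_first_release / honeymoon, 1.0)
    blended = table_value * w + (1.0 - w)
    return blended * switch + 1 - switch

print(profitability_effect(0.6, 5))   # early on: 0.9, mostly neutral
print(profitability_effect(0.6, 40))  # after the honeymoon: 0.6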
Frac CE Allctd = ZIDZ ( Alloctd Ttl CE Resource, Ttl Des Resource to CE) * ( 1 - SW
Exogenous CE Resources ) + SW Exogenous CE Resources
Units: Dmnl
Fraction of CE resource request that is allocated
Frac CR Allctd = XIDZ ( Allctd Ttl Dev Resource, Ttl Des Resource to CR, 1)
Units: Dmnl
The fraction of total resources for current development that is allocated.
Fraction of Calls Indicating New Defects[Release] = ZIDZ ( Calls from Unknown Defects[
Release] , Total Customer Calls from Defects[Release] )
Units: Dmnl
Fraction of calls that indicate a new defect.
Fraction of Desired CR Allocated[Release] = XIDZ ( Development Resources to Current
Release[Release], Dsrd Dev Resources[Release], 1)
Units: Dmnl
What fraction of desired resources for current development of each release is allocated to them.
"Honey-Moon Period" = 20
Units: Month
The period during which the effect of profitability on resources is not strong
Inc Customer Calls = SUM ( Total Customer Calls from Defects[Release!] )
Units: Defect*Site/Month
Increase in the number of customer calls.
Patch Developed[Release] = ( Ave Def with Patch[Release] + Average Defects Patched[
Release] ) / Defect per Patch
Units: Patch
The total number of patches developed so far for different releases of the product
Patch Development[Release] = Development Resource to Current Engineering[Release
] * Productivity in Writing Patch[Release]
Units: Patch/Month
The rate of developing patches depends both on the productivity of developers in writing patches
and on quantity of developers devoted to this task.
Patch to Develop[Release] = Ave Known Def in Site[Release] / Defect per Patch
Units: Patch
The stock of patches pending development
Perceived Profitability = INTEG( ( Profitability - Perceived Profitability ) / Time to Perceive
Profitability,
0)
Units: Dmnl
Perceived profitability fraction.
Productivity in Writing Patch[Release] = Normal Productivity[Release] / Modules Equivalent
to Patch
Units: Patch/(Month*Person)
The number of patches an individual can write in a month. This productivity should be connected
to the productivity effects used for development; however, the effect of work pressure is not so
important since patches are written in a fairly high-pressure environment anyway (customers are
waiting), and the effect of code complexity is not so big since people try to write patches with the
least interaction. The variable is marked red as a reminder to make sure that other possible
feedbacks are included in it.
SW Eff Proft on Resources = 0
Units: Dmnl
Put 1 to have the effect of profitability on resources allocated, put 0 to cut this feedback.
SW Exogenous CE Resources = 0
Units: Dmnl
Put 0 to get normal allocation to CE, put 1 to get exogenous allocation to CE, where all the
resources needed are given to CE, without interfering with the resources for current development.
SW Higher Priority to Current = 1
Units: Dmnl
Put 1 so that the allocation of resources for development of different releases gives categorically
higher priority to the more imminent ones; put 0 for a proportional distribution of resources
between the different releases.
Time Since Release Ready[Release] = INTEG( "Release Ready?"[Release] , 0)
Units: Month
The number of months since each release became ready.
Time to Adjust Resources = 10
Units: Month
Time to adjust the resources to what is dictated by profitability concerns.
Time to Patch = 5
Units: Month
The desired time to write the patches for the outstanding defects
Time to Perceive Profitability = 10
Units: Month
Time constant for perceiving profitability.
Tl Eff Profit on Resources ( [(-1,0)-(1,1.3)],(-1,0.5),(-0.425076,0.587281)
,(-0.125382,0.695614),(0,0.8),(0.1,1),(0.174312,1.06623),(0.29052,1.10614)
,(0.535168,1.15746),(1,1.2))
Units: Dmnl
Table for effect of profitability on resources given to the project.
Total Allctd Resources = Allctd Ttl Dev Resource + Alloctd Ttl CE Resource
Units: Person
Total resources allocated for development and current engineering.
Total Customer Calls from Defects[Release] = Calls from Defects With Patch[
Release] + Calls from Known Defects[Release] + Calls from Unknown Defects[ Release]
Units: Defect*Site/Month
Total customer calls coming from defects in the product
Total Desired Resources = Ttl Des Resource to CR + Ttl Des Resource to CE
Units: Person
The total development and CE resources requested
Total Dev Resources = INTEG( ( Desired Total Dev Resources * Eff Profitability on Resources
Allocated - Total Dev Resources ) / Time to Adjust Resources, Desired Total Dev Resources)
Units: Person
The resources allocated depend on the profitability of the product to some extent.
Ttl Des Resource to CR = SUM ( Dsrd Dev Resources[Release!] )
Units: Person
Total people requested for the development tasks at hand for the releases currently under
development.
Ttl Des Resource to CE = SUM ( Desired Resources to Current Engineering[Release!] ) * ( 1 - SW Exogenous CE Resources)
Units: Person
Total people requested for the development tasks at hand for the current engineering of different
releases.
********************************
Fixing Bugs
********************************
Bug Fix Approval[Release] = SUM ( Desired Bug Fixes for Next Release[Release! ] ) * "Ok to
Plan?"[Release] / TIME STEP
Units: Defect/Month
The bug fixes planned are assumed to be decided upon at the OK to plan. We assume that we
take bug fixes proportionately from all the different releases.
Bug Fixes Planned[Release] = INTEG( Bug Fix Approval[Release] , 0)
Units: Defect
Number of bug fixes planned in each release.
Bugs to Fix in Future Releases[Release] = INTEG( Increase in Known Bugs[Release] - ZIDZ
( Bugs to Fix in Future Releases[Release] , SUM ( Bugs to Fix in Future Releases[Release!] ) ) *
SUM ( Bug Fix Approval[Release!] ) - Cancel Bug Fixes[Release], 0)
Units: Defect
This is parallel to stock of open MRs from all different releases. The current formulation does
not reflect how quality of future releases for customers is increased by doing bug fixes in this
release. That is usually one of the important motivations for having bug fixes in a release.
Cancel Bug Fixes[Release] = Bugs to Fix in Future Releases[Release] / Time to Cancel Bug
Fixes *( 1 - ZIDZ ( Products Sold[Release] , SUM ( Products Sold[Release! ] ) ) )
Units: Defect/Month
We cancel bug fixes based on the importance of the product in the market. If a product is selling
well, we will put more emphasis on fixing its bugs, than when a release is not selling so well (in
which case we start draining the bug fixes for that release).
Code to Fix Bugs[Release] = Bug Fix Approval[Release] / Defect per Patch * Modules
Equivalent to Patch * "Multiplier Fixing work vs. Patch"
Units: module/Month
The number of modules of code that needs to be written as a result of the bug fixes planned is
determined by looking at the number of patches and then adjusting the scope to account for the
fact that bug fixes are more comprehensive and time consuming.
Defect per Patch = 4
Units: Defect/Patch
The number of defects that are taken care of in one typical patch
Desired Bug Fixes for Next Release[Release] = Bugs to Fix in Future Releases[
Release] * Fraction Bugs Fixed in Next Release
Units: Defect
The number of MRs planned to be fixed in the upcoming release.
Fraction Bugs Fixed in Next Release = 0.3
Units: Dmnl
Fraction of bugs normally planned to be fixed in the next release.
Increase in Known Bugs[Release] = ZIDZ ( Defect Show up[Release] , Sites Installed[
Release] ) * SW Fixing Bugs
Units: Defect/Month
The increase in the number of bugs we know about in the field depends on the rate of bug reports
from the field across all releases.
Modules Equivalent to Patch = 20
Units: module/Patch A measure of how much work a patch entails, when compared to a
typical module.
"Multiplier Fixing work vs. Patch" = 2
Units: Dmnl
Fixing the errors when done as part of a new release and included in the whole code is this much
more resource consuming than fixing them as patches.
Size of Next Bug Fix = SUM ( Desired Bug Fixes for Next Release[Release!] ) / Defect per
Patch * Modules Equivalent to Patch * "Multiplier Fixing work vs. Patch"
Units: module
Size of the next bug fix in the upcoming planning of a release is determined by desired number
of bug fixes.
SW Fixing Bugs = 1
Units: Dmnl
Put 1 to have bugs from old releases become part of new release planning. Put 0 so that such an
effect does not exist, i.e. the old bugs are completely ignored.
Time to Cancel Bug Fixes = 10
Units: Month
The time frame for draining the stock of bugs to fix.
********************************
Architecture and Code Base Quality
********************************
Accmltd Code Accepted = INTEG( Inc Underlying Code, 0)
Units: module
Accumulated code accepted.
Accmltd Features Designed = INTEG( Inc Old Design, 0)
Units: feature
Accumulated number of features designed.
Accumulated Defects in Code = INTEG( Inc Defect in Underlying Code - Code Fixes, 0)
Units: Defect
The total number of defects in the code accumulated across all releases.
Accumulated Problems in Architecture and Design = INTEG( Inc Prblm in Old Design - Architecture Redesign, 0)
Units: Problem
The accumulated problems in the architecture and design that influence the quality of future
releases.
Architecture Redesign = 0
Units: Problem/Month
The refactoring of architecture can reduce the errors in it.
Code Fixes = SUM ( Bug Fixes Planned[Release!] / TIME STEP * "Release Ready this Step?"[
Release!] ) / Fraction of Defects Important
Units: Defect/Month
The rate of fixing the code base as a result of incorporating MR fixes in the new release. Note
that no matter what fraction of defects we call important, we transfer the same amount from
this stock.
Density of Defect in Code = ZIDZ ( Accumulated Defects in Code, Accmltd Code Accepted )
Units: Defect/module
The density of defects in the underlying legacy code.
Density of Problems in Architecture = ZIDZ ( Accumulated Problems in Architecture and
Design, Accmltd Features Designed)
Units: Problem/feature
The density of problems in the features already designed and used as building block of future
code.
Fixed Defects in Code = INTEG( Code Fixes, 0)
Units: Defect
accumulated number of fixed (former) defects in the code.
Inc Defect in Underlying Code = SUM ( ( Defects in Accepted Code[Release!] + Defects in
Code Developed[Release!] + Defects in MRs[Release! ] ) * "Release Ready this
Step?"[Release!] ) / TIME STEP
Units: Defect/Month
Defects in the code that is released become influential in the quality of upcoming releases, to the
extent that they hinder good development.
Inc Old Design = SUM ( Features Designed[Release!] / TIME STEP * "Release Ready this
Step?"[Release!] )
Units: feature/Month
The addition of legacy design
Inc Prblm in Old Design = SUM ( Problems in Designed Features[Release!] / TIME STEP *
"Release Ready this Step?"[Release!] )
Units: Problem/Month
Rate of increase in the problems of the underlying design.
Inc Underlying Code = SUM ( ( Code Accepted[Release!] + Code Developed[Release! ] +
Code Waiting Rework[Release!] ) * "Release Ready this Step?"[ Release!] ) / TIME STEP
Units: module/Month
Rate of increase in the underlying code
Relative Importance of Architecture Problems = 2
Units: Dmnl
This parameter factors in the difference between one problem in a feature architecture, vs. one
defect in a module of code written. The higher the parameter, the more important the weight of
architecture is in this balance.
Total Accumulated Defects = Accumulated Defects in Code
Units: Defect
Total number of accumulated defects in the underlying code.
Weighted Error Density in Architecture and Code Base = ( Density of Defect in Code
+ Density of Problems in Architecture * Error per Design Problem in Features / Modules for a
feature * Relative Importance of Architecture Problems ) / ( 1 + Relative Importance of
Architecture Problems )
Units: Defect/module
The weighted error density in architecture and code base shows the density of errors in old
architecture and code base as it influences the quality of development in new releases.
********************************
Scheduling
*******************************
Accumulated Change in Release Date[Release] = INTEG( Chang in Release Date[ Release],
0)
Units: Month
The amount of change in the release date accumulated through time for different releases.
Allocated Design Finish Date[Release] = Time + Allocated Design Time Left[Release ]
Units: Month
Allocated design finish date, used to determine the development finish date.
Allocated Dev Finish Date[Release] = Time + ( Allocated Design Finish Date[ Release] - Time
) * ( 1 - "Frctn Des-Dev Overlap" ) + Allocated Dev Time Left[ Release]
Units: Month
The development finish date allocated.
Allocated Dev Time Left[Release] = Estmtd Dev Time Left[Release,Current] * Prcnt Time
Allocated[Release]
Units: Month
Allocated development time left.
Allocated Release Date[Release] = Allocated Dev Finish Date[Release] + Allocated Test Time
Left[Release] * ( 1 - "Frctn Test-Dev Overlap")
Units: Month
Allocated release date, based on development, design, and testing phase conditions.
Allocated Test Time Left[Release] = Estmtd Test Time Left[Release] * Prcnt Time Allocated[
Release]
Units: Month
Allocated testing time left.
Chang in Release Date[Release] = ( Allocated Release Date[Release] - Sanctioned Release
Date[Release] ) / Time to Replan * if then else ( Sanctioned Release Date[Release] = 0, 0, 1) * (
1 - "Release Ready?"[Release] )
Units: Dmnl
This is the change rate in the release date, which accumulates to show the delay in one release.
Chng Prcvd Estmtd Code[Release] = if then else ( "Release Started?"[Release] = 1 :AND:
Perceived Estimated Code to Write[Release] = 0, Estmtd Code to Write[Release] / TIME STEP
, ( Estmtd Code to Write[Release] - Perceived Estimated Code to Write[Release] ) / Time to
Perceive Current Status )
Units: module/Month
This equation ensures that the perceived estimate starts at the estimated level at the beginning
but then adjusts to it with some delay.
Chng Status[Release] = ( 1 - "Release Ready?"[Release] ) * "Release Ready this Step?"[
Release] / TIME STEP
Units: /Month
The rate to change the status of a release into ready.
Estimated Test Left[Release] = Estmtd Tests to Be Designed[Release] + Tests to Run[Release]
Units: Test
The number of tests to run.
Estmtd Code to Write[Release] = Code to Develop[Release] + Features to Design[ Release] *
Modules for a feature
Units: module
A rough estimate of the code waiting to be written comes from currently available code and the
part to be designed based on the current specifications.
Estmtd Design Time Left[Release] = ZIDZ ( Features to Design[Release] , Features Designed[
Release] + Features to Design[Release] ) * Normal Design Time
Units: Month
Estimated design time left based on how far in the design process we are.
Estmtd Dev Time Left[Release,Productivity Base] = ZIDZ ( Expctd Code to Write After
Being First[ Release,Productivity Base] , Expctd Resource Available to First Release New Dev *
Prcvd Productivity[Release,Productivity Base])
Units: Month
The development time left depends on the work left, the expected resources available and
productivity of those resources.
Estmtd Test Time Left[Release] = Estimated Test Left[Release] / Testing Resources / Test
Resource Productivity
Units: Month
Estimated testing time left.
Estmtd Tests to Be Designed[Release] = Features to Design[Release] * Test Coverage / Code
Covered by Test * Modules for a feature
Units: Test
The number of tests to be designed.
Expctd Code to Write After Being First[Release,Productivity Base] = Max ( 0 , Perceived
Estimated Code to Write[Release] - Expctd Code Written Before Being First[
Release,Productivity Base] )
Units: module
The code that is expected to be written after becoming the main release depends on how much
code is left to be written and how long we expect to stay in the shadow of other projects.
Expctd Code Written Before Being First[Release,Productivity Base] = Expctd Resource
Available to New Dev[
Release] * Prcvd Productivity[Release,Productivity Base] * Time in Shadow[ Release]
Units: module
The expected progress during the time the release is in the shadow of more imminent ones.
Expctd Resource Available to First Release New Dev = Total Dev Resources * Prcvd Frac
Resource to New Dev * Prcvd Frac to Current Release * SUM ( "If Ri Nth
Release?"[Release!,R1] * Expected Fraction to This Release[Release!] )
Units: Person
The resources available for the first (highest-priority, most imminent) release are calculated by
looking at the current release with the highest priority.
Expctd Resource Available to New Dev[Release] = Total Dev Resources * Prcvd Frac
Resource to New Dev * Prcvd Frac to Current Release * Expected Fraction to This Release[
Release]
Units: Person
From total resources, only the fraction that is not going to CE, Rework, or other releases, can be
expected to be assigned to this release.
Expected Fraction to This Release[Release] = SUM ( Perceived Fraction to Nth Release[
NthRelease!] * "If Ri Nth Release?"[Release,NthRelease!] )
Units: Dmnl
The expected fraction of resources going to each release depends on where in the ranking that
release is perceived to be.
Frac Resources to New Dev = XIDZ ( SUM ( Development[Release!] ), SUM ( Development[
Release!] + Rework[Release!] ), 0.9)
Units: Dmnl
Fraction of resources going to development
Frac Resources to Nth Release[NthReleasel = SUM ( Fraction of Desired CR Allocated[
Release!] * "If Ri Nth Release?"[Release!,NthRelease] )
Units: Dmnl
The fraction of desired resources going to the Nth active release. Alternatively we could look at
the fraction of resources actually going, but that will come up with a too conservative estimate
for what will be left for each project.
Frac Time Cut = 1
Units: Dmnl [0,1.5] Fraction of time requested that is normally given to the team.
Fraction of Resources to Current Release = XIDZ ( Allctd Ttl Dev Resource, Allctd Ttl Dev
Resource + Alloctd Ttl CE Resource, Initial Prcvd Fraction to Current release)
Units: Dmnl
The fraction of resources allocated to current releases.
Fraction Total Tests Passed[Release] = ZIDZ ( Test Ran[Release] , Total Tests for This
Release[ Release] )
Units: Dmnl
Fraction Total Tests Passed.
"Frctn Test-Dev Overlap" = 0.1
Units: Dmnl
The fraction of test overlap with development is indeed not constant, but changing depending on
where in the process we are. So when all the development is done, there is no overlap, while
there is much more overlap at the beginning. The small fraction makes the constancy assumption
OK.
"If Ri Nth Release?" [Release,NthRelease] = if then else ( Sorted Vector of Releases[
NthRelease] + 1 = Release, 1, 0) * ( 1 - "Release Ready?"[Release ] ) * "Release
Started?"[Release]
Units: Dmnl
This equation finds out whether each release is the Nth active release. For example, if the first
release is GA, then the second release becomes the 1st active release, and so on. Note that
releases which have not started or have finished are not counted.
Initial Prcvd Fraction to Current release = 0.9
Units: Dmnl
Initially people come with the perception that only 10% of time need be spent on current
engineering
Negotiated Release Date[Release] = ( Reqstd Release Date[Release] - Time) * Frac Time Cut
+ Time
Units: Month
The release date given to the team, after negotiations.
Normal Design Time = 2
Units: Month
normal length of the design phase.
Normal Threshold for Replan = 2
Units: Month
Number Replans So Far[Release] = ( SQRT ( ( 2 + Strength of Threshold Feedback ) ^ 2 + 8 *
Accumulated Change in Release Date[Release] / Normal Threshold for Replan ) - ( 2 + Strength
of Threshold Feedback ) ) / ( 2 * Strength of Threshold Feedback ) - Modulo ( ( SQRT ( ( 2 +
Strength of Threshold Feedback ) ^ 2 + 8 * Accumulated Change in Release Date[Release] /
Normal Threshold for Replan ) - ( 2 + Strength of Threshold Feedback ) ) / ( 2 * Strength of
Threshold Feedback ), 1)
Units: Dmnl
The number of replans can be tracked based on the accumulated change in the release date and how
the management changes the formal schedule based on that accumulated informal change. The
formula comes from the assumption that the threshold for changing the release date is B*(1+a*n),
where B is the threshold for the first replan, a is the strength parameter, and n is the number of
replans so far. By summing this over n=1..N we get the total delay corresponding to N replans, and
by expressing N in terms of that summation we can recover the replan count from the accumulated
delay. This variable, however, does not change anything in the current formulation; in other words,
it is assumed that official replans are not so significant in determining the pressure; rather, the
perception of the work status and what will become the official replan in the future are what matter.
This makes the behavior smoother (since the discrete replans are removed), but the overall dynamics
don't change.
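As a sketch of the inversion described above, under the stated assumption that the threshold for the n-th replan is $B\,(1 + a\,n)$ with $B$ = Normal Threshold for Replan and $a$ = Strength of Threshold Feedback, the accumulated change after $N$ replans is

$D = \sum_{n=1}^{N} B\,(1 + a\,n) = B\left(N + \frac{a\,N(N+1)}{2}\right),$

which is quadratic in $N$; applying the quadratic formula recovers $N$ from $D$, and subtracting the Modulo term strips the fractional part so that only completed replans are counted.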
Perceived Estimated Code to Write[Release] = INTEG( Chng Prcvd Estmtd Code[ Release],
Estmtd Code to Write[Release] )
Units: module
Perceived code to be written based on current estimates.
Perceived Fraction to Nth Release[NthRelease] = INTEG( ( Frac Resources to Nth Release[
NthRelease] - Perceived Fraction to Nth Release[NthRelease] ) / Time to Perceive Resource
Availability, XIDZ ( 1, SUM ( ( 1 - "Release Ready?"[NthRelease!] ) * "Release Started?"[
NthRelease!] ), 1) )
Units: Dmnl
The fraction of resources given to the Nth release is perceived as we go forward. The initial
value is set to assume equal allocation between all active releases.
Prcnt Time Allocated[Release] = XIDZ ( Negotiated Release Date[Release] - Time, Reqstd
Release Date[Release] - Time, 1)
Units: Dmnl
Fraction of time requested, given to the development team.
Prcvd Frac Resource to New Dev = INTEG( ( Frac Resources to New Dev - Prcvd Frac
Resource to New Dev ) / Time to Perceive Resource Availability, 0.9)
Units: Dmnl
It takes time for people to learn what percentage of resources are really available for new
development, vs. rework and current engineering. The perception is assumed to aggregate across
multiple releases since people interact and see what happens across releases. At the beginning
this perception is assumed to equal 0.9
Prcvd Frac to Current Release = INTEG( ( Fraction of Resources to Current Release - Prcvd
Frac to Current Release ) / Time to Perceive Resource Availability, Initial Prcvd Fraction to
Current release )
Units: Dmnl
People adjust their perception about what fraction of work is given to current release
Prcvd Productivity[Release,Current] = INTEG( ( Productivity[Release] - Prcvd Productivity[
Release,Current] ) / Time to Perceive Productivity, Normal Productivity[Release] )
Prcvd Productivity[Release,Normal] = INTEG( 0 * ( Productivity[Release] - Prcvd Productivity[
Release,Current] ) / Time to Perceive Productivity, Normal Productivity[Release] )
Units: module/(Month*Person)
Perceived productivity slowly adjusts to the real productivity observed in the field.
Project Readiness Threshold = 0.95
Units: Dmnl
The pass rate necessary for release
Rank Release[Release] = ( Total Number of Releases - Release + 1) * ( 1 - "Release Ready?"[
Release] ) * "Release Started?"[Release]
Units: Dmnl
Gives the rank of each release: highest number for the first release, 0 for inactive ones.
Release Date Planning[Release] = if then else ( Estmtd Dev Time Left[Release ,Current] > 0
:AND: Sanctioned Release Date[Release] = 0, Allocated Release Date[ Release] / TIME STEP,
if then else ( Estmtd Dev Time Left[Release ,Current] > 0, 1, 0) * ( Allocated Release
Date[Release] - Sanctioned Release Date[Release] ) / Time to Replan )
Units: Dmnl
The formulation makes sure that we start with initial scheduled release date and then adjust it to
the allocated one slowly.
"Release Ready this Step?" [Release] = if then else ( Fraction Total Tests Passed[ Release] >=
Project Readiness Threshold, 1, 0) * ( 1 - "Release Ready?"[ Release] )
Units: Dmnl
The flag that equals 1 in the time step in which the release becomes ready and is 0 otherwise.
"Release Ready?"[Release] = INTEG( Chng Status[Release] , 0)
Units: Dmnl
The counter variable showing if the release is ready or not
"Release Started?" [Release] = if then else ( Release Start Date[Release] > Time, 0, 1)
Units: Dmnl
Gives 1 if the release has started, otherwise 0.
Reqstd Release Date[Release] = Time + Time Reqstd for Des Dev Test Left[Release ]
Units: Month
Requested release date by development organization.
Sanctioned Release Date[Release] = INTEG( Release Date Planning[Release] , 0)
Units: Month
Sorted Vector of Releases[Release] = VECTOR SORT ORDER ( Rank Release[Release ], -1)
Units: Dmnl
This variable sorts and finds out the rank of each release among the active releases. So the first
active release gets the rank 0 and all the finished releases get the rank 5 (the number of releases).
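A rough Python analogue of this ranking logic (VECTOR SORT ORDER with a descending flag); indexing conventions and tie handling in Vensim differ, so this is illustrative only:

def sort_order_descending(ranks):
    # Returns, for each position n, the index of the release holding the n-th
    # highest rank, so position 0 points at the most imminent active release.
    return sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)

ranks = [0, 5, 4, 3, 0, 0]           # Rank Release: higher = more imminent, 0 = inactive
print(sort_order_descending(ranks))  # -> [1, 2, 3, 0, 4, 5]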
Strength of Threshold Feedback = 0.5
Units: Dmnl
This is the power to which the number of replans is raised in order to determine how much larger the
threshold for replanning gets. A value of 1 means that for the 2nd replan we have 2 times the normal
threshold, for the 3rd one 3 times, and so on. A value of 0.5 suggests that for the 2nd replan we have a
sqrt(2)=1.41 times longer threshold, etc.
SW Effect Replan on Replan Threshold = 1
Units: Dmnl
Put 1 to have the effect of the number of replans on the threshold for a new replan. Put 0 for not
having this feedback on.
Time in Shadow[Later Releases] = Max ( 0, if then else ( "If Ri Nth Release?"[ Later
Releases,R1] = 1, 0, VECTOR ELM MAP ( Sanctioned Release Date[R1], Later Releases - 2) -
Time ) )
Time in Shadow[R1] = 0
Units: Month
The time that a release is in the shadow of more important releases is as long as there is a more
urgent release out there.
Time Reqstd for Des Dev Test Left[Release] = Estmtd Design Time Left[Release
] * ( 1 - "Frctn Des-Dev Overlap" ) + Estmtd Dev Time Left[Release,Current] + Estmtd Test
Time Left[Release] * ( 1 - "Frctn Test-Dev Overlap")
Units: Month
The time required for the rest of the project depends on how much time testing, development,
and design need all together.
Time to Perceive Current Status = 2
Units: Month
There is a delay in assessing and perceiving the current status of project. This delay is important
in how fast we plan and react to the state of project.
Time to Perceive Productivity = 10
Units: Month
Time to Perceive Productivity
Time to Perceive Resource Availability = 50
Units: Month
Time to Perceive Resource Availability
Total Number of Releases = ELMCOUNT(Release)
Units: Dmnl
The total number of releases modeled
Total Tests for This Release[Release] = Estmtd Tests to Be Designed[Release
] + Total Tests to Cover[Release]
Units: Test
Total tests for the current release.
Control: Simulation Control Parameters
FINAL TIME = 100
Units: Month
The final time for the simulation.
INITIAL TIME = 0
Units: Month
The initial time for the simulation.
SAVEPER = 1
Units: Month
The frequency with which output is stored.
TIME STEP = 0.03125
Units: Month
The time step for the simulation.
********************************
Sales and Financial
********************************
Acceptable Delay Level = 4
Units: Month
The normal level of delay is put at 4 months, suggesting that if we have delays that long we get
normal sales.
Acceptable Features in Market = RAMP ( Feature Expectation Growth Rate, 20, FINAL
TIME )
Units: feature
This is the number of new features that need to be in a new release to make that release
reasonable in the eyes of customer.
Accumulated Features Designed[R1] = Features Designed[R1]
Accumulated Features Designed[Later Releases] = VECTOR ELM MAP ( Accumulated
Features Designed[ R1] , Later Releases - 2) + Features Designed[Later Releases]
Units: feature
This is the total number of features from the first release to this one.
Average Delay from All[Release] = (Accumulated Change in Release Date[Release
] + Relative Weight of Old Releases * Average Delay from Ready Releases ) / ( 1 + Relative
Weight of Old Releases )
Units: Month
The average delay is a weighted average of delay so far in the current release, and the delay
observed in past releases.
Average Delay from Ready Releases = ZIDZ ( SUM ( Accumulated Change in Release Date[
Release!] * "Release Ready?"[Release!] ), SUM ("Release Ready?"[ Release!] ) )
Units: Month
Calculating the average of delay across all the ready projects.
Cost of Development and Testing Resources = Salary of developer and testers
* ( SUM ( Test Resources Allocated[Release!] ) + Total Allctd Resources)
Units: $/Month
Total Cost of Development and Testing Resources allocated to this project. It assumes that nonallocated resources are used elsewhere and therefore not billed to this product.
Cost of Services and Maintenance = Cost per Customer Call * SUM ( Total Customer Calls
from Defects[ Release!] )
Units: $/Month
Total Cost of Services and Maintenance, assumed to be a function of the number of customer calls.
Cost per Customer Call = 200
Units: $/(Defect*Site)
The cost of each customer call arising from a defect, in terms of service labor etc.
Eff Features on Sales[Release] = Tl Eff Features on Sales ( XIDZ ( Perceived Features
Designed[Release] , Acceptable Features in Market, 1) ) * SW Eff Features on Sales + 1 - SW
Eff Features on Sales
Units: Dmnl
The effect of new features on sales depends on how many features we have vs. what is the
acceptable level of features we could have.
Eff Market Saturation on Sales = Tl Eff Saturation on Sales ( XIDZ ( Market Size, SUM (
Indicated Sales[Release!] ), 10))
Units: Dmnl
The effect of market saturation depends on how much sales we could potentially do, vs. what the
market needs.
Eff Quality and Services on Sales = Tl Eff Quality and Services on Sales ( Normalized Quality
) * SW Eff Qual and Services on Sales + 1 - SW Eff Qual and Services on Sales
Units: Dmnl
Effect of quality and services on sales of the product.
Eff Timeliness on Sales[Release] = Tl Eff Delay on Sales ( Normalized Delay[Release] ) * SW
Eff Delay on Sales + 1 - SW Eff Delay on Sales
Units: Dmnl
Effect of time-to-market on sales.
Exogenous Products Sold = 10
Units: Site/Month
Exogenous sales is used for model testing
Feature Expectation Growth Rate = 30
Units: feature/Month
The rate of growth in the number of features needed to remain competitive in the market place.
Fraction Overhead Cost = 0.5
Units: Dmnl
Overhead cost, as a fraction of R&D and Service labor costs.
Indicated Sales[Release] = Product Sales Personnel * Normal Sales Productivity * Eff Features
on Sales[Release] * Eff Quality and Services on Sales * Eff Timeliness on Sales[Release] * This
Release Selling[Release]
Units: Site/Month
The indicated sales depend on the number of sales personnel, their productivity, and how that
productivity depends on quality, timeliness, and features of the product.
Initial Perceived Quality = 1
Units: Dmnl
The initial quality level (0 for very good quality, 1 for medium, and higher for poor) is what
customers have in mind before observing the product's real quality.
Maintenance Fee = 1000
Units: $/(Month*Site)
The maintenance fee charged.
Market Size = 50
Units: Site/Month
This size is defined in terms of the number of sites per month in a steady state, not including any of
the diffusion dynamics etc. It is therefore fairly simplistic.
Net Revenue = Revenue - Total Cost
Units: $/Month
Net revenue of the product.
Normal Bug Acceptable = 300
Units: Defect
This is the average number of bugs in a medium release.
Normal Sales Productivity = 2
Units: Site/(Month*Person) The productivity of sales personnel in selling this product.
Normalized Delay[Release] = XIDZ (Acceptable Delay Level, Perceived Delay[ Release], 5)
Units: Dmnl
Delay level normalized by some acceptable delay.
Normalized Quality = XIDZ ( Normal Bug Acceptable, Perceived Outstanding Defects, 5)
Units: Dmnl
Normalized quality of the product.
Perceived Delay[Release] = DELAY1 (Average Delay from All[Release] , Time to Perceive
Delays )
Units: Month
Perceived delivery delay.
Perceived Features Designed[Release] = INTEG( (Accumulated Features Designed[Release] - Perceived Features Designed[Release] ) / Time to Perceive Functionality, Accumulated Features
Designed[Release])
Units: feature
Perceived number of the features included in each release.
Perceived Outstanding Defects = INTEG( ( Average Outstanding Defects - Perceived
Outstanding Defects ) / Time to Perceive Quality * "Release Ready?"[R1] , Initial Perceived
Quality * Normal Bug Acceptable )
Units: Defect
The initial perceived quality sets the expectations from the company before the release is out.
Later on it is adjusted to observed quality, as soon as first release is out.
Price = 250000
Units: $/Site The price of each installation of the software.
Product Sales Personnel = 5
Units: Person
The number of dedicated sales personnel for this product.
Products Sold[Release] = Exogenous Products Sold * This Release Selling[Release] * ( 1 - SW
Endogenous Sales ) + SW Endogenous Sales * Sales Materialized[Release]
Units: Site/Month
The number of products sold and sites installed in a month.
Profitability = ZIDZ (Net Revenue, Revenue )
Units: Dmnl
Profitability of the product line.
Relative Weight of Old Releases = 0.5
Units: Dmnl
The weight of old releases in comparison with the focal release, when determining the effect of
delay on sales.
Revenue = SUM ( Products Sold[Release!] * Price ) + Maintenance Fee * SUM ( Sites
Installed[Release!] )
Units: $/Month
The revenue earned from this product line.
Salary of developer and testers = 6000
Units: $/(Month*Person) Normal salary of developer and testers
Sales Bonus Fraction = 0.01
Units: Dmnl
The fraction of sales that goes as sales bonus
Sales Cost = Product Sales Personnel * Sales Personnel Cost + SUM ( Sales Materialized[
Release!] ) * Sales Bonus Fraction * Price
Units: $/Month
The cost of salesforce.
Sales Materialized[Release] = Indicated Sales[Release] * Eff Market Saturation on Sales
Units: Site/Month
The sales actually materialized depend not only on the potential level but also on market availability.
Sales Personnel Cost = 50000
Units: $/(Month*Person) The fixed salary of sales personnel
SW Eff Delay on Sales = 1
Units: Dmnl
Put 1 to have the effect of schedule delay on attractiveness, put 0 not to include that effect.
SW Eff Features on Sales = 1
Units: Dmnl
Put 1 to have the effect of features on sales, put 0 to eliminate that effect.
SW Eff Qual and Services on Sales = 1
Units: Dmnl
Put 1 to have the effect of quality on sales, put 0 not to include that effect.
SW Endogenous Sales = 1
Units: Dmnl
Put 1 to have endogenous sales rate. Put 0 to get the fixed, exogenous rate.
This Release Selling[Early Releases] = "Release Ready?"[Early Releases] * ( 1 - VECTOR
ELM MAP ( "Release Ready?"[RI] , Early Releases))
This Release Selling[R6] = "Release Ready?"[R6]
Units: Dmnl
Only the latest release is selling at any time.
Time to Perceive Delays = 5
Units: Month
Time to perceive the delays by customers.
Time to Perceive Functionality = 6
Units: Month
Time to perceive functionality of the software.
Time to Perceive Quality = 15
Units: Month
Time to perceive the quality of the product.
Tl Eff Delay on Sales ( [(0,0)-(1,1.5)],(0,0),(0.195719,0.203947),(0.321101,0.447368)
,(0.455657,0.690789),(0.666667,0.888158),(1,1),(2,1.15),(4,1.3))
Units: Dmnl
Table for effect of delays on sales.
Tl Eff Features on Sales ( [(0,0)-(2,1.5)],(0,0),(0.311927,0.0263158),(0.489297,0.118421)
,(0.66055,0.355263),(0.795107,0.598684),(0.880734,0.815789),(1,1)
,(1.24159,1.18421),(1.5,1.3),(2,1.4))
Units: Dmnl
Table for effect of feature availability on sales.
Tl Eff Quality and Services on Sales ( [(0,0)-(4,2)],(0,0),(0.464832,0.587719)
,(0.685015,0.789474),(1,1),(2,1.3),(4,1.5))
Units: Dmnl
Table for effect of quality on sales of the product.
Tl Eff Saturation on Sales ( [(0,0)-(3,1)],(0,0),(0.4,0.4),(0.706422,0.583333)
,(1,0.7),(1.45872,0.837719),(2.07339,0.934211),(3,1))
Units: Dmnl
There is a market saturation effect, i.e. sales can not grow more than a limit in the market.
Total Cost = ( Cost of Development and Testing Resources + Cost of Services and Maintenance
) * ( 1 + Fraction Overhead Cost) + Sales Cost
Units: $/Month
Total R&D, Service, and salesforce costs of the project.
********************************
Metrics
********************************
Accumulated Calls at Final Time = if then else ( Time <= FINAL TIME - TIME STEP, 0,
Accumulated Customer Calls at Sampling Time / TIME STEP)
Units: Defect*Site/Month
Accumulated customer calls at final time for payoff function in optimization.
Accumulated Customer Calls at Sampling Time = SAMPLE IF TRUE( Time <= Sampling
Time :AND: Time + TIME STEP > Sampling Time, Accumulated Customer Calls, 0)
Units: Defect*Site
The accumulated number of customer calls at the time of the sampling.
Accumulated Resources at Final Time = if then else (Time <= FINAL TIME - TIME STEP,
0, Accumulated Resources at Sampling Time / TIME STEP)
Units: Person
Accumulated resources at final time, for optimization purposes.
Accumulated Resources at Sampling Time = SAMPLE IF TRUE( Time <= Sampling Time
:AND: Time + TIME STEP > Sampling Time, Accumulated Resources So far, 0)
Units: Month*Person
Picks up the value of accumulated resources at the sampling time.
Accumulated Resources So far = INTEG( Resource Accumulation, 0)
Units: Month*Person
Total number of people involved in development and testing of the product.
Accumulated Revenue = INTEG( Revenue Inc, 0)
Units: $ The accumulated revenue through time.
Average Max Natural Sch Pressure = ZIDZ ( Max Natural Schedule Pressure, Time )
Units: Dmnl
Average of the maximum schedule pressure among different releases, through time.
Finish Time at Final Time = if then else (Time <= FINAL TIME - TIME STEP, 0, Finish
Time of Last Release / TIME STEP )
Units: Dmnl
Finish time of the last of the six releases, reported only at the end of the run and divided by the time step so that the optimization payoff equals that finish time.
Finish Time of Last Release = SAMPLE IF TRUE( "Release Ready?"[R6] > 0 :AND: Finish
Time of Last Release = 0, Time, 0)
Units: Month
This picks up the finish time for the last release.
Max Natural Schedule Pressure = INTEG( Min ( 3, VMAX ( Natural Dev Schedule Pressure[
Release!] )) , 0)
Units: Month
The maximum of schedule pressure across different releases.
Resource Accumulation = SUM ( Test Resources Allocated[Release!] ) + Total Allctd
Resources
Units: Person
Sum of resources actively involved in the project at this time.
Revenue Inc = Revenue
Units: $/Month
The rate at which revenue accumulates.
Sampling Time = if then else ( Finish Time of Last Release > 0, Min ( Finish Time of Last
Release + Time to Wait Before Sampling, FINAL TIME ), FINAL TIME )
Units: Month
Sampling time for observing value of important metrics.
Time to Wait Before Sampling = 10
Units: Month
There is no sampling in the first few months, to avoid picking up values before the project is under way.
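The metrics above follow a common pattern for Vensim optimization runs: SAMPLE IF TRUE holds a value at the moment a condition becomes true, and dividing the held value by TIME STEP in the final time step makes the time-integrated payoff equal the sampled value itself. A minimal sketch of that pattern (my own illustration, with made-up variable names, not model code):

# Sample-and-hold plus final-step reporting, as used by the metrics above.
# The payoff an optimizer sees is the time integral of the reported variable;
# reporting held_value / dt only during the last step makes that integral
# equal the held value.

def run(final_time=100.0, dt=0.25, sampling_time=40.0):
    t, held = 0.0, 0.0
    payoff = 0.0
    while t < final_time:
        metric = t ** 2                      # stand-in for any model metric
        # SAMPLE IF TRUE: hold the metric's value at the sampling time.
        if sampling_time <= t < sampling_time + dt and held == 0.0:
            held = metric
        # Report only in the last time step, scaled by 1/dt.
        report = held / dt if t >= final_time - dt else 0.0
        payoff += report * dt                # the optimizer integrates the report
        t += dt
    return held, payoff

held, payoff = run()
print(held, payoff)   # both equal the metric's value at the sampling time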
Testing Model
Discrepancy Defect in Field Vs Accumulated = Total Accumulated Defects - Total Ave Defect / Fraction of Defects Important
Units: Defect
This test item checks whether the total number of defects at the average field site equals the number in the accumulated-defect account. It is not necessarily zero: when the product first goes on sale, not all of the code has passed testing, so early customers receive fewer defects than later ones.
Maximum of All Physical Stocks = Max ( VMAX ( Code Developed[Release!] ), Max ( VMAX ( Code to Develop[Release!] ),
Max ( VMAX ( Code Waiting Rework[Release!] ), Max ( VMAX ( Defects in Code Developed[Release!] ),
Max ( VMAX ( Defects in MRs[Release!] ), Max ( VMAX ( Defects Patched[Release!] ),
Max ( VMAX ( Defects with Patch[Release!] ), Max ( VMAX ( Features Designed[Release!] ),
Max ( VMAX ( Features to Design[Release!] ), Max ( Features Under Consideration,
Max ( VMAX ( Known Defects in Field[Release!] ), Max ( VMAX ( Sites Installed[Release!] ),
Max ( VMAX ( Test Ran[Release!] ), Max ( VMAX ( Tests to Run[Release!] ),
VMAX ( Unknown Defects in Field[Release!] ) ) ) ) ) ) ) ) ) ) ) ) ) ) )
Units Not Specified
Maximum of all physical stocks, used only for robustness testing, and therefore does not have a
unit.
Minimum of all Physical Stocks = Min ( VMIN ( Code Developed[Release!] ), Min ( VMIN ( Code to Develop[Release!] ),
Min ( VMIN ( Code Waiting Rework[Release!] ), Min ( VMIN ( Defects in Code Developed[Release!] ),
Min ( VMIN ( Defects in MRs[Release!] ), Min ( VMIN ( Defects Patched[Release!] ),
Min ( VMIN ( Defects with Patch[Release!] ), Min ( VMIN ( Features Designed[Release!] ),
Min ( VMIN ( Features to Design[Release!] ), Min ( Features Under Consideration,
Min ( VMIN ( Known Defects in Field[Release!] ), Min ( VMIN ( Sites Installed[Release!] ),
Min ( VMIN ( Test Ran[Release!] ), Min ( VMIN ( Tests to Run[Release!] ),
VMIN ( Unknown Defects in Field[Release!] ) ) ) ) ) ) ) ) ) ) ) ) ) ) )
Units Not Specified
This test item checks whether all the physical stocks remain at or above zero.
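These two test variables support an extreme-condition check: no physical stock should ever go negative, and none should explode unexpectedly. Outside Vensim, the same check on exported simulation data might look like the sketch below (illustrative only; stocks is a hypothetical dictionary mapping stock names to arrays of saved values over time and releases):

# Robustness check on exported simulation output: every physical stock should
# stay non-negative for every release at every saved time step.
import numpy as np

def check_physical_stocks(stocks):
    """stocks: dict of name -> array of shape (time, releases) or (time,)."""
    overall_max = max(float(np.max(v)) for v in stocks.values())
    overall_min = min(float(np.min(v)) for v in stocks.values())
    negatives = [name for name, v in stocks.items() if np.min(v) < 0]
    return overall_max, overall_min, negatives

# Example with made-up data for two stocks and six releases:
stocks = {
    "Code Developed": np.random.rand(400, 6) * 1000,
    "Defects in MRs": np.random.rand(400, 6) * 50,
}
mx, mn, bad = check_physical_stocks(stocks)
print(mx, mn, bad)   # bad should be an empty list in a well-behaved run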
***************************
Subscripts
Early Releases: (R1-R5)
Later Releases: (R2-R6)
NthRelease <-> Release
Productivity Base: Current,Normal
This subscript simplifies equations that are written in parallel for two different productivity levels: the current level of productivity and the normal level. Current productivity is what is actually used in most of the planning of the work, while normal productivity drives how fast people work, the quality of their work, and so on.
Release: R1, Later Releases
Different releases of the same product.
SRelease <-> Release
Alphabetical list of model variables and their groupings:
Variable: Group/View Name
Acceptable Delay Level: Sales and Financial
Acceptable Features in Market: Sales and Financial
Accmltd Code Accepted: Architecture and Code Base Quality
Accmltd Features Designed: Architecture and Code Base Quality
Accumulated Calls at Final Time: Metrics
Accumulated Change in Release Date: Scheduling
Accumulated Customer Calls: Patches and Current Engineering
Accumulated Customer Calls at Sampling Time: Metrics
Accumulated Defects in Code: Architecture and Code Base Quality
Accumulated Features Designed: Sales and Financial
Accumulated Problems in Architecture and Design: Architecture and Code Base Quality
Accumulated Resources at Final Time: Metrics
Accumulated Resources at Sampling Time: Metrics
Accumulated Resources So far: Metrics
Accumulated Revenue: Metrics
Accumulated Total Work: Patches and Current Engineering
Allctd Ttl Dev Resource: Patches and Current Engineering
Allocated Design Finish Date: Scheduling
Allocated Design Time Left: Development
Allocated Dev Finish Date: Scheduling
Allocated Dev Time Left: Scheduling
Allocated Release Date: Scheduling
Allocated Test Time Left: Scheduling
Alloctd Ttl CE Resource: Patches and Current Engineering
Architecture Redesign: Architecture and Code Base Quality
Ave Def with Patch: Defects in the Field
Ave Dfcts in Accptd Code: Development
Ave Dfct in Rwrk Code: Development
Ave Dfct in Tstd Code: Development
Ave Known Def in Site: Defects in the Field
Ave New Problem per Cancellation: Concept and Design
Ave Prblm in Designed Feature: Concept and Design
Ave Problem in Features to Design: Concept and Design
Ave Unknown Def in Site: Defects in the Field
Average Defects Patched: Defects in the Field
Average Delay from All: Sales and Financial
Average Delay from Ready Releases: Sales and Financial
Average Max Natural Sch Pressure: Metrics
Average Outstanding Defects: Defects in the Field
Average Time in the Field: Defects in the Field
Bug Fix Approval: Fixing Bugs
Bug Fixes Designed: Development
Bug Fixes Planned: Fixing Bugs
Bugs to Fix in Future Releases: Fixing Bugs
Calls from Defects With Patch: Defects in the Field
Calls from Known Defects: Defects in the Field
Calls from Unknown Defects: Defects in the Field
Cancel Bug Fixes: Fixing Bugs
Change in Release Date: Scheduling
Chng Prcvd Estmtd Code: Scheduling
Chng Start Date: Concept and Design
Chng Status: Scheduling
Code Accepted: Development
Code Cancellation: Development
Code Covered by Test: Development
Code Developed: Development
Code Fixes: Architecture and Code Base Quality
Code Passing Test: Development
Code to Develop: Development
Code to Fix Bugs: Fixing Bugs
Code Waiting Rework: Development
Cost of Development and Testing Resources: Sales and Financial
Cost of Services and Maintenance: Sales and Financial
Cost per Customer Call: Sales and Financial
CR Dev Resources Left After Allocation: Patches and Current Engineering
Current Dev Schedule Pressure: Development
Current Scope of Release: Concept and Design
Defect Covering by Patch: Defects in the Field
Defect Intro in Coherence of Feature Set: Concept and Design
Defect not Patching: Defects in the Field
Defect Patching: Defects in the Field
Defect per Patch: Fixing Bugs
Defect Show up: Defects in the Field
Defects Accepted: Development
Defects Fixed: Development
Defects found: Development
Defects from Dev: Development
Defects Generated: Development
Defects in Accepted Code: Development
Defects in Code Developed: Development
Defects in MRs: Development
Defects Patched: Defects in the Field
Defects Retired: Defects in the Field
Defects with Patch: Defects in the Field
Defects with Patch Retired: Defects in the Field
Demand for Test Resources: Test
Density of Defect in Code: Architecture and Code Base Quality
Density of Problems in Architecture: Architecture and Code Base Quality
Des Time Change: Concept and Design
Discrepancy Defect in Field Vs Accumulated: Testing Model
Design: Concept and Design
Design Rate: Development
Design Time Left: Concept and Design
Desired Bug Fixes for Next Release: Fixing Bugs
Desired Development Rate: Development
Desired Patch Rate: Patches and Current Engineering
Desired Resources to Current Engineering: Patches and Current Engineering
Desired Total Dev Resources: Patches and Current Engineering
Dev Reschedule: Development
Dev Resources to New Dev: Development
Dev Resources to Rework: Development
Development: Development
Development and CE Rate: Patches and Current Engineering
Development Resource to Current Engineering: Patches and Current Engineering
Development Resources to Current Release: Patches and Current Engineering
Development Time Left: Development
Dfct Dnsty in Accepted Code: Development
Dfct Dnsty in Rejctd Code: Development
Dfct Dnsty in Untstd Code: Development
Dfct from Rwrk: Development
Dfcts Left in RWrk: Development
Dsrd Dev Resources: Development
Dsrd New Dev Resources: Development
Dsrd Rwrk Rate: Development
Dsrd Rwrk Resources: Development
Dsrd Time to Do Rwrk: Development
Early Releases: Subscript
Eff Archtcr Complexity on Productivity: Development
Eff Bug Fixing on Scope of New Release: Concept and Design
Eff Features on Sales: Sales and Financial
Eff Market Saturation on Sales: Sales and Financial
Eff Old Archtctr on Design Error: Concept and Design
Eff Old Code & Archtctr on Error: Development
Eff Pressure on Error Rate: Development
Eff Pressure on Productivity: Development
Eff Profitability on Resources Allocated: Patches and Current Engineering
Eff Quality and Services on Sales: Sales and Financial
Eff Timeliness on Sales: Sales and Financial
Endogenous Release Start Date: Concept and Design
Error per Design Problem in Features: Development
Error Rate in Feature Design: Concept and Design
Error Rate in Late Features: Concept and Design
Error Rate in Original Feature Selection: Concept and Design
Error Rate New Dev: Development
Errors from Coding: Development
Errors from Design and Architecture: Development
Estimated Test Left: Scheduling
Estmtd Code to Write: Scheduling
Estmtd Design Time Left: Scheduling
Estmtd Dev Time Left: Scheduling
Estmtd Test Time Left: Scheduling
Estmtd Tests to Be Designed: Scheduling
Exogenous Products Sold: Sales and Financial
Exogenous Release Start Date: Concept and Design
Expctd Code to Write After Being First: Scheduling
Expctd Code Written Before Being First: Scheduling
Expctd Resource Available to First Release New Dev: Scheduling
Expctd Resource Available to New Dev: Scheduling
Expected Fraction to This Release: Scheduling
Feasible New Dev Rate: Development
Feasible Rwrk Rate: Development
Feature Cancellation: Concept and Design
Feature Expectation Growth Rate: Sales and Financial
Feature Requests: Concept and Design
Features Approved: Concept and Design
Features Designed: Concept and Design
Features to Design: Concept and Design
Features Under Consideration: Concept and Design
FINAL TIME: Control
Finish Time at Final Time: Metrics
Finish Time of Last Release: Metrics
Fixed Defects in Code: Architecture and Code Base Quality
Frac CE Allctd: Patches and Current Engineering
Frac CR Allctd: Patches and Current Engineering
Frac Errors Found in Rwrk: Development
Frac Resources to New Dev: Scheduling
Frac Resources to Nth Release: Scheduling
Frac Time Cut: Scheduling
Frac Total Tst Feasible: Test
Fraction Bugs Fixed in Next Release: Fixing Bugs
Fraction Code Accepted: Test
Fraction code Perceived Developed: Concept and Design
Fraction Code Written: Test
Fraction Defect Unknown: Defects in the Field
Fraction of Calls Indicating New Defects: Patches and Current Engineering
Fraction of Defects Important: Defects in the Field
Fraction of Desired CR Allocated: Patches and Current Engineering
Fraction of Resources to Current Release: Scheduling
Fraction of sites Installing All patches for a Call: Defects in the Field
Fraction Overhead Cost: Sales and Financial
Fraction Patches Installed At Sales: Defects in the Field
Fraction Sites Installing: Defects in the Field
Fraction Total Tests Passed: Scheduling
Frctn Code Cancellation: Development
Frctn Des-Dev Overlap: Development
Frctn Test-Dev Overlap: Scheduling
Frctn Tst Running: Test
Honey-Moon Period: Patches and Current Engineering
If R1 Nth Release?: Scheduling
Inc Customer Calls: Patches and Current Engineering
Inc Defect in Underlying Code: Architecture and Code Base Quality
Inc Old Design: Architecture and Code Base Quality
Inc Prblm in Old Design: Architecture and Code Base Quality
Inc Underlying Code: Architecture and Code Base Quality
Increase in Known Bugs: Fixing Bugs
Indicated Sales: Sales and Financial
Initial Features Under Consideration: Concept and Design
Initial Perceived Quality: Sales and Financial
Initial Prcvd Fraction to Current release: Scheduling
INITIAL TIME: Control
Installing Known Patches: Defects in the Field
Known Defects in Field: Defects in the Field
Known Defects Retired: Defects in the Field
Known Defects Sold: Defects in the Field
Late Feature Error Coefficient: Concept and Design
Late Feature Introduction: Concept and Design
Late Feature Problems: Concept and Design
Late Features: Concept and Design
Late Intro Time: Concept and Design
Later Releases: Subscript
Maintenance Fee: Sales and Financial
Market Size: Sales and Financial
Max Natural Schedule Pressure: Metrics
Maximum Cancellation Fraction: Development
Maximum of All Physical Stocks: Testing Model
Min Des Time: Concept and Design
Min Dev Time: Development
Min Tst Time: Test
Minimum Code Size With an Effect: Development
Minimum of all Physical Stocks: Testing Model
Modules Equivalent to Patch: Fixing Bugs
Modules for a feature: Development
Multiplier Fixing work vs. Patch: Fixing Bugs
Natural Dev Schedule Pressure: Development
Negotiated Release Date: Scheduling
Net Revenue: Sales and Financial
New Error Rate in Rwrk: Development
Normal Bug Acceptable: Sales and Financial
Normal Design Time: Scheduling
Normal Error Rate in Feature Design: Concept and Design
Normal Error Rate in Rwrk: Development
Normal Errors from Coding: Development
Normal Productivity: Development
Normal Sales Productivity: Sales and Financial
Normal Scope of New Release: Concept and Design
Normal Threshold for Replan: Scheduling
Normalized Delay: Sales and Financial
Normalized Quality: Sales and Financial
NthRelease: Subscript
Number Replans So Far: Scheduling
Ok to Plan?: Concept and Design
Patch Developed: Patches and Current Engineering
Patch Development: Patches and Current Engineering
Patch to Develop: Patches and Current Engineering
Patched Defects Sold: Defects in the Field
Patches Retired: Defects in the Field
Perceived Delay: Sales and Financial
Perceived Estimated Code to Write: Scheduling
Perceived Features Designed: Sales and Financial
Perceived Fraction to Nth Release: Scheduling
Perceived Outstanding Defects: Sales and Financial
Perceived Profitability: Patches and Current Engineering
Perceived Total Code: Concept and Design
Prcnt Time Allocated: Scheduling
Prcvd Frac Resource to New Dev: Scheduling
Prcvd Frac to Current Release: Scheduling
Prcvd Productivity: Scheduling
Price: Sales and Financial
Problem Cancellation: Concept and Design
Problems from Cancellation: Concept and Design
Problems in Approved Features: Concept and Design
Problems in Designed Features: Concept and Design
Problems Introduced in Design: Concept and Design
Problems Moved to Design: Concept and Design
Problems Per Feature Interaction: Concept and Design
Product Sales Personnel: Sales and Financial
Productivity: Development
Productivity Base: Subscript
Productivity in Writing Patch: Patches and Current Engineering
Products Retired: Defects in the Field
Products Sold: Sales and Financial
Profitability: Sales and Financial
Project Readiness Threshold: Scheduling
Rank Release: Scheduling
Relative Importance of Architecture Problems: Architecture and Code Base Quality
Relative Size of Expected Bug Fixing: Concept and Design
Relative Weight of Old Releases: Sales and Financial
Release: Subscript
Release Date Planning: Scheduling
Release Ready this Step?: Scheduling
Release Ready?: Scheduling
Release Start Date: Concept and Design
Release Started?: Scheduling
Reqstd Release Date: Scheduling
Resource Accumulation: Metrics
Retest Required: Test
Revenue: Sales and Financial
Revenue Inc: Metrics
Rework: Development
Salary of developer and testers: Sales and Financial
Sales Bonus Fraction: Sales and Financial
Sales Cost: Sales and Financial
Sales Materialized: Sales and Financial
Sales Personnel Cost: Sales and Financial
Sampling Time: Metrics
Sanctioned Code Complete: Development
Sanctioned Design Complete: Concept and Design
Sanctioned Release Date: Scheduling
SAVEPER: Control
Scope of New Release: Concept and Design
Sites Installed: Defects in the Field
Sites Reporting One Defect: Defects in the Field
Size of Next Bug Fix: Fixing Bugs
Sorted Vector of Releases: Scheduling
SRelease: Subscript
Strength of Complexity on Productivity Effect: Development
Strength of Threshold Feedback: Scheduling
SW Eff Archtcr Complexity on Productivity: Development
SW Eff Bug Fix on Scope: Concept and Design
SW Eff Delay on Sales: Sales and Financial
SW Eff Features on Sales: Sales and Financial
SW Eff Old Arch on Design Problems: Concept and Design
SW Eff Old Code and Archtctr on Error: Development
SW Eff Pressure on Error: Development
SW Eff Pressure on Productivity: Development
SW Eff Proft on Resources: Patches and Current Engineering
SW Eff Qual and Services on Sales: Sales and Financial
SW Effect Replan on Replan Threshold: Scheduling
SW Endogenous Start: Concept and Design
SW Endogenous Sales: Sales and Financial
SW Exogenous CE Resources: Patches and Current Engineering
SW Fixi BuFixing: Fixing Bugs
SW Higher Priority to Current: Patches and Current Engineering
Test Acceptance: Development
Test Coverage: Test
Test Design: Test
Test Feasible: Test
Test Ran: Test
Test Rate: Test
Test Rejection: Development
Test Resource Productivity: Test
Test Resources Allocated: Test
Testing Resources: Test
Tests to Run: Test
This Release Selling: Sales and Financial
Threshold for New Project Start: Concept and Design
Time in Shadow: Scheduling
Time Reqstd for Des Dev Test Left: Scheduling
Time Since Release Ready: Patches and Current Engineering
TIME STEP: Concept and Design
Time to Adjust Resources: Patches and Current Engineering
Time to Cancel Bug Fixes: Fixing Bugs
Time to Patch: Patches and Current Engineering
Time to Perceive Current Status: Scheduling
Time to Perceive Delays: Sales and Financial
Time to Perceive Functionality: Sales and Financial
Time to Perceive Productivity: Scheduling
Time to Perceive Profitability: Patches and Current Engineering
Time to Perceive Quality: Sales and Financial
Time to Perceive Resource Availability: Scheduling
Time to Replan: Concept and Design
Time to Show up for Def with Patch: Defects in the Field
Time to Show Up for Defects: Defects in the Field
Time to Wait Before Sampling: Metrics
TI Eff Bug Fix on Scope: Concept and Design
TI Eff Delay on Sales: Sales and Financial
TI Eff Features on Sales: Sales and Financial
TI Eff Old Arch on Design Problems: Concept and Design
TI Eff Old Code & Archtctr on Error: Development
TI Eff Pressure on Error Rate: Development
TI Eff Pressure on Productivity: Development
TI Eff Profit on Resources: Patches and Current Engineering
TI Eff Quality and Services on Sales: Sales and Financial
TI Eff Saturation on Sales: Sales and Financial
TI Eff Schdul Prssr on Cancellation: Development
TI Tst Dependency on Dev: Test
Total Accumulated Defects: Architecture and Code Base Quality
Total Allctd Resources: Patches and Current Engineering
Total Ave Defect: Defects in the Field
Total Code: Test
Total Cost: Sales and Financial
Total Customer Calls from Defects: Patches and Current Engineering
Total Defects in Field: Defects in the Field
Total Defects Introduced: Defects in the Field
Total Desired Resources: Patches and Current Engineering
Total Dev Resources: Patches and Current Engineering
Total Development: Development
Total Number of Releases: Scheduling
Total Tests for This Release: Scheduling
Total Tests to Cover: Test
Totl Code to Tst: Development
Totl Code Tstd: Development
Tst Effectiveness: Development
Tst Rate Feasible: Test
Tst Rate from Resources: Test
Tst Rjctn Frac: Development
Tstd Code Accptd: Development
Ttl Des Resource to CR: Patches and Current Engineering
Ttl Des Resource to CE: Patches and Current Engineering
Unknown Defects in Field: Defects in the Field
Unknown Defects Introduced: Defects in the Field
Unpatched Defects Sold with Patch: Defects in the Field
Weighted Error Density in Architecture and Code Base: Architecture and Code Base Quality