Three Essays on Modeling Dynamic Organizational Processes

by Hazhir Rahmandad

Bachelor of Science, Industrial Engineering, Sharif University of Technology (2000)

Submitted to the Alfred P. Sloan School of Management in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Management at the Massachusetts Institute of Technology, August 2005

© 2005 Massachusetts Institute of Technology. All Rights Reserved.

Signature of Author: Department of Operations Management and System Dynamics, August 2005
Certified by: John D. Sterman, Professor of Management, Thesis Supervisor
Certified by: Nelson P. Repenning, Professor of Management, Thesis Supervisor
Accepted by: Birger Wernerfelt, Chair, Ph.D. Committee, Sloan School of Management

Three Essays on Modeling Dynamic Organizational Processes

by Hazhir Rahmandad

Submitted to the Alfred P. Sloan School of Management in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Management

Abstract

Essay 1- Effects of Feedback Delay on Learning

Learning figures prominently in many theories of organizations. Understanding barriers to learning is therefore central to understanding firms' performance. This essay investigates the role of time delays between taking an action and observing the results in impeding learning. These delays, ubiquitous in real-world settings, can introduce important tradeoffs between long-term and short-term performance. In this essay, four learning algorithms, with different levels of complexity and rationality, are built and their performances in a simple resource allocation task are analyzed. The study focuses on understanding the effect of time delays on learning. Simulation analysis shows that regardless of the level of rationality of the organization, misperceived delays can impede learning significantly.

Essay 2- Heterogeneity and Network Structure in the Dynamics of Diffusion: Comparing Agent-Based and Differential Equation Models

When is it better to use agent-based (AB) models, and when should differential equation (DE) models be used? Where DE models assume homogeneity and perfect mixing within compartments, AB models can capture heterogeneity in agent attributes and in the network of interactions among them. The costs and benefits of such disaggregation should guide the choice of model type. AB models may enhance realism but entail computational and cognitive costs that may limit sensitivity analysis and model scope. Using contagious disease as an example, we contrast the dynamics of AB models with those of the analogous mean-field DE model. We examine agent heterogeneity and the impact of different network topologies, including fully connected, random, Watts-Strogatz small world, scale-free, and lattice networks. Surprisingly, in many conditions differences between the DE and AB dynamics are not statistically significant for key metrics relevant to public health, including diffusion speed, peak load on health services infrastructure, and total disease burden. We discuss implications for the choice between AB and DE models, level of aggregation, and model boundary.
The results apply beyond epidemiology: from innovation adoption to financial panics, many important social phenomena involve analogous processes of diffusion and social contagion.

Essay 3- Dynamics of Multiple-release Product Development

Product development (PD) is a crucial capability for firms in competitive markets. Building on case studies of software development at a large firm, this essay explores the interaction among the different stages of the PD process, the underlying architecture of the product, and the products in the field. We introduce the concept of the "adaptation trap," where intendedly functional adaptation of workload can overwhelm the PD organization and force it into firefighting (Repenning 2001) as a result of the delay in seeing the additional resource needs from the field and the underlying code-base. Moreover, the study highlights the importance of architecture and the underlying product-base in multiple-release product development, through their impact on the quality of new models under development, as well as through resource requirements for bug-fixing. Finally, this study corroborates the dynamics of tipping into firefighting that follow quality-productivity tradeoffs under pressure. Put together, these dynamics elucidate some of the reasons why PD capability is hard to build and why it easily erodes. Consequently, we offer hypotheses on the characteristics of the PD process that increase its strategic significance and discuss some practical challenges in the face of these dynamics.

Thesis Supervisors:
Michael A. Cusumano, Professor of Management
Nelson P. Repenning (Co-chair), Professor of Management
Jan W. Rivkin, Professor of Management
John D. Sterman (Co-chair), Professor of Management

Acknowledgments

When I entered my undergraduate program in Iran, I told my mom that this would be my last official educational degree. She calmly suggested that there was no need to make that decision at that time and that the future might unfold differently. This dissertation manifests the different turn of events she had anticipated, a turn of events which wouldn't have been possible without several individuals to whom I am most grateful. Ali Mashayekhi introduced me to the field of system dynamics, stimulated my interest in doing research, and helped me get into MIT. More importantly, his passion for working on problems that matter, and his deep caring for people, have set high standards that I will always aspire to.

In the last five years, I have been extremely lucky to work with some of the best people I could ever hope to learn from. John Sterman first took a chance by admitting me to the program, for which I am very much grateful, and after that he was an exemplary advisor: he set high standards, embodied these standards in his rigorous research, and helped me in every step of the learning process by his continuous feedback and his insightful ideas and remarks. Moreover, John's caring attitude was crucial in providing the safe and supportive environment that is needed for learning and doing research, especially for an international student away from home and family. Nelson Repenning has been the ultimate supervisor in teaching me to focus on what matters in the big picture of a problem. Nelson always gave the best pragmatic advice on the process of Ph.D. work, tried hard to sharpen my focus, and was a role model in setting the right priorities and following them.
Michael Cusumano and Jan Rivkin kindly took time out of their busy schedules and joined my committee in the later stages of the process. Their sharp advice, their close interaction, and their helpful attitudes have been instrumental in improving my work, and I am looking forward to future collaborations with them. The help and support of many people in my research sites were crucial for the success of my field studies for the third essay. Martin Devos, David Weiss, Randy Hackbarth, John Palframan, and Audris Mockus spent many valuable hours helping me with the research and provided me with lots of data, connections, support, and feedback. Moreover, I am very grateful to many individuals in the Sigma organization who gave their time for interviews, gave feedback on my models, provided data, and accepted me as a colleague among themselves.

It is easy to think of Ph.D. education as a solitary endeavor: you are assigned a window-less cubicle from which, after a few years, you are expected to emerge with a fine dissertation. Reflecting on my experience, I find this perception far from the truth. Besides the invaluable friendships, much of the learning, research ideas, and improvements in my work came from close interaction with fellow Ph.D. students in different programs. Laura Black, Paulo Gonçalves, and Brad Morrison, who preceded me in the SD program, made me feel at home, gave me much good advice, and helped me on several important occasions. Jeroen Struben and Gokhan Dogan nurtured the friendly and intimate atmosphere that softened the hard days and brought about many fruitful discussions and initiatives. Finally, João Cunha, Lourdes Sosa, Sinan Aral, and Kevin Boudreau helped me integrate my work within the larger community of organizational behavior and strategy.

Fun is often not the best word to describe the life of a Ph.D. student at MIT, yet luckily I can describe the last five years as fun and enjoyable, and I owe that largely to several great friends. Not only did I enjoy the many quality days and evenings we spent together, but I also very much enjoyed our collaboration in the Iranian Studies Group with Ali, Mehdi, Mohammad, Farzan, Sara, and Nima. I also have some of my best life-long memories from events, parties, conversations, intimate moments, trips, dances, and sport activities that I enjoyed in the company of Adel, Anya, Ali(s), Chahriar, Mahnaz, Maryam, Mehdi, Mohsen, Parisa, Roya, Saloomeh, Sergi, Yue, and many others to whom I can't do justice in this short note.

Mom and Dad get all of the credit for providing me with what I needed most at different stages of life, and they continued to excel in this task even though they were thousands of miles away. Their frequent phone calls, surprise gifts, and constant attention helped me feel connected and happy in the years that I had to spend away from my family. I am eternally thankful to them. The last months before the thesis defense are often considered the most stressful, yet I was lucky to be accompanied in this phase of my program by my adorable sweetheart, Sara. Her support, encouragement, and love made these months the best of all. Finally, I dedicate this dissertation to my grandparents, Babajoon and Mamanjoon, for their endless love and support, and for bringing up, and together, a wonderful family.

Table of Contents

Essay 1- Effects of Feedback Delay on Learning
  Introduction
  Modeling learning with delayed feedback
  Results and Analysis
  Discussion
  References
Essay 2- Heterogeneity and Network Structure in the Dynamics of Diffusion
  Introduction
  A spectrum of aggregation assumptions
  Model Structure
  Experimental design
  Results
  Discussion
  Conclusion
  References
Essay 3- Dynamics of Multiple-release Product Development
  Introduction
  Data Collection
  Data Analysis
  Model and Analysis
  Discussion
  Implications for Practice
  References
Appendix A- Detailed model formulations
Appendix B- The maximum PD capacity
Appendix C- Boundary conditions of the model
Appendix 1-1- Technical material to accompany essay 1
  Mathematics of the models
  A Q-learning model for resource allocation learning task
Appendix 2-1- Technical material to accompany essay 2
  Formulation for the Agent-Based SEIR Model
  Deriving the mean-field differential equation model
  Network Structure
  The Fitted DE Model
  Sensitivity analysis on population size
Appendix 3-1- Reflections on the data collection and modeling process
Appendix 3-2- Feedback loops for multiple-release product development
Appendix 3-3- Documentation for detailed multiple-release PD model

Essay 1- Effects of Feedback Delay on Learning

Abstract

Learning figures prominently in many theories of organizations. Understanding barriers to learning is therefore central to understanding firms' performance. This essay investigates the role of time delays between taking an action and observing the results in impeding learning. These delays, ubiquitous in real-world settings, can introduce important tradeoffs between the long-term and the short-term performance. In this essay, four learning algorithms, with different levels of complexity and rationality, are built and their performances in a simple resource allocation task are analyzed. The study focuses on understanding the effect of time delays on learning. Simulation analysis shows that regardless of the level of rationality of the organization, misperceived delays can impede learning significantly.

Introduction

Learning figures prominently in many perspectives on organizations (Cyert and March 1963). Organizational routines and capabilities fundamental to firm performance are learnt through an adaptive process of local search and exploration (Nelson and Winter 1982). Researchers have highlighted learning as an important source of competitive advantage (DeGeus 1988; Senge 1990), central to the firm's performance in high-risk industries (Carroll, Rudolph et al. 2002), and underlying many efficiency gains by firms (Argote and Epple 1990). Therefore understanding the functional and dysfunctional effects of learning processes is vital to understanding firms' performance.

Management scholars have studied both positive and negative aspects of learning. On the one hand, this literature portrays learning as an adaptive process (Cyert and March 1963), which brings useful knowledge (Huber 1991), increases efficiency (Argote and Epple 1990), and is central to the success of organizations in the fast-changing world (DeGeus 1988; Senge 1990). At the individual level, it is argued that learning helps individuals find the best strategies, an argument used to justify the individual rationality assumption in economics (Friedman 1953). On the other hand, if learning mechanisms were always functional and could find the optimum strategies fast enough[1], all organizations in similar market niches would acquire similar strategies and capabilities. Therefore, to understand heterogeneity in a population of firms, we need to understand the failure and path-dependence of different learning processes, along with the population-level dynamics of selection and reproduction.

[1] Here the speed of learning should be compared to the speed of environmental change and processes of selection and reproduction.

Organizational learning scholars have studied several factors that hinder learning or result in different outcomes for learning, including payoff noise, complexity, defensive routines, and delays.
Noise and ambiguity in the payoff that an organization receives increase the chance of superstitious learning (Argyris and Schön 1978; Levitt and March 1988), where the organization draws wrong conclusions from its experience. Moreover, stochasticity in results creates a bias against more uncertain strategies, for which an unlucky first experience prohibits future experimentation (Denrell and March 2001). Another challenge to functional learning lies in the interaction between competency evolution and the search process. One can conceptualize learning as a process of (virtual or actual) experimentation by an agent (or by other agents in the case of vicarious learning), observing the resulting payoff, and adjusting (mental models and) actions to enhance performance. In this framework, each alternative's payoff not only depends on the inherent promise of that alternative, but also on the agent's experience with that alternative. As a result, initially-explored alternatives pay off better, regardless of their inherent promise, and therefore they close the door of experimentation to other, potentially better, alternatives (Levinthal and March 1981; Herriott, Levinthal et al. 1985). Such competency traps (Levitt and March 1988) also inhibit the organization from exploring better strategies once the current routines of the firm become obsolete in the face of a changing environment, thereby turning core capabilities into core rigidities (Leonard-Barton 1992).

Complexity of the payoff landscape on which organizations adapt raises another challenge to organizational learning. The complementarities among different elements of a firm's strategy (Milgrom and Roberts 1990) result in a complex payoff landscape in which improvements in performance often come from changing multiple aspects of the strategy together. In other words, after some local adaptation to find an internally consistent strategy, incremental changes in one aspect of the strategy are often not conducive to performance gains, so that the firm rests on a local peak of a rugged payoff landscape (Busemeyer, Swenson et al. 1986; Levinthal 1997). Even though discontinuous change can take the organization out of such local peaks (Tushman and Romanelli 1985; Lant and Mezias 1990; March 1991; Lant and Mezias 1992), this spatial complexity of the payoff landscape can indeed contribute to the observed firm heterogeneity (Levinthal 2000; Rivkin 2000). Further studies in this tradition have shed light on the challenges of imitation and replication in the presence of spatial complexity (Rivkin 2001), the adaptation processes empirically used by organizations under these conditions (Siggelkow 2002), and the role of cognitive search in adaptation on rugged landscapes (Gavetti and Levinthal 2000).

Individuals play a central role in organizations; therefore cognitive biases and challenges for individuals' learning have important ramifications for organizational learning. Studies have shown how dynamic complexity and delays can significantly hamper decision-making (Sterman 1989; Paich and Sterman 1993; Diehl and Sterman 1995) and result in self-confirming attribution errors in organizations (Repenning and Sterman 2002).
Moreover, the activation of individual defensive routines closes the communication channels in the organization, creates bias in intraorganizational information flows, seals off flawed theories-in-practice from external feedback, and promotes group-think (Argyris and Schön 1978; Janis 1982; Argyris 1999).

Temporal challenges to learning, such as delays between action and payoff, have received little attention in organizational learning research. In contrast, as early as the 1920s the importance of temporal contiguity of stimulus and response was recognized by psychologists studying conditioning (Warren 1921). However, despite evidence of adverse effects of delays on individual learning (Sengupta, Abdel-Hamid et al. 1999; Gibson 2000) and some case-specific evidence at the organizational level (Repenning and Sterman 2002), with a few exceptions (Denrell, Fang et al. 2004), the effects of delays on organizational learning have not been studied. In fact, few formal models of organizational learning capture such delays explicitly. Denrell and colleagues (2004) build on the literature of learning models in artificial intelligence to specify a Q-learning (Watkins 1989) model of a learning task in a multidimensional space with a single state with non-zero payoff. Consequently the learner needs to learn to value different states based on their proximity to the goal state. Their analysis informs the situations in which several actions with no payoff need to be taken before a payoff is observed. This setting is conceptually similar to having a delay between an action and the payoff, where the intermediate states that the system can occupy are distinguishable by the decision-maker and lead to no payoff. The study therefore highlights the importance of information about such intermediate states and elaborates on the central problem of credit assignment: how to build associations between actions and payoff in the presence of delays. This study, however, does not discuss the existence of payoff values in the intermediate states, does not consider the tradeoffs between short-term and long-term actions central to real-world delay effects, and uses an information-intensive learning algorithm that requires thousands of action-payoff cycles to discover a short path to a single optimum state.

We expand these results by examining learning in a resource allocation setting that resembles the real-world challenges of an organization. We study how much performance and learning weaken as a result of delays, how sensitive the results are to the rationality of the decision-maker, and what the mechanisms are through which delays impede learning. A formal model enables us to pin down the effects of time delays on learning by removing other confounding factors that potentially influence learning. Specifically, we will remove the effects of feedback noise, multiple peaks and competency traps, discount rate, interpersonal dynamics, and confounding contextual cues on learning. Running controlled experiments on the simulation model, we can also gain insight into the mechanisms through which delays impede learning. Moreover, using a formal model, we are able to run a large number of controlled experiments and investigate them closely at a relatively cheap cost, which improves the internal validity of the study compared to using real experiments. Nevertheless, using simulations raises the question of the external validity of the model.
Specifically, it is not clear how closely our algorithms of learning correspond to human learning patterns or organizational learning practices, and therefore how well the results can be generalized to real social systems. To meet this challenge we draw on the current literature on learning in organizational behavior, psychology, game theory, and attribution theory to model learning, and investigate four different learning procedures with different levels of rationality, information processing requirements, and cognitive search ability. By including these different algorithms, we can distinguish the behavioral patterns that are common across different algorithms and therefore derive more valid conclusions about the processes that extend to real social systems. Moreover, using these different algorithms, we can differentiate the effects of decision-making rationality from learning problems arising from delayed feedback.

Our results confirm the hypothesis that delays complicate the attribution of causality between action and payoff and therefore can hinder learning (Einhorn and Hogarth 1985; Levinthal and March 1993). Furthermore, our analysis shows that performance can still be significantly sub-optimal in very easy learning tasks, specifically, even when the payoff landscape is unchanging, smooth, and has a unique optimum. Moreover, the organization can learn to believe that the sub-optimal performance is really the best it can do. Finally, the results are robust to different learning algorithms, and the difficulty of learning in the presence of delays appears to be rooted in the misperception of the delays between actions and outcomes, rather than the specific learning procedures we examined. These results suggest interesting research opportunities, including extending these algorithms; exploring the interaction of delays with feedback noise, perception lags, and ruggedness of the landscape; investigating the possibility of learning about the structure of time delays; and explaining empirical cases of learning failure. Moreover, the results have practical implications for the design of accounting and incentive systems as well as the evaluation of firm performance.

In the next section, we describe the model context and structure and discuss the different learning procedures in detail. The "Results and Analysis" section presents a base case demonstrating that all four learning procedures can discover the optimum allocation in the absence of action-payoff delays. Next we analyze the performance of the four learning procedures in the presence of action-payoff delays, followed by tests to examine the robustness of these results under different parameter settings. We close with a discussion of the implications, limitations, and possible extensions.

Modeling learning with delayed feedback

Resource allocation problems provide an appropriate context for studying the effect of time delays on learning. First, many important situations at multiple levels of analysis involve allocating resources among different types of activities with different delays between allocation and results. Examples include an individual allocating her time between work and education, a factory allocating resources between production, maintenance, and process improvement, and a government allocating budget between subsidies, building roads, and building schools. Moreover, these situations often involve tradeoffs between short-term and long-term results and therefore raise important questions about the short-term vs. long-term success of different social entities.
For example, the factory can boost output in the short run by cutting maintenance, but in the long run, output falls as breakdowns increase. Other examples include learning and process improvement (Repenning and Sterman 2002), investment in proactive maintenance (Allen 1993), and environmental degradation by human activity (Meadows and Club of Rome 1972). These tradeoffs suggest that individuals, organizations, and societies can often fail to learn from experience to improve their performance in these allocation and decision-making tasks.

Previous studies motivated the formulation of our model. Repenning and Sterman (2002) found that managers learned the wrong lessons from their interactions with the workforce as a result of different time delays in how various types of worker activity influence the system's performance. Specifically, managers seeking to meet production targets had two basic options: (1) increase process productivity and yield through better maintenance and investment in improvement activity; or (2) pressure the workforce to "work harder" through overtime, speeding production, taking fewer breaks, and, most importantly, by cutting back on the time devoted to maintenance and process improvement. Though the study found, consistent with the extensive quality improvement literature, that "working smarter" provides a greater payoff than working harder, many organizations find themselves stuck in a trap of working harder, resulting in reduced maintenance and improvement activity, lower productivity, greater pressure to hit targets, and thus even less time for improvement and maintenance (Repenning and Sterman 2002).

In parallel to this setting, our model represents an organization engaged in a continuous-time resource allocation task. The organization must allocate a fixed resource among different activities. The payoff generated by these decisions can depend on the lagged allocation of resources, and there may be different delays between the allocation of resources to each activity and its impact on the payoff. The organization receives outcome feedback about the payoff and past resource allocations, and attempts to learn from these actions how to adjust resource allocations to improve performance. As a concrete example, consider a manufacturing firm allocating the organizational resources (e.g., employee time) among three activities: production, maintenance, and process improvement. These activities influence production, but with different delays. Time spent on production yields results almost immediately. There is a longer delay between a change in maintenance activity and machine uptime (and hence production). Finally, it takes even longer for process improvement activity to affect output. The organization gains experience by seeing the results of past decisions and seeks to increase production based on this experience. It has some understanding of the complicated mechanisms involved in controlling production, captured in the mental models of individual members and in organizational routines, and it may be aware of the existence of different delays between each activity and observed production. Consequently, when evaluating the effectiveness of its past decisions, the organization takes these delays into consideration (e.g., it does not expect last week's process improvement effort to enhance production today).
However, the organizational mental model of the production process may be imperfect, and there may be discrepancies between the length of the delays it perceives and the real delays.

1- Allocation and payoff

The organization continuously allocates a fraction of total resources to activity j of m possible activities at time t, F_j(t)[2], where:

\sum_{j} F_j(t) = 1 \quad \text{for } j = 1, \ldots, m    (1)

[2] The model is formulated in continuous time but simulated by Euler integration with a time step of 0.125 period. Sensitivity analysis shows little sensitivity of the results to time steps < 0.2.

In our simulations we assume m = 3 activities, so the organization has two degrees of freedom. Three activities keep the analysis simple while still permitting different combinations of delay and impact for each. Total resources, R(t), are assumed to be constant, so R(t) = R, and the resources allocated to activity j at time t, A_j(t), are:

A_j(t) = F_j(t) \cdot R    (2)

These allocations influence the payoff with a delay. The payoff at time t is determined by the Effective Allocation, \bar{A}_j(t), which can lag behind A_j with a payoff generation delay of T_j. We assume a fixed, non-distributed delay for simplicity[3]:

\bar{A}_j(t) = A_j(t - T_j)    (3)

[3] Different types of delay, including first- and third-order Erlang delays, were examined; the results were qualitatively the same.

The payoff generation delays are fixed but can be different for each activity. In our manufacturing firm example, R is the total number of employees, F_j(t) is the fraction of people working on activity j (j: production, maintenance, process improvement), and A_j(t) is the number of people working on activity j. In our example, the delays in the impact of these activities on production (the payoff) are presumably ordered approximately as T_production < T_maintenance < T_improvement. For simplicity we assume the payoff, P(t), to be a constant-returns-to-scale, Cobb-Douglas function of the effective allocations, which yields a smooth landscape with a single peak, allowing us to eliminate learning problems that arise in more complicated landscapes:

P(t) = \prod_{j} \bar{A}_j(t)^{a_j}, \quad \sum_{j} a_j = 1    (4)

2- Perception delays

The organization perceives its own actions and payoff and uses this information to learn about the efficiency of different allocation profiles and to develop better allocations. We assume the organization accounts for the delays between past allocations and payoffs, but recognize that its estimate of the length of these delays may not be correct. In real systems the perceived payoff and the actual payoff, P(t), are different, but to give the learning algorithms the most favorable circumstances, we assume that measurement and perception are fast and unbiased. The organization accounts for the delays between allocations and payoff based on its beliefs about the length of the payoff generation delays, \hat{T}_j. It attributes the current observed payoff, P(t), to the allocations made \hat{T}_j periods ago, so the action associated with the current payoff, \hat{A}_j(t), is:

\hat{A}_j(t) = A_j(t - \hat{T}_j)    (5)

The values of \hat{A}_j(t) and P(t) are used as inputs to the various learning and decision-making procedures representing the organization's behavior.
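To make the payoff structure concrete, the following Python sketch (not part of the original thesis) computes the Cobb-Douglas payoff of Equation 4 from lagged allocation fractions, using the three-activity exponents and total resources reported later in the Results section; the discrete-time indexing and the function and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters matching the Results section: exponents a_j = (1/2, 1/3, 1/6),
# total resources R = 100.
A_EXP = np.array([1/2, 1/3, 1/6])
R_TOTAL = 100.0

def payoff(alloc_history, t, delays):
    """Cobb-Douglas payoff P(t) = prod_j Abar_j(t)^a_j, where the effective
    allocation Abar_j(t) = A_j(t - T_j) lags the decision by T_j steps."""
    effective = np.array([
        alloc_history[max(t - d, 0), j] * R_TOTAL   # A_j = F_j * R, lagged by T_j
        for j, d in enumerate(delays)
    ])
    return float(np.prod(effective ** A_EXP))

# Usage: a constant allocation at the optimum fractions yields the optimal payoff (~36.37).
T = 300
fractions = np.tile([1/2, 1/3, 1/6], (T, 1))       # F_j(t); each row sums to 1
print(payoff(fractions, t=50, delays=[8, 0, 0]))   # delay of 8 steps for activity 1
```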
3- Learning and decision-making procedures

Having perceived its own actions and payoff streams, the organization learns from its experience; that is, it updates its beliefs about the efficiency of different allocations and selects what it believes is a better set of allocations. We developed four different learning models to explore the sensitivity of the results to different assumptions about how organizations learn. The inputs to all of these algorithms are the perceived payoff and the action associated with that payoff, P(t) and \hat{A}_j(t); the outputs of the algorithms are the allocation decisions F_j(t). The learning algorithms differ in their level of rationality, information processing requirements, and assumed prior knowledge about the shape of the payoff landscape. Here, rationality indicates the organization's ability to make the best use of the information available by trying explicitly to optimize its allocation decisions. Information processing capability indicates the organization's capacity for keeping track of and using information about past allocations and payoffs. Prior knowledge about the shape of the payoff landscape determines the ability to use off-line cognitive search (Gavetti and Levinthal 2000) to find better policies.

We use four learning algorithms[4], denoted Reinforcement, Myopic Search, Correlation, and Regression. In all the learning algorithms, the decision-maker has a mental representation of how important each activity is. We call these variables "Activity Values," V_j(t). The activity values are used to determine the allocation of resources to each activity (Equations 12-14 explain how the allocation of resources is determined based on the Activity Values). The main difference across the different learning algorithms is how these activity values are updated. Below we discuss the four learning algorithms in more detail. The complete formulation of all the learning algorithms can be found in the technical appendix (1-1).

[4] We tried other algorithms, including the Q-learning model used by Denrell et al. (2004), and decided to focus on the four reported here because they are more efficient. The discussion and some results for the Q-learning model are reported in the technical appendix (1-1). In summary, to converge to the optimal policy, that model requires far more data than the reported algorithms.

1- Reinforcement learning: In this method, the value (or attractiveness) of each activity is determined by (a function of) the cumulative payoff achieved so far by using that alternative. Attractiveness then influences the probability of choosing each alternative in the future. Reinforcement learning has a rich tradition in psychology, game theory, and machine learning (Erev and Roth 1998; Sutton and Barto 1998). It has been used in a variety of applications, from training animals to explaining the results of learning in games and designing machines to play backgammon (Sutton and Barto 1998)[5]. In our model, each perceived payoff, P(t), is associated with the allocations believed to be responsible for that payoff, \hat{A}_j(t). We increase the value of each activity, V_j(t), based on its contribution to the perceived payoff. The increase in value depends on the perceived payoff itself, so a small payoff indicates a small increase in the values of different activities, while a large payoff increases the value much more. Therefore large payoffs shift the relative weight of different activity values towards the allocations responsible for those better payoffs.

[5] The literature on reinforcement learning has different goals and algorithms in different fields. In the context of machine learning, reinforcement learning is developed as an optimization algorithm. The game theory literature on reinforcement learning focuses on providing simple algorithms that give good descriptive accounts of individual learning in simple games. Our use of reinforcement learning here is more aligned with the game-theoretic reinforcement learning models. For a survey of the machine learning literature on reinforcement learning see Sutton and Barto's book (1998) and Kaelbling et al. (1996). Also see the technical appendix (1-1) for results on a Q-learning model motivated by the machine learning literature.
\frac{dV_j(t)}{dt} = P(t)^{\text{Reinforcement Power}}\,\hat{A}_j(t) - V_j(t)\,/\,\text{Reinforcement Forgetting Time}    (6)

Our implementation of reinforcement learning includes a forgetting process, representing the organization's discounting of older information. Discounting old information helps the algorithm adjust the activity values better. Equation 6 shows the main formulation of the Reinforcement algorithm. Both "Reinforcement Power" and "Reinforcement Forgetting Time" are parameters specific to this algorithm. "Reinforcement Power" indicates how strongly the payoff is fed back as reinforcement to adjust the Activity Values, and therefore determines the speed of convergence to better policies. "Reinforcement Forgetting Time" is the time constant for depreciating old payoff reinforcements. Details of the model formulations are included in the technical appendix (1-1), Table 1. The Reinforcement method is a low information, low rationality procedure: it continues to do what has worked well in the past, adjusting only slowly to new information, and does not attempt to extrapolate from these beliefs about activity values to the shape of the payoff landscape or to use gradient information to move towards higher-payoff allocations.
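A minimal Python sketch of one Euler step of the reinforcement update in Equation 6; the time step follows footnote 2, while the parameter values and variable names are illustrative rather than the thesis calibration.

```python
import numpy as np

def reinforcement_step(V, perceived_payoff, attributed_alloc, dt=0.125,
                       reinforcement_power=1.0, forgetting_time=20.0):
    """One Euler step of Equation 6:
    dV_j/dt = P(t)^power * Ahat_j(t) - V_j(t) / forgetting_time."""
    dV = perceived_payoff ** reinforcement_power * attributed_alloc - V / forgetting_time
    return V + dt * dV

# Usage: larger payoffs shift the relative activity values toward the allocations
# believed responsible for them; allocation fractions then follow V_j / sum(V).
V = np.array([10.0, 10.0, 10.0])
V = reinforcement_step(V, perceived_payoff=30.0,
                       attributed_alloc=np.array([50.0, 30.0, 20.0]))
print(V / V.sum())   # updated allocation tendencies
```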
It compares the previous payoff with the result of a local search and does not attempt to compare multiple experiments; it also does not use information about the payoffs in the neighborhood to make any inferences about the shape of the payoff landscape. 3- Correlation: This method uses principles from attribution theories to model learning. In our setting, learning can be viewed as how organizational decision-makers attribute different allocations to different payoffs and how these attributions are adjusted as new information about payoffs and allocations are continuously perceived. Several researchers have proposed different models for explaining how people make attributions. Lipe (1991) reviews these models and concludes that all major attribution theories are based on the use of counterfactual information. However it is difficult to obtain counterfactual information (information about contingencies that were not realized) so she proposes the use of covariation data as a good proxy. The correlation algorithm is based on the hypothesis that people use the covariation of different activities with the payoff to make inferences about action-payoff causality. In our model, the correlations between the perceived payoff and the actions associated with those payoffs let the organization decide whether performance would improve or deteriorate if the activity increases. A positive (negative) correlation between recent values of Aj (t) and P(t) suggests that more (less) of activity j will improve the payoff. Based on these inferences the organization adjusts the activity values, V (t) , so that positively correlated activities increase above their current level and negatively correlated activities decrease below the current level. d V (t) = Vj (t).f (Action PayoffCorrelationj(t))2 dt f()=l, f'(x)>O (8) The formulation details for the correlation algorithm are found in technical appendix (1-1), Table 3. At the optimal allocation, the gradient of the payoff with allocation will be zero (the top of the 17 payoff hill is flat) and so will the correlation between activities and payoff. Therefore the change in activity values will become zero and the organization settles on the optimum policy. The correlation method is a moderate information, moderate rationality approach: more data are needed than required by the myopic or reinforcement methods to estimate the correlations among activities and payoff. Moreover, these correlations are used to make inferences about the local gradient so the organization can move uphill from the current allocations to allocations believed to yield higher payoffs, even if these allocations have not yet been tried. 4- Regression: This method is a relatively sophisticated learning algorithm with significant information processing requirements. We assume that the organization knows the correct shape of the payoff landscape and uses a correctly specified regression model to estimate the parameters of the payoff function. By observing the payoffs and the activities corresponding to those payoffs, the organization receives the information needed to estimate the parameters of the payoff function. To do so, after every few periods, it runs a regression using all the data from the beginning of the learning task6 . From these estimates the optimal allocations are readily calculated. Raghu and colleagues (Raghu, Sen et al. 2003) use a similar learning algorithm to model how simulated fund managers learn about effectiveness of different funding strategies. 
4- Regression: This method is a relatively sophisticated learning algorithm with significant information processing requirements. We assume that the organization knows the correct shape of the payoff landscape and uses a correctly specified regression model to estimate the parameters of the payoff function. By observing the payoffs and the activities corresponding to those payoffs, the organization receives the information needed to estimate the parameters of the payoff function. To do so, after every few periods, it runs a regression using all the data from the beginning of the learning task[6]. From these estimates the optimal allocations are readily calculated. Raghu and colleagues (Raghu, Sen et al. 2003) use a similar learning algorithm to model how simulated fund managers learn about the effectiveness of different funding strategies. For the assumed constant-returns-to-scale Cobb-Douglas function, the regression is:

\log P(t) = \log a_0 + \sum_{j} a_j \log \hat{A}_j(t) + \epsilon(t)    (9)

The estimates of the exponents, a_j^*, are evaluated every "Evaluation Period," E. Based on these estimates, the optimal activity values, V_j^*(t), are given by[7]:

V_j^*(t) = \text{Max}\left(a_j^* / \sum_{i} a_i^*,\; 0\right)    (10)

The organization then adjusts the action values towards the optimal values (see Equation 7). See Table 4 in the technical appendix (1-1) for equation details.

[6] Discounting older information does not make any positive difference here, since the payoff function does not change during the simulation.

[7] In the case of negative a_j^*, the resulting V_j^*(t)'s are adjusted to a close point on the feasible action space. This adjustment keeps the desired action values (the V_j^*(t)'s) feasible (positive).

The regression algorithm helps test how delays affect learning over a wide range of rationality assumptions. The three other algorithms represent an organization ignorant about the shape of the payoff function, while in some cases organizations have at least a partial understanding of the structure and functional forms of the causal relationships relating the payoff to the activities. Although in feedback-rich settings mental models are far from perfect and the calculation of optimal decisions based on an understanding of the mechanisms is often cognitively infeasible (Sterman 1989; Sterman 1989), the regression algorithm offers a case of high rationality to test the robustness of our results. Table I summarizes the characteristics of the different algorithms on the three dimensions of rationality, information processing capacity, and prior knowledge of the payoff landscape.

Table I - Sophistication and rationality of different learning algorithms

Algorithm | Rationality | Information Processing Capacity | Prior Knowledge of Payoff Landscape
Reinforcement | Low | Low | Low
Myopic Search | Low | Medium | Low
Correlation | Medium | Medium | Low
Regression | High | High | High
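The estimation step of the regression method can be sketched as follows: fit the log-linear form of Equation 9 by ordinary least squares and derive target allocation fractions from the estimated exponents (Equation 10). The synthetic data, the use of numpy's least-squares routine, and the simple truncation of negative estimates (approximating footnote 7) are illustrative choices, not the thesis implementation.

```python
import numpy as np

def estimate_optimal_fractions(payoffs, attributed_allocs):
    """Fit log P = log a0 + sum_j a_j log Ahat_j + e (Equation 9) and return
    V*_j = max(a_j*, 0) / sum_i max(a_i*, 0) (Equation 10), truncating negative
    exponent estimates at zero as a stand-in for the footnote 7 adjustment."""
    X = np.column_stack([np.ones(len(payoffs)), np.log(attributed_allocs)])
    coeffs, *_ = np.linalg.lstsq(X, np.log(payoffs), rcond=None)
    a_star = np.maximum(coeffs[1:], 0.0)
    return a_star / a_star.sum()

# Usage: noisy observations generated from the true exponents (1/2, 1/3, 1/6).
rng = np.random.default_rng(0)
allocs = rng.uniform(10, 60, size=(200, 3))
true_a = np.array([1/2, 1/3, 1/6])
payoffs = np.prod(allocs ** true_a, axis=1) * np.exp(rng.normal(0, 0.02, 200))
print(estimate_optimal_fractions(payoffs, allocs))   # approx. [0.50, 0.33, 0.17]
```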
The tradeoff between exploration and exploitation is a crucial issue in learning (March 1991; Sutton and Barto 1998). On the one hand, the organization should explore the decision space by trying some new allocation policies if it wants to learn about the shape of the payoff landscape. On the other hand, pure exploration leads to random allocation decisions with no improvement. So exploration is needed to learn about the payoff landscape, and exploitation is required if any improvement is to be achieved. We use random changes in resource allocation to capture the exploration/exploitation issue in our learning models. The "Activity Values" represent the accumulation of experience and learning by the organization. In pure exploitation, the set of activity values, V_j(t), determines the allocation decision. Specifically, the fraction of resources to be allocated to activity j would be:

F_j(t) = V_j(t) / \sum_{i} V_i(t)    (11)

The tendency of the organization to follow this policy shows its tendency to exploit the experience it has gained so far. Deviations from this policy represent experiments to explore the payoff landscape. We multiply the activity values by a random disturbance to generate the Operational Activity Values, \tilde{V}_j(t), which are the basis for the allocation decisions:

\tilde{V}_j(t) = V_j(t) + \tilde{N}_j(t)    (12)

\tilde{N}_j(t) = N_j(t) \cdot V_j(t)    (13)

F_j(t) = \tilde{V}_j(t) / \sum_{i} \tilde{V}_i(t)    (14)

N_j(t) is a first-order autocorrelated noise term specific to each activity. It generates a stream of random numbers with autocorrelation. In real settings, the organization experiments with a policy for some time before moving to another. Such persistence is both physically required (organizations cannot instantly change resource allocations) and provides organizations with a large enough sample of data about each allocation to decide whether a policy is beneficial or not. The "Activity Noise Correlation Time" captures the degree of autocorrelation in the stream N_j(t). High values represent organizations that only slowly change the regions of the allocation space they are exploring; a low value represents organizations that explore quickly from one allocation to another in the neighborhood of their current policy[8]. We use the Adjusted Pink Noise, \tilde{N}_j(t), in evaluating the Operational Activity Values, \tilde{V}_j(t), because different algorithms have different magnitudes of Activity Value and comparable exploration terms need to be scaled. The standard deviation of the noise term, which determines how far from the current allocations the organization explores, depends on where the organization finds itself on the payoff landscape. If its recent explorations have shown no improvement in the payoff, it concludes that it is near the peak of the payoff landscape and therefore extensive exploration is not required (alternatively, it concludes that the return to exploration is low and reduces experimentation accordingly). If, however, recent exploration has resulted in finding significantly better regions, it concludes that it is in a low-payoff region and there is still room for improvement, so it should keep on exploring, and Var(N_j(t)) remains large:

\text{Var}\big(N_j(t)\big) = g\big(\text{Recent Payoff Improvement}(t)\big), \quad g(0) = \text{Minimum Exploration Variance},\; g'(x) > 0    (15)

[8] We set the mean of N_j(t) to 0, so the decision-maker has no bias in searching different regions of the landscape. We also truncate the values of the pink noise so that N_j(t) > -1, to ensure that \tilde{V}_j(t) > 0 (Equation 12).

In this formulation, if the current payoff is higher than the recent payoff, Recent Payoff Improvement increases; if not, it decays towards 0. Details of the exploration equations can be found in Table 5 of the technical appendix (1-1).
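A Python sketch of the exploration mechanism in Equations 12-14: a first-order autocorrelated noise stream, truncated below -1 as in footnote 8, is scaled by the activity values and added to them before normalizing into allocation fractions. The adaptive variance of Equation 15 is omitted for brevity, and the smoothing scheme and parameter values are illustrative assumptions.

```python
import numpy as np

def explore_allocation(V, noise_state, rng, dt=0.125, corr_time=4.0, std=0.2):
    """One step of exploratory allocation:
    N_j(t): first-order autocorrelated noise (Activity Noise Correlation Time),
    Ntilde_j = N_j * V_j (Eq. 13),  Vtilde_j = V_j + Ntilde_j (Eq. 12),
    F_j = Vtilde_j / sum_i Vtilde_i (Eq. 14)."""
    white = rng.normal(0.0, std, size=V.shape)
    noise_state = noise_state + dt * (white - noise_state) / corr_time  # autocorrelated noise
    noise_state = np.maximum(noise_state, -0.99)   # truncation keeps Vtilde_j > 0
    V_tilde = V + noise_state * V
    return V_tilde / V_tilde.sum(), noise_state

# Usage: exploration perturbs the pure-exploitation fractions V_j / sum(V).
rng = np.random.default_rng(1)
V, noise = np.array([0.5, 0.33, 0.17]), np.zeros(3)
for _ in range(5):
    F, noise = explore_allocation(V, noise, rng)
    print(np.round(F, 3))
```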
Results and Analysis

In this section we investigate the behavior of the learning model under different conditions. We first explore the basic behavior of the model and its learning capabilities when there are no delays. The no-delay case helps compare the capabilities of the different learning algorithms and provides a base against which we can compare the behavior of the model under other conditions. Next we introduce delays between activities and payoff and analyze the ability of the different learning algorithms to find the optimal payoff, for a wide range of parameters. In running the model, the total resource, R, is 100 units and the payoff function exponents are set to 1/2, 1/3, and 1/6 for activities one, two, and three, respectively. The optimal allocation fractions are therefore 1/2, 1/3, and 1/6, and the optimal payoff is 36.37. Starting from a random allocation, the organization tries to improve its performance over the course of 300 periods[9]. We report two aspects of learning performance: (1) how close to optimal the simulated organization gets, and (2) how fast it converges to that level of performance, if it converges at all. The percentage of optimal payoff achieved by the organization at the end of the simulation is reported as the Achieved Payoff Percentage.

[9] The simulation horizon is long enough to give the organization the opportunity to learn, while keeping the simulation time reasonable. In the manufacturing firm example we have chosen the time constants so that a period represents one month. The 300 period horizon therefore represents about 25 years, ample time to examine how much the organization learns.

Monte-Carlo simulations with different random noise seeds for the exploration term (Equation 13) and for the randomly chosen initial resource allocation give statistically reliable results for each scenario analyzed. To facilitate comparison of the four learning algorithms, we use the same noise seeds across the four models; therefore any differences in a given run are due only to the differences among the four learning procedures and not the random numbers driving them.

1- Base case

The base case represents an easy learning task where there are no delays between actions and the payoff. The organization is also aware of this fact and therefore she has a perfect understanding of the correct delay structure. In this setting, we expect all the learning algorithms to find the optimum solution. Figure 1 shows the trajectory of the payoff for each learning algorithm, averaged over 100 simulation runs. The vertical axis shows the percentage of the optimal payoff achieved[10] by each learning algorithm.

[Figure 1 here: "Performance of Different Algorithms" — percentage of optimal payoff (vertical axis, roughly 60-100) plotted against Time (Period) from 0 to 300, with one line per learning algorithm.]

Figure 1- Percentage of payoff relative to optimal in the base case, averaged over 100 runs.

When there are no delays, all four learning algorithms converge to the optimal resource allocation. For comparison, the average payoff achieved by a random resource allocation strategy is 68% of optimal (since the 100 simulations for each learning algorithm start from randomly chosen allocations, all four algorithms begin at an average payoff of 68%). An important aspect of learning is how fast the organization converges to the allocation policy it perceives to be optimal. In some settings and with some streams of random numbers it is also possible that the organization does not converge within the 300 period simulation horizon.

[10] Note that by optimum we mean the best payoff one can achieve over the long term by pursuing any policy, which need not be the highest possible payoff. For example, with a one period delay for activity 1 and no delay for the two others, the organization can allocate all 100 units to activity 1 during the current period and allocate all the resources between the two other activities during the next period. Under these conditions, it can achieve a higher than optimum payoff for the next period, at the expense of getting no payoff this period (because activities 2 and 3 receive no resources) and the period after the next (because activity 1 receives no resources in that period). A constant returns to scale payoff function prevents such policies from yielding higher payoffs than the constant allocations in the long term.
Majority of simulations converge prior to the 300 period horizon and the average convergence times range from a low of 38 periods for the regression algorithm to a high of 87 periods for the myopic model. Table 2- Achieved Payoff, Convergence Time and Percentage Converged for the Base case. Statistics from 400 simulations. Learning Algorithm Regression Estimates gp Correlation Myopic a I a It a . a Reinforcement Variable Achieved Payoff percentage at period300 99.96 0.04 99.91 0.07 99.43 1.57 95.18 5.02 Convergence Time 37.97 3.84 75.39 32.68 87.26 60.34 40.28 29.25 Percentage Converged' 1 99.75 99.75 98.75 90 2-The Impact of delays Most models of organizational learning share an assumption that actions instantly payoff and the results are perceived by the organization with no delay. Under this conventional assumption, all our learning algorithms reliably find the optimum allocation and converge to it. In this section we investigate the results of relaxing this assumption. A basic variation is to introduce a delay in the impact of one of the activities, leaving the delay for the other two other activities at zero. We simulate the model for 9 different values of the "Payoff Generation Delay" for activity one, TI, ranging from 0 to 16 periods, while we keep the delays for other activities at 0. In our factory management example the long delay in the impact of activity I is analogous to process improvement activities that take a long time to bear fruit. Because activity one is the most influential in determining the payoff, these settings are l We consider a particular learning procedure to have converged when the variance of the payoff falls below 1% of its historical average. If later the variance increases again, to 10% of its average at the time of convergence, we reset the convergence time and keep looking for the next instance of convergence. 12 Percentage of simulations converging to some policy at the end of 300 period simulations. These numbers are reported based on 400 simulations. 23 expected to highlight the effect of delays more clearly. Delays as high as 16 period are realistic in magnitude, when weighed against the internal dynamics of the model. For example, in the base case, it takes between 38 (for regression) to 87 (for myopic) periods for different learning models to converge when there is no delays, suggesting that dynamics of exploration and adjustment delays unfold in longer time horizons than different delays tested (See technical appendix (1-1 ), Table 6, for an alternative measure of speed of internal dynamics). We analyze two setting for the perceived payoff generation delay. First, we examine the results when the delays are correctly perceived: we change the perceived payoff generation delay for activity one, 1t at the same values as lT, so that we have no misperception of delays: T = Tj. This scenario informs the effect of delays when they are correctly perceived and accounted for. Then, we analyze the effect of misperception of delays by keeping the perceived payoff generation delay for all activities, including activity one, at zero. This setting corresponds to an organization that believes all activities affect the payoff immediately. Figure 2 reports the behavior of the four learning algorithms under the first setting, where delays are correctly perceived. Under each delay time the Average Payoff Percentage achieved by organization at the end of 300 periods and the Percentage Converged 13are reported. 
In the "Average Payoff Percentage" graph, the line denoted "Pure Random" represents the case where the organization selects its initial allocation policy randomly and sticks to that, without trying to learn from its experience. Results are averaged over 400 simulations for each delay setting. Performance PercentageConverged 100 - 90° 801 · 70soI a. <E 1001 - - 70 70 ___ - .--- ___ __ ___ __ ___ __ <go~~~~~~~~~~~ 0 so O 60 0 40 ! 201 50 0 2 4 8f 8 10 12 14 18 PayoffGenerationDelayfor Action1 - Regression-Correlaton -*- Myopic-Reinforcement 0 2 4 6 8 10 12 14 16 PayoffGenerationDelayfor Action1 -Random Figure 2- Payoff Percentage and Percentage Converged with different, correctly perceived, time delays for first activity. Activities 2 and 3 have no delay in influencing the payoff. 13 Convergence times are (negatively) correlated with the fraction of runs that converge and therefore are not graphed here. 24 The performances of all the four learning algorithms are fairly good. In fact regression model always finds the optimum allocation policy under all different delay settings if it has a correct perception of delays involved and the average performance of all the algorithms remain over 94% performance line. The convergence percentages declines, as continuously exploring the optimal region, a significant number of simulations fail to converge to the optimal policy, even though their performance levels are fairly high. PercentageConverged Performance xIuU 100 o o90 a 80 \ 80 O 60 Cn~~~~~~~~~~~ 'E3.- i. -h . a a =% , - 0; A 80 i 1 - 0 6_ M 40 a 20 . a) 50 I_ __ 0 _ _______. 2 4 , 6 nV 8 10 12 14 16 PayoffGenerationDelayfor Action1 - Regression-e-Correlation -Myopic .__ 0 2 4 6 8 10 12 14 16 PayoffGenerationDelayfor Action1 -*-Reinforcement -- Random Figure 3- Average Payoff Percentage and Percentage Converged with different time delays for first activity, when perceived delays are all zero. Results are averaged over 400 simulations. Figure 3 shows the performance of four learning algorithms under the settings with misperceived delays. This test suggests that the performances of all of the learning algorithms are adversely influenced with the introduction of misperceived delays. It also shows a clear drop in the convergence rates as a consequence of such delays. In summary, the following patterns can be discerned from these graphs. First, learning is significantly hampered if the perceived delays do not match the actual delays. The pattern is consistent across the different learning procedures. As the misperception of delays grow all four learning procedures yield average performance worse than or equal to the random allocation policy (68% of optimal under this payoff function). However, correctly perceived delays have a minor impact on performance. Second, under biased delay perception the convergence percentages decrease significantly, suggesting that algorithms usually keep on exploring as a result of inconsistent information they receive. More convergence happen when the delays are perceived correctly. Meanwhile, still a notable fraction of simulation runs do not converge under correctly perceived delays, which suggest that existence of delays by themselves complicate the task of organization for 25 converging on the optimal policy. More interestingly, under misperceived delays, some simulations converge to sub-optimal policies for different algorithms. 
This means that the organization can, under some circumstances, end up with an inferior policy, conclude that this is the best payoff it can get, and stop exploring other regions of the payoff landscape, even though there is significant unrealized potential for improvement. Third, the different learning algorithms show the same qualitative patterns. They also show some differences in their precise performance: under misperceived delays, the more rational learning algorithms, regression and correlation, perform better for short delays and under-perform for long delays. In fact, regression and correlation under-perform the random allocation (p < 0.001), whereas regression always finds the optimal allocation when the delay structure is correctly perceived.

In short, independent of our assumptions about the learning capabilities of the organization, a common failure mode persists: when there is a mismatch between the true delay and the delay perceived by the organization, learning is slower and usually does not improve performance. In fact, the simulated organization frequently concludes that an inferior policy is the best it can achieve, and sometimes, trying to learn from its experience, the organization reaches equilibrium allocations that yield payoffs significantly lower than the performance of a completely random strategy. To explore the robustness of these results, we investigate the effect of all model parameters on the performance of each learning algorithm.

3- Robustness of Results

Each learning model involves parameters whose values are highly uncertain. To explore the sensitivity of the results to these parameters we conducted sensitivity analysis over all the parameters of each learning procedure. In this analysis we aim to test the robustness of the conclusions derived in the last section about the effect of delays on learning and to answer the following questions:

- Is there any asymmetry in the effect of misperceived delays on performance? In reality, organizations are more prone to underestimating, rather than overestimating, delays (e.g., managers usually do not overestimate how long it takes process improvement to pay off), so finding such asymmetries is important and consequential.

- Is there any nonlinear effect of delays on learning? In particular, there should be a limit to the negative effects of delays on performance. Do such saturation effects exist, and if so, at what delay levels do they become important?

We conducted a Monte Carlo analysis, selecting each of the important model parameters randomly from a uniform distribution over a wide range (from one-quarter to four times the base-case value; see the complete listing and ranges in the technical appendix (1-1), Table 7). We carried out 3000 simulations, using random initial allocations in each. Simple graphs and tables are not informative for these multidimensional data, so we investigate the results of this sensitivity analysis using regressions with three dependent variables: Achieved Payoff Percentage, Distance Traveled, and Probability of Convergence. Distance Traveled measures how far the algorithm has traveled in the action space and is a measure of the cost of search and exploration. For each of these variables and for each of the learning algorithms, we run a regression over the independent parameters in that algorithm (see the technical appendix (1-1) for a complete listing).
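As a concrete illustration of this sensitivity procedure, the sketch below draws hypothetical learning-model parameters uniformly between one-quarter and four times their base values, runs a placeholder simulation, and regresses the achieved payoff and convergence outcome on the delay-related regressors described in the next paragraph. It is a minimal sketch under stated assumptions, not the code used in the study: the parameter names, the simulate_learning stub, and the use of statsmodels are illustrative.

```python
# Minimal sketch of the Monte Carlo sensitivity analysis and regression step.
# Assumptions: parameter names, the simulate_learning() stub, and the OLS/logit
# specification are illustrative; the dissertation's actual code is not shown here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
BASE = {"exploration_noise": 1.0, "adjustment_time": 4.0, "payoff_delay": 8.0}

def simulate_learning(params, perceived_delay):
    """Placeholder for one 300-period learning simulation.

    Returns (achieved payoff % of optimal, distance traveled, converged flag).
    A real run would simulate the chosen learning algorithm; here we only
    return dummy values so the regression step below is runnable.
    """
    err = params["payoff_delay"] - perceived_delay
    payoff = 100 - 2.0 * abs(err) + rng.normal(0, 5)
    return payoff, abs(err) * 10 + rng.normal(0, 2), payoff > 68

rows = []
for _ in range(3000):
    # Draw each parameter uniformly between 1/4x and 4x its base-case value.
    params = {k: rng.uniform(v / 4, v * 4) for k, v in BASE.items()}
    perceived = rng.uniform(0, 16)          # perceived payoff generation delay
    payoff, distance, converged = simulate_learning(params, perceived)
    err = params["payoff_delay"] - perceived
    rows.append([params["payoff_delay"], err, abs(err), err ** 2,
                 payoff, distance, int(converged)])

data = np.array(rows)
X = sm.add_constant(data[:, :4])            # delay, error, |error|, error^2
ols_payoff = sm.OLS(data[:, 4], X).fit()    # Achieved Payoff Percentage
logit_conv = sm.Logit(data[:, 6], X).fit(disp=0)  # Probability of Convergence
print(ols_payoff.params, logit_conv.params)
```

In a full analysis, the regressors would also include every model parameter of the learning procedure, as described below.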
The regressors also include the Perception Error, which is the difference between the Payoff Generation Delay and its perceived value; the Absolute Perception Error; and the Squared Perception Error. Including both the Absolute Perception Error and the Perception Error allows us to examine possible asymmetries in the effect of misperceiving delays. The second-order term allows inspection of nonlinearities and saturation of the delay effects. To save space and focus on the main results, we report the regression estimates only for the main independent variables (Delay, Perception Error, Absolute Perception Error, and Squared Perception Error), even though all the model parameters are included on the right-hand side of the reported regressions. OLS is used for Achieved Payoff Percentage and Distance Traveled; logistic regression is used for Probability of Convergence. Tables 3-5 show the regression results; summary statistics are available in the technical appendix (1-1), Table 8. All the models are significant at p < 0.001. The following results follow from these tables:

- Increasing the absolute difference between the real delay and the perceived delay always decreases the achieved payoff significantly (Table 4, row 4). The coefficient for this effect is large and highly significant, indicating the persistence of the effect across different parameter settings and different learning algorithms. Absolute perception error also usually increases the cost of learning (Distance Traveled), even though the effect is not as pronounced as that on achieved payoff (compare Tables 3 and 4, row 4).

- Delay alone appears to have much less influence on payoff. The regression model is insensitive to correctly perceived delays, and the rest of the models show a small but significant decline in performance (Table 3, row 3).

- Three of the models show asymmetric behavior in the presence of misperceived delays. For Regression and Correlation, underestimating the delays is more detrimental than overestimating them (Table 3, row 5), while Myopic is better off underestimating the delays rather than overestimating them. In all models the absolute error has a much larger effect than the perception error itself (Table 3, compare rows 4 and 5).

- There is a significant nonlinear effect of misperceived delays on performance (Table 3, row 6). This effect is in the expected direction: higher perception errors have smaller incremental effects on learning, and the effect saturates (the second-order effect neutralizes the first-order effect) when the delay perception error is around 14 (except for regression, which shows a much smaller saturation effect).

[Tables 3, 4, and 5: regression estimates for Achieved Payoff Percentage, Distance Traveled, and Probability of Convergence for the four learning algorithms.]
- Convergence is influenced both by the delay and by the error in its perception, so higher delays, even if perceived correctly, make it harder for the algorithms to converge (Table 5).

These results are in general aligned with the analysis of delays offered in the last section and highlight the robustness of the results. The analysis shows that the introduction of a delay between resource allocations and their impact causes sub-optimal performance, slower learning, and higher costs of learning.

Discussion

The results show that learning is slow and ineffective when the organization underestimates the delay between an activity and its impact on performance. It may be objected that this is hardly surprising, because the underlying model of the task is mis-specified (by underestimation of the delays). A truly rational organization would not only seek to learn about better allocations, but would also test its assumptions about the true temporal relationship between actions and their impact; if its initial beliefs about the delay structure were wrong, experience should reveal the problem and lead to a correctly specified model. We do not attempt to model such second-order learning here and leave the resolution of this question to further research. Nevertheless, the results suggest such sophisticated learning is likely to be difficult. First, estimating delay length and distribution is difficult and requires substantial data14; in some domains, such as the delay in the capital investment process, it took decades for consensus about the length and distribution of the delay to emerge (see Sterman 2000, Ch. 11, and references therein). The results of an experimental study are illuminating: in a simple task (one degree of freedom) with delays of one or two periods, Gibson (2000) shows that individuals had a hard time learning over 120 trials. A simulation model fitted to human behavior and designed to learn about delays learns the delay structure only with about 10 times more training data.

14 The Q-learning model discussed in the technical appendix (1-1) allows us to show mathematically why the complexity of the state space (which correlates well with the time and data needed to learn about the delay structure) grows exponentially with the length of delays. Details can be found in the technical appendix (1-1); the basic argument is that if the action space has dimensionality O(D), then the possibility of one period of delay requires knowing the last action as well as the current one to realize the payoff, increasing the dimensionality to O(D·D), and delays of length K increase it to O(D^(K+1)).

Second, the experimental research suggests people have great difficulty recognizing and accounting for delays, even when information about their length and content is available and salient (Sterman 1989a; Sterman 1989b; Sterman 1994); in our setting there are no direct cues to indicate the length, or even the presence, of delays. Third, the outcome feedback the decision maker receives may be attributed to a changing payoff landscape and noise in the environment, rather than to a problem with the initial assumptions about the delay structure. A decline in convergence and learning speed should be the main feedback leading the organization to decide that its assumptions about the delay structure are erroneous.
However, in the real world, where the payoff is not solely determined by the organization's actions, several other explanatory factors, from competitors' actions to sheer good luck and supernatural beliefs, come first in explaining inconsistent payoff feedback.

Our results are robust to the rationality assumptions of the learning algorithms. The effects of time delays persist across a large range of model complexity and rationality attributable to human organizations; therefore, one can have higher confidence in the applicability of these results to real-world phenomena. This is not to claim that no learning algorithm can be designed to account for delays15; rather, for the range of learning algorithms and experimentation opportunities realistically available to organizations, delays between action and results become a significant barrier to learning. This conclusion is reinforced by the amount of experimentation needed to learn about a mismatch between the real-world delay structure and what the aggregate mental model of the organization suggests.

A few different mechanisms contribute to the failure of learning in our setting. First, in the absence of a good mental model of the delay structure, the organization fails to make reasonable causal attributions between different policies and results (the credit assignment problem; Samuel 1957; Minsky 1961). Moreover, in the context of resource allocation, an increase in the resources allocated to any activity naturally cuts the resources allocated to the other activities; therefore, observing the payoff of long-term activities requires going through a period of low payoff.

15 There is a rich literature in machine learning, artificial intelligence, and optimal control theory that searches for optimum learning algorithms in dynamic tasks (see, for example, Kaelbling 1993; Barto, Bradtke et al. 1995; Kaelbling, Littman et al. 1996; Santamaria, Sutton et al. 1997; Tsitsiklis and Van Roy 1997; Bertsekas 2000). The focus of this literature is on optimal policies rather than on behavioral decision and learning rules; these models therefore often require more information and a higher level of rationality than the range of empirically plausible decision rules for human subjects or organizations.

For example, investing resources in process improvement cuts the short-term payoff by reducing the time spent on production. When delays are not correctly perceived, the transitory payoff shortfall can be attributed to lower effectiveness of the long-term activity, thereby providing a mechanism for learning the wrong lessons from experience. Finally, learning may be difficult even when the delays are correctly perceived by the organization, especially when the organization does not have a perfectly specified model of the payoff landscape. In the presence of delays, even if they are correctly perceived, the information provided through payoff feedback may be outdated and relevant only to a different region of the payoff landscape. Consider the case where the delay is three periods and correctly perceived. In this case the organization properly attributes the payoff to the decisions made three periods ago and correctly determines in which direction it could have changed the allocation decision three periods ago to improve the payoff. However, during the intervening three periods it has been exploring different policies, and therefore the indicated direction of change in policy may no longer be the best direction to move.
Nevertheless, this type of error is not large in our experimental setting because the payoff surface is relatively flat near the peak.

Our modeling of learning includes simplifying assumptions that should be made clear in light of their possible influence on learning performance and speed. First, in line with most other learning models in the literature, these algorithms use stochastic exploration of different possible actions to learn from the feedback they receive. This permits a relatively large number of exploratory moves compared to what most real-world situations allow and provides the simulated organization with more data to learn from than most real settings. Moreover, costs and resources are not modeled here, so there is no bound on exploration and no pressure on the simulated organization to converge to any policy. These effects may increase the possibility of convergence for a real-world organization and increase the speed of learning; however, it is conceivable that their effect on the quality of learning is not positive, possibly leading to more learning of wrong lessons than the current analysis suggests. Such reinforcement of harmful strategies is both theoretically interesting and practically important. For a concrete example, consider a firm facing a production shortfall. Pressuring the employees to work harder generates a short-run improvement in output, as they reallocate time from improvement and maintenance to production. The resulting decline in productivity and equipment uptime, however, comes only after a delay. It appears to be difficult for many managers to recognize and account for such delays, so they conclude that pressuring their people to work harder was the right thing to do, even if it is in fact harmful. Over time such attributions become part of the organization's culture and have a lasting impact on firm strategy. Repenning and Sterman (2002) show how, over time, managers develop strongly held beliefs that pressuring workers for more output is the best policy, that the workers are intrinsically lazy and require continuous supervision, and that process improvement is ineffective or impossible in their organization, even when these beliefs are false. Such convergence to sub-optimal strategies as a result of delays in payoff constitutes an avenue through which the temporal complexity of the payoff landscape can create heterogeneity in firm performance (Levinthal 2000).

Future extensions of this research can take several directions. First, there is room for developing more realistic learning models that capture the real-world challenges organizations face. Possible extensions include designing more realistic payoff landscapes (e.g., modeling the plant explicitly) and going beyond simple stimulus-response models (e.g., including culture formation). Second, it is intriguing to examine whether there is any interaction effect between delays and rugged landscapes. Third, it is possible to investigate the effect of delays on learning in the presence of noise in the payoff and of measurement and perception delays. Fourth, the distribution of payoff over time may have important ramifications for the routines organizations learn. For example, it is conceivable that a payoff distributed smoothly through time does not pass the organizational attention threshold and is thus heavily discounted; activities that lead to distributed payoffs can therefore be under-represented in learned routines.
Fifth, these results can help us better explain some empirical cases of learning failure. Sixth, one can look at learning with delays in a population of organizations. Finally, it is important to examine the feasibility of second-order learning, i.e., learning about the delay structure itself.

Our research also has important practical implications. The results highlight the importance of employing longer-than-typical time horizons for evaluating performance. This matters both in designing incentive structures inside the organization and in the market-level evaluation of the firm's performance. Empirical evidence supports this conclusion. For example, Hendricks and Singhal (2001) show the inefficiency of financial markets, in that stock prices do not effectively reflect the positive effects of total quality management programs on firm performance at the time the information about the quality initiative becomes known to the public. Moreover, more effective data-gathering and accounting procedures can be designed that explicitly represent the delays between different activities and costs and the expected or observed payoff, and therefore improve the chances of organizational learning in the presence of delays.

In sum, the results of our analysis suggest that learning can be significantly hampered and slowed when the delays between action and payoff are not correctly taken into account. In fact, under relatively short delays we repeatedly observe that all learning algorithms fail to achieve the performance of a random allocation policy. The robustness of these results over a large parameter space, and their independence from the models' level of rationality and information processing, adds to their external validity and generalizability. This research highlights the importance of explicitly including the time dimension of the action-payoff relationship in learning research and offers a new dimension for explaining several cases of learning failure in the real world and the heterogeneity of firms' strategies.

References:

Allen, K. 1993. Maintenance diffusion at Du Pont: A system dynamics perspective. Sloan School of Management, Cambridge, MIT.
Argote, L. and D. Epple. 1990. Learning curves in manufacturing. Science 247(4945): 920-924.
Argyris, C. 1999. On organizational learning. Malden, Mass., Blackwell Business.
Argyris, C. and D. A. Schön. 1978. Organizational learning: A theory of action perspective. Reading, Mass., Addison-Wesley Pub. Co.
Barto, A. G., S. J. Bradtke and S. P. Singh. 1995. Learning to act using real-time dynamic programming. Artificial Intelligence 72(1-2): 81-138.
Bertsekas, D. P. 2000. Dynamic programming and optimal control. Belmont, Mass., Athena Scientific.
Busemeyer, J. R., K. N. Swenson and A. Lazarte. 1986. An adaptive approach to resource allocation. Organizational Behavior and Human Decision Processes 38(3): 318-341.
Carroll, J. S., J. W. Rudolph and S. Hatakenaka. 2002. Learning from experience in high-hazard organizations. Research in Organizational Behavior 24: 87-137.
Cyert, R. M. and J. G. March. 1963. A behavioral theory of the firm. Englewood Cliffs, N.J., Prentice-Hall.
DeGeus, A. P. 1988. Planning as learning. Harvard Business Review 66(2): 70-74.
Denrell, J., C. Fang and D. A. Levinthal. 2004. From T-mazes to labyrinths: Learning from model-based feedback. Management Science 50(10): 1366-1378.
Denrell, J. and J. G. March. 2001. Adaptation as information restriction: The hot stove effect. Organization Science 12(5): 523-538.
Diehl, E. and J. Sterman. 1995. Effects of feedback complexity on dynamic decision making. Organizational Behavior and Human Decision Processes 62(2): 198-215.
Einhorn, H. J. and R. M. Hogarth. 1985. Ambiguity and uncertainty in probabilistic inference. Psychological Review 92(4): 433-461.
Erev, I. and A. E. Roth. 1998. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review 88(4): 848-881.
Friedman, M. 1953. The methodology of positive economics. Essays in positive economics. Chicago, IL, University of Chicago Press.
Gavetti, G. and D. Levinthal. 2000. Looking forward and looking backward: Cognitive and experiential search. Administrative Science Quarterly 45(1): 113-137.
Gibson, F. P. 2000. Feedback delays: How can decision makers learn not to buy a new car every time the garage is empty? Organizational Behavior and Human Decision Processes 83(1): 141-166.
Hendricks, K. B. and V. R. Singhal. 2001. The long-run stock price performance of firms with effective TQM programs. Management Science 47(3): 359-368.
Herriott, S. R., D. Levinthal and J. G. March. 1985. Learning from experience in organizations. American Economic Review 75(2): 298-302.
Huber, G. 1991. Organizational learning: The contributing processes and the literatures. Organization Science 2(1): 88-115.
Janis, I. L. 1982. Groupthink: Psychological studies of policy decisions and fiascoes. Boston, Houghton Mifflin.
Kaelbling, L. P. 1993. Learning in embedded systems. Cambridge, Mass., MIT Press.
Kaelbling, L. P., M. L. Littman and A. W. Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4: 237-285.
Lant, T. K. and S. J. Mezias. 1992. An organizational learning model of convergence and reorientation. Organization Science 3: 47-71.
Lant, T. K. and S. J. Mezias. 1990. Managing discontinuous change: A simulation study of organizational learning and entrepreneurship. Strategic Management Journal 11: 147-179.
Leonard-Barton, D. 1992. Core capabilities and core rigidities: A paradox in managing new product development. Strategic Management Journal 13: 111-125.
Levinthal, D. 2000. Organizational capabilities in complex worlds. In S. Winter (ed.), The nature and dynamics of organizational capabilities. New York, Oxford University Press.
Levinthal, D. and J. G. March. 1981. A model of adaptive organizational search. Journal of Economic Behavior and Organization 2: 307-333.
Levinthal, D. A. 1997. Adaptation on rugged landscapes. Management Science 43(7): 934-950.
Levinthal, D. A. and J. G. March. 1993. The myopia of learning. Strategic Management Journal 14: 95-112.
Levitt, B. and J. G. March. 1988. Organizational learning. Annual Review of Sociology 14: 319-340.
Lipe, M. G. 1991. Counterfactual reasoning as a framework for attribution theories. Psychological Bulletin 109(3): 456-471.
March, J. G. 1991. Exploration and exploitation in organizational learning. Organization Science 2(1): 71-87.
Meadows, D. H. and Club of Rome. 1972. The limits to growth: A report for the Club of Rome's project on the predicament of mankind. New York, Universe Books.
Milgrom, P. and J. Roberts. 1990. The economics of modern manufacturing: Technology, strategy, and organization. American Economic Review 80(3): 511-528.
Minsky, M. 1961. Steps toward artificial intelligence. Proceedings of the Institute of Radio Engineers 49: 8-30.
Nelson, R. R. and S. G. Winter. 1982. An evolutionary theory of economic change. Cambridge, Mass., Belknap Press of Harvard University Press.
Paich, M. and J. Sterman. 1993. Boom, bust, and failures to learn in experimental markets. Management Science 39(12): 1439-1458.
Raghu, T. S., P. K. Sen and H. R. Rao. 2003. Relative performance of incentive mechanisms: Computational modeling and simulation of delegated investment decisions. Management Science 49(2): 160-178.
Repenning, N. P. and J. D. Sterman. 2002. Capability traps and self-confirming attribution errors in the dynamics of process improvement. Administrative Science Quarterly 47: 265-295.
Rivkin, J. W. 2000. Imitation of complex strategies. Management Science 46(6): 824-844.
Rivkin, J. W. 2001. Reproducing knowledge: Replication without imitation at moderate complexity. Organization Science 12(3): 274-293.
Samuel, A. 1957. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3: 210-229.
Santamaria, J. C., R. S. Sutton and A. Ram. 1997. Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior 6(2): 163-217.
Senge, P. M. 1990. The fifth discipline: The art and practice of the learning organization. New York, Currency Doubleday.
Sengupta, K., T. K. Abdel-Hamid and M. Bosley. 1999. Coping with staffing delays in software project management: An experimental investigation. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 29(1): 77-91.
Siggelkow, N. 2002. Evolution toward fit. Administrative Science Quarterly 47(1): 125-159.
Sterman, J. D. 1989a. Misperceptions of feedback in dynamic decision making. Organizational Behavior and Human Decision Processes 43: 301-335.
Sterman, J. D. 1989b. Modeling managerial behavior: Misperceptions of feedback in a dynamic decision making experiment. Management Science 35(3): 321-339.
Sterman, J. D. 1994. Learning in and about complex systems. System Dynamics Review 10(2-3): 291-330.
Sutton, R. S. and A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, The MIT Press.
Tsitsiklis, J. N. and B. Van Roy. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42(5): 674-690.
Tushman, M. L. and E. Romanelli. 1985. Organizational evolution: A metamorphosis model of convergence and reorientation. Research in Organizational Behavior 7: 171-222.
Warren, H. C. 1921. A history of the association psychology. New York, C. Scribner's Sons.
Watkins, C. 1989. Learning from delayed rewards. Ph.D. Thesis. Cambridge, UK, King's College.

Essay 2- Heterogeneity and Network Structure in the Dynamics of Diffusion: Comparing Agent-Based and Differential Equation Models16

16 Co-authored with John Sterman.

Abstract

When is it better to use agent-based (AB) models, and when should differential equation (DE) models be used? Where DE models assume homogeneity and perfect mixing within compartments, AB models can capture heterogeneity in agent attributes and in the network of interactions among them. The costs and benefits of such disaggregation should guide the choice of model type. AB models may enhance realism but entail computational and cognitive costs that may limit sensitivity analysis and model scope. Using contagious disease as an example, we contrast the dynamics of AB models with those of the analogous mean-field DE model. We examine agent heterogeneity and the impact of different network topologies, including fully connected, random, Watts-Strogatz small world, scale-free, and lattice networks.
Surprisingly, in many conditions differences between the DE and AB dynamics are not statistically significant for key metrics relevant to public health, including diffusion speed, peak load on health services infrastructure, and total disease burden. We discuss implications for the choice between AB and DE models, level of aggregation, and model boundary. The results apply beyond epidemiology: from innovation adoption to financial panics, many important social phenomena involve analogous processes of diffusion and social contagion.

Keywords: Agent Based Models, Networks, Scale free, Small world, Heterogeneity, Epidemiology, Simulation, System Dynamics, Complex Adaptive Systems, SEIR model

Introduction

Spurred by growing computational power, agent-based modeling (AB) is increasingly applied to physical, biological, social, and economic problems previously modeled with nonlinear differential equations (DE). Both approaches have yielded important insights. In the social sciences, agent models explore phenomena from the emergence of segregation to organizational evolution to market dynamics (Schelling 1978; Levinthal and March 1981; Carley 1992; Axelrod 1997; Lomi and Larsen 2001; Axtell, Epstein et al. 2002; Epstein 2002; Tesfatsion 2002). Differential and difference equation models, also known as compartmental models, have an even longer history in social science, including innovation diffusion (Mahajan, Muller et al. 2000), epidemiology (Anderson and May 1991), economics (Black and Scholes 1973; Samuelson 1939), and many others.

When should AB models be used, and when are DE models most appropriate? Each method has strengths and weaknesses. Nonlinear DE models often have a broad boundary encompassing a wide range of feedback effects but typically aggregate agents into a relatively small number of states (compartments). For example, models of innovation diffusion may aggregate the population into categories including unaware, aware, in the market, recent adopters, and former adopters (Urban, Houser et al. 1990; Mahajan, Muller et al. 2000). However, the agents within each compartment are assumed to be homogeneous and well mixed; the transitions among states are modeled as their expected value, possibly perturbed by random events. In contrast, AB models can readily include heterogeneity in agent attributes and in the network structure of their interactions; as in DE models, these interactions can be deterministic or stochastic. The granularity of AB models comes at some cost. First, the extra complexity significantly increases computational requirements, constraining the extent of sensitivity analysis used to investigate the robustness of results. A second cost of agent-level detail is the cognitive burden of understanding model behavior. Linking the observed behavior of a model to its structure rapidly becomes more difficult as model complexity grows. A third tradeoff concerns the communication of results: the more complex a model, the harder it is to communicate the basis for its results to stakeholders and decision makers, and thus to have an impact on policy. Finally, limited time and resources force modelers to trade off disaggregate detail and the scope of the model boundary. Model boundary here refers to the richness of the feedback structure captured endogenously in the model.
For example, an agent-based demographic model may portray each individual in a population and capture differences in age, sex, and zip code, but assume exogenous fertility and mortality; such a model has a narrow boundary. In contrast, an aggregate model may lump the entire population into a single compartment, but model fertility and mortality as depending on food per capita, health care, pollution, norms for family size, etc., each of which is, in turn, modeled endogenously; such a model has a broad boundary. DE and AB models may in principle fall anywhere on these dimensions of disaggregation and scope, including both high levels of disaggregate detail and a broad boundary. In practice, however, where time, budget, and computational resources are limited, modelers must trade off disaggregate detail and breadth of boundary. Understanding where the agent-based approach yields additional insight and where such detail is unimportant is central to selecting appropriate methods for any problem.

The stakes are large. Consider potential bioterror attacks. Kaplan, Craft, and Wein (2002) used a deterministic nonlinear DE model to examine a smallpox attack in a large city, comparing mass vaccination (MV), in which essentially all people are vaccinated after an attack, to targeted vaccination (TV), in which health officials trace and immunize those contacted by potentially infectious individuals. Capturing vaccination capacity and logistics explicitly, they conclude that MV significantly reduces casualties relative to TV. In contrast, Eubank et al. (2004) and Halloran et al. (2002), using different AB models, conclude that TV is superior, while Epstein et al. (2004) favor a hybrid strategy. The many differences among these models make it difficult to determine whether the conflicting conclusions arise from relaxing the perfect mixing and homogeneity assumptions of the DE (as argued by Halloran et al. (2002)) or from other assumptions, such as the size of the population (ranging from 10 million for the DE model to 2000 in Halloran et al. to 800 in Epstein et al.), capacity constraints on immunization, parameters, etc. (Koopman 2002; Ferguson, Keeling et al. 2003; Kaplan and Wein 2003). Kaplan and Wein (2003) show that their deterministic DE model closely replicates the Halloran et al. results when simulated with the Halloran et al. parameters, including scaled-down population and initial attack size, indicating that differences in the logistics of MV and TV are responsible for the different conclusions, not differences in mixing and homogeneity. Halloran and Longini (2003) respond that the equivalence may not hold when the population is scaled up to city size, due to imperfect mixing and agent heterogeneity, but do not test this proposition because simulation of the AB model with 10 million agents is computationally prohibitive.

Here we carry out controlled experiments to compare AB and DE models in the context of contagious disease. We choose disease diffusion for four reasons. First, the effects of different network topologies for contacts among individuals are important in the diffusion process (Davis 1991; Watts and Strogatz 1998; Barabasi 2002; Rogers 2003), providing a strong test for differences between the two approaches. Second, the dynamics of contagion involve many important characteristics of complex systems, including positive and negative feedbacks, time delays, nonlinearities, stochastic events, and agent heterogeneity. Third, there is a rich literature on epidemic modeling.
The DE paradigm is well developed in epidemiology (for reviews see Anderson and May (1991) and Hethcote (2000)), and AB models with explicit network structures are on the rise (for reviews see Newman (2002; 2003) and Watts (2004)). Finally, diffusion is a fundamental process in diverse physical, biological, social, and economic settings. Many diffusion phenomena in human systems involve processes of social contagion analogous to infectious disease, including word of mouth, imitation, and network externalities. From the diffusion of innovations, ideas, and new technologies within and across organizations, to rumors, financial panics, and riots, contagion-like dynamics, and formal models of them, have a rich history in the social sciences (Bass 1969; Watts and Strogatz 1998; Mahajan, Muller et al. 2000; Barabasi 2002; Rogers 2003). Insights into the advantages and disadvantages of AB and DE models in epidemiology can inform understanding of diffusion in many domains of concern to social scientists and managers.

Our procedure is as follows. We develop an AB version of the classic SEIR model, a widely used nonlinear deterministic DE model. The DE version divides the population into four compartments: Susceptible (S), Exposed (E), Infected (I), and Recovered (R). In the AB model, each individual is separately represented and must be in one of the four states. The same parameters are used for both the AB and DE models. Therefore any differences in outcomes arise only from the relaxation of the mean-field assumptions of the DE model. We run the AB model under five widely used network topologies (fully connected, random, small world, scale-free, and lattice) and test each with homogeneous and heterogeneous agent attributes, such as the rate of contacts between agents. We compare the DE and AB dynamics on a variety of key metrics relevant to public health, including cumulative cases, peak prevalence, and the speed with which the disease spreads. As expected, different network structures alter the timing of the epidemic. The behavior of the AB model under the connected and random networks closely matches the DE model, as these networks approximate the perfect mixing assumption. Also as expected, the stochastic AB models generate a distribution of outcomes, including some in which, due to random events, the epidemic never takes off. The deterministic DE model cannot generate this mode of behavior. Surprisingly, however, even in conditions that violate the perfect mixing and homogeneity assumptions of the DE, the differences between the DE and AB models are not statistically significant for many of the key public health metrics (with the exception of the lattice network, where all contacts are local). The dynamics of the different networks are close to the DE model: even a few long-range contacts or highly connected hubs seed the epidemic at multiple points in the network, enabling it to spread rapidly. Heterogeneity also has small effects. We also examine the ability of the DE model to capture the dynamics of each network structure in the realistic situation where key parameters are poorly constrained by biological and clinical data. Epidemiologists often estimate potential diffusion, for both novel and established pathogens, by fitting models to the aggregate data as an outbreak unfolds (Dye and Gay 2003; Lipsitch, Cohen et al. 2003; Riley, Fraser et al. 2003). Calibration of innovation diffusion and new product marketing models is similar (Mahajan et al. 2000).
We mimic this practice by treating the AB simulations as the "real world" and fitting the DE model to them. Surprisingly, the fitted model closely matches the behavior of the AB model under all network topologies and heterogeneity conditions tested. Departures from homogeneity and perfect mixing are incorporated into the best-fit values of the parameters. The concordance between the DE and AB models suggests DE models remain useful and appropriate in many situations, even when mixing is imperfect and agents are heterogeneous. Since time and resources are always limited, modelers must trade off the data requirements and the computational burden of disaggregation against the breadth of the model boundary and the ability to do sensitivity analysis. The detail possible in AB models is likely to be most helpful where the contact network is highly clustered, known, and stable, where results depend delicately on agent heterogeneity or stochastic events (as in small populations), where the relevant population is small enough to be computationally feasible, and where time and budget permit sensitivity testing. DE models will be most appropriate where network structure is unknown, labile, or moderately clustered, where results hinge on the incorporation of a wide range of feedbacks among system elements (a broad model boundary), where large populations must be treated, and where fast turnaround or extensive sensitivity analysis is required. The next section reviews the literature comparing AB and DE approaches. We then describe the structure of the epidemic models, the design of the simulation experiments, and the results, closing with implications and directions for future work.

A spectrum of aggregation assumptions: AB and DE models should be viewed as regions in a space of modeling assumptions rather than incompatible modeling paradigms. Aggregation is one of the key dimensions of that space. Models can range from lumped deterministic differential equations, to stochastic compartmental models in which the continuous variables of the DE are replaced by discrete individuals, to event history models where the states of individuals are tracked but their network of relationships is ignored, to dynamic network models where networks of individual interactions are explicit (e.g., Koopman et al. (2001)). Modelers need to assess the costs and benefits of these alternatives relative to the purpose of the analysis.

A few studies compare AB and DE models. Axtell et al. (1996) call for "model alignment" or "docking" and illustrate with the Sugarscape model. Edwards et al. (2003) contrast an AB model of innovation diffusion with an aggregate model, finding that the two can diverge when multiple attractors exist in the deterministic model. In epidemiology, Jacquez and colleagues (Jacquez and O'Neill 1991; Jacquez and Simon 1993) analyzed the effects of stochasticity and discrete agents in the SIS and SI models, finding some differences for very small populations. However, the differences practically vanish for populations above 100. Heterogeneity has also been explored in models that assume different mixing sites for population members. Anderson and May (1991, Chapter 12) show that the immunization fraction required to quench an epidemic rises with heterogeneity if immunization is implemented uniformly, but falls if those most likely to transmit the disease are the focus of immunization. Ball et al. (1997) and Koopman et al. (Koopman, Chick et al. 2002)
find expressions for cumulative cases and global epidemic thresholds in stochastic SIR and SIS models with global and local contacts, concluding that the behavior of deterministic and stochastic DE models can diverge for small populations. Keeling (1999) formulates a DE model that approximates the effects of spatial structure even when contact networks are highly clustered. Chen et al. (2004) develop AB models of smallpox, finding the dynamics generally consistent with DE models. In sum, AB and DE models of the same phenomenon sometimes agree and sometimes diverge, especially for smaller populations.

Model Structure: The SEIR model is a deterministic nonlinear differential equation model in which all members of a population are in one of four states: Susceptible, Exposed, Infected, and Recovered. Infected (symptomatic) individuals can transmit the disease to susceptibles before they are "removed" (i.e., recover or die). The exposed compartment captures latency between infection and the emergence of symptoms. Depending on parameters, exposed individuals may be infectious, and can alternatively be thought of as early-stage infectious. Typically, the exposed have more contacts than the infectious because they are healthier and are often unaware that they are potentially contagious, while the infectious are ill and often self-quarantined. SEIR models have been successfully applied to many diseases. Additional states are often introduced to capture more complex disease lifecycles, diagnostic categories, therapeutic protocols, population heterogeneity, assortative mixing, birth and recruitment of new susceptibles, loss of immunity, etc. (Anderson and May (1991) and Murray (2002) provide comprehensive discussions of the SEIR model and its extensions). In this study we maintain the boundary assumptions of the traditional SEIR model (four stages, fixed population, permanent immunity). The DE implementation of the model makes several additional assumptions, including perfect mixing and homogeneity of individuals within each compartment, exponential residence times in the infected states, and mean-field aggregation (the flows between compartments equal the expected value of the sum of the underlying probabilistic rates for the individual agents).17

To derive the differential equations, consider the rate at which each infectious individual generates new cases:

c_is * Prob(Contact with Susceptible) * Prob(Transmission | Contact with Susceptible) (1)

where the contact frequency c_is is the expected number of contacts between infectious individual i and susceptible individual s per time period; homogeneity implies c_is is a constant, denoted c_IS, for all individuals i and s. If the population is well mixed, the probability of contacting a susceptible individual is simply the proportion of susceptibles in the total population, S/N. Denoting the probability of transmission given contact between individuals i and s, or infectivity, as i_is (which, due to homogeneity, equals i_IS for all i, s), and summing over the infectious population yields the total flow of new cases generated by contacts between the I and S populations, c_IS*i_IS*I*(S/N). The number of new cases generated by contacts between the exposed and susceptibles is formulated analogously, yielding the total infection rate, f:

f = (c_ES*i_ES*E + c_IS*i_IS*I)*(S/N). (2)

Assuming each compartment is well mixed and denoting the mean emergence (incubation) time and disease duration ε and δ, respectively, yields the emergence and recovery rates e = E/ε and r = I/δ.
The full model is thus:

dS/dt = -f, (3)
dE/dt = f - e,  dI/dt = e - r,  dR/dt = r. (4)

17 Due to data limitations the parameters c, i, and N are often lumped together as β = c*i/N; β is estimated from aggregate data such as reported incidence and prevalence. We distinguish these parameters since they are not necessarily equal across individuals in the AB implementation.

The perfect mixing assumption in eq. (3) implies that residence times in the E and I stages are distributed exponentially. Exponential residence times are not realistic for most diseases, where the probability of emergence and recovery is initially low, then rises, peaks, and falls. Note, however, that any lag distribution can be captured through the use of partial differential equations, approximated in the ODE paradigm by adding additional compartments within the exposed and infectious categories (Jacquez and Simon 2002). Here we maintain the traditional SEIR single-compartment structure since the exponential distribution has greater variance than higher-order distributions, maximizing the difference between the DE and AB models.

The AB model relaxes the perfect mixing and homogeneity assumptions. Each individual j is in one of the four states S[j], E[j], I[j], and R[j] for j in {1, ..., N}. State transitions depend on the agent's state, agent-specific attributes such as contact frequencies, and interactions with other agents as specified by the contact network among them. The aggregate rates f, e, and r are the sums of the individual transitions f[j], e[j], and r[j]. The technical appendix (2-1) details the formulation of the AB model and shows how the DE model can be derived from it by assuming homogeneous agents and applying the mean-field approximation.

Table 1- Base case parameters. The technical appendix (2-1) documents the construction of each network used in the AB simulations.
Infectivity, Exposed (i_ES): 0.05 (dimensionless)
Infectivity, Infectious (i_IS): 0.06 (dimensionless)
Basic reproduction rate (R0): 4.125 (dimensionless)
Average links per node (k): 10 (dimensionless)
Probability of long-range links, small world (p_SW): 0.05 (dimensionless)
Scaling exponent, scale-free (γ): 2.60 (dimensionless)
Total population (N): 200 (Person)
Contact rate, Exposed (c_ES): 4 (1/Day)
Contact rate, Infectious (c_IS): 1.25 (1/Day)
Average incubation time (ε): 15 (Day)
Average duration of illness (δ): 15 (Day)

The key parameter in epidemic models is the basic reproduction number, R0, the expected number of new cases each contagious individual generates before recovering, assuming all others are susceptible. We use parameters roughly similar to smallpox (Halloran, Longini et al. 2002; Kaplan, Craft et al. 2002; Ferguson, Keeling et al. 2003).18 Smallpox has a mid-range reproduction number, R0 ≈ 3-6 (Gani and Leach 2001), and therefore is a good choice for observing potential differences between DE and AB models: diseases with R0 < 1 die out quickly, while those with R0 >> 1, e.g., chickenpox and measles, generate a severe epidemic in (nearly) any network topology. We set R0 ≈ 4 for the DE (Table 1). The AB models use the same values for the infectivities and expected residence times, and we set individual contact frequencies so that the mean contact rate in each network structure and heterogeneity condition is the same as in the DE model. We use a small population, N = 200, all susceptible except for two randomly chosen exposed individuals.

18 Smallpox-specific models often add a prodromal period between the incubation and infectious stages (e.g., Eubank et al. (2004)); for generality we maintain the SEIR structure.
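To make the DE formulation concrete, the sketch below integrates equations (2)-(4) with the Table 1 base-case parameters. It is a minimal illustration under stated assumptions, not the model files documented in the technical appendix: the function and variable names are ours, and scipy's solve_ivp is used purely for convenience.

```python
# Minimal sketch of the deterministic SEIR model, eqs. (2)-(4), with the
# Table 1 base-case parameters. Names and the choice of integrator are
# illustrative assumptions, not the dissertation's own code.
import numpy as np
from scipy.integrate import solve_ivp

N = 200.0                 # total population (Person)
c_ES, c_IS = 4.0, 1.25    # contact rates for exposed and infectious (1/Day)
i_ES, i_IS = 0.05, 0.06   # infectivities (dimensionless)
eps, delta = 15.0, 15.0   # incubation time and duration of illness (Day)
# Implied R0 = c_ES*i_ES*eps + c_IS*i_IS*delta = 4.125, as in Table 1.

def seir(t, y):
    S, E, I, R = y
    f = (c_ES * i_ES * E + c_IS * i_IS * I) * S / N   # infection rate, eq. (2)
    e = E / eps                                        # emergence rate
    r = I / delta                                      # recovery rate
    return [-f, f - e, e - r, r]                       # eqs. (3)-(4)

# Two randomly chosen exposed individuals, everyone else susceptible.
y0 = [N - 2, 2, 0, 0]
sol = solve_ivp(seir, (0, 300), y0, max_step=0.25)

I_frac = sol.y[2] / N
print("peak prevalence Imax = %.2f at day %.0f" % (I_frac.max(), sol.t[I_frac.argmax()]))
print("final size F = %.2f" % (sol.y[3][-1] / N))
```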
While small compared to the settings of interest in policy design, e.g., cities, the effects of random events and network structure are likely to be more pronounced in small populations (Kaplan and Wein 2003), providing a stronger test for differences between the DE and AB models. A small population also reduces computation time, allowing more extensive sensitivity analysis. The DE model has 4 state variables, independent of population size; computation time is negligible for all population sizes. The AB model has 4*N states, and must also track interactions among the N individuals. Depending on the density of the contact network, the AB model can grow at rates up to O(N^2). With a population of only 200 the AB model contains 5 to 50 thousand variables, depending on the network, plus associated parameters and random number calls. We report sensitivity analysis of population size below. The technical appendix (2-1) includes the models and full documentation.

Experimental design: We vary both the network structure of contacts among individuals and the degree of agent heterogeneity in the AB model and compare the resulting dynamics to the DE version. We implement a full factorial design with five network structures and two heterogeneity conditions, yielding ten experimental conditions. In each condition we generate an ensemble of 1000 simulations of the AB model, each with different realizations of the random variables determining contacts, emergence, and recovery. Each simulation of the random, small world, and scale-free networks uses a different realization of the network structure (the connected and lattice networks are deterministic). Since the expected values of the parameters in each simulation are identical to the DE model, differences in outcomes can only be due to differences in network topology, agent heterogeneity, or the discrete, stochastic treatment of individuals in the AB model.

Network topology: The DE model assumes perfect mixing, implying that anyone can meet anyone else with equal probability. Realistic networks are far more sparse, with most people contacting only a small fraction of the population. Further, most contacts are clustered among neighbors with physical proximity.19 We explore five different network structures: fully connected (each person can meet everybody else), random (Erdos and Renyi 1960), small-world (Watts and Strogatz 1998), scale-free (Barabasi and Albert 1999), and lattice (where contact occurs only between neighbors on a ring). We parameterize the model so that all networks (other than the fully connected case) have the same mean number of links per node, k = 10 (Watts 1999). The fully connected network corresponds to the perfect mixing assumption of the DE model. The random network is similar except that people are linked with equal probability to a subset of the population. To test the network most different from the perfect mixing case, we also model a one-dimensional ring lattice with no long-range contacts. With k = 10 each person contacts only the five neighbors on each side. The small world and scale-free networks are intermediate cases with many local and some long-distance links. These widely used networks characterize a number of real systems (Watts 1999; Barabasi 2002). We set the probability of long-range links in the small world networks to 0.05, in the range used by Watts (1999).
We build the scale-free networks using the preferential attachment algorithm (Barabasi and Albert 1999), in which the probability that a new node links to existing nodes is proportional to the number of links each node already has. Preferential attachment yields a power law for the probability that a node has k links, Prob(k) = a*k^(-γ). Empirically, γ typically falls between 2 and 3; the mean value of γ in our experiments is 2.60. Scale-free networks contain a few "hubs" with many links, with most other nodes connected mainly to these hubs. The technical appendix (2-1) details the construction of each network.

19 One measure of clustering is the mean clustering coefficient, (1/N)*Σ_i 2*n_i/[k_i*(k_i - 1)], where n_i is the total number of links among all nodes linked to i and k_i is the number of links of the i-th node (Watts and Strogatz 1998).

Agent Heterogeneity: Each individual has four relevant characteristics: expected contact rate, infectivity, emergence time, and disease duration. In the homogeneous condition (H=) each agent is identical, with parameters set to the values of the DE model. In the heterogeneous condition (H≠) we vary individual contact frequencies. Focusing on contact frequencies reduces the dimensionality of the required sensitivity analysis at little cost. First, only the product of the contact rate and infectivity matters to R0 and the spread of the disease (see note 17). Second, the exponential distributions for ε and δ introduce significant variation in realized individual incubation and disease durations even when the expected value is equal for all individuals.

Heterogeneity in contacts is modeled as follows. Given that two people are linked (that is, they can come into contact), the frequency of contact between them depends on two factors. First, how often does each use their links, on average: some people are gregarious; others shy. Second, time constraints may limit contacts. At one extreme, the frequency of link use may be constant, so that people with more links have more total contacts per day, a reasonable approximation for some airborne diseases and easily communicated ideas: a professor may transmit an airborne virus or a simple concept to many people with a single sneeze or comment, (roughly) independent of class size. At the other extreme, if the time available to contact people is fixed, the chance of using a link is inversely proportional to the number of links, a reasonable assumption when transmission requires extended personal contact: the professor can only tutor a limited number of people each day. We capture these effects by assigning individuals different propensities to use their links, λ[j], yielding the expected contact frequency for the link between individuals i and j, c[i,j]:

c[i,j] = κ*λ[i]*λ[j] / (k[i]*k[j])^τ (5)

where k[j] is the total number of links individual j has, τ captures the time constraint on contacts, and κ is a constant chosen to ensure that the expected contact frequency for all individuals equals the mean value used in the DE model. In the homogeneous condition τ = 1 and λ[j] = 1 for all j, so that in expectation all contact frequencies are equal, independent of how many links each individual has. In the heterogeneous condition, those with more links have more contacts per day (τ = 0), and individuals have different propensities to use their links. We use a uniform distribution with a large range, λ[j] ~ U[0.25, 1.75].
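As an illustration of equation (5), the sketch below builds one realization of a scale-free contact network and assigns link-level contact frequencies under the two heterogeneity conditions. It is a hedged sketch under stated assumptions: the use of networkx, the normalization step, the target mean contact rate, and the variable names are ours, not the thesis code.

```python
# Minimal sketch of eq. (5): expected contact frequency for a linked pair,
# c[i,j] = kappa * lam[i] * lam[j] / (k[i] * k[j])**tau.
# networkx, the normalization, and all names here are illustrative assumptions.
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
N, MEAN_CONTACTS = 200, 4.0            # population and target mean contacts/day

# One realization of a scale-free network with mean degree close to k = 10
# (Barabasi-Albert preferential attachment with m = 5 links per new node).
G = nx.barabasi_albert_graph(N, 5, seed=1)
degree = np.array([G.degree(j) for j in range(N)])

def contact_frequencies(tau, heterogeneous):
    # Propensity to use links: lam[j] ~ U[0.25, 1.75] if heterogeneous, else 1.
    lam = rng.uniform(0.25, 1.75, N) if heterogeneous else np.ones(N)
    raw = {(i, j): lam[i] * lam[j] / (degree[i] * degree[j]) ** tau
           for i, j in G.edges()}
    # Choose kappa so the mean contacts per person matches the DE model.
    kappa = MEAN_CONTACTS * N / (2 * sum(raw.values()))
    return {e: kappa * v for e, v in raw.items()}

c_homog = contact_frequencies(tau=1.0, heterogeneous=False)   # H= condition
c_heter = contact_frequencies(tau=0.0, heterogeneous=True)    # H (not equal) condition

per_person = np.zeros(N)
for (i, j), c in c_heter.items():
    per_person[i] += c
    per_person[j] += c
print("mean contacts/day: %.2f, min %.2f, max %.2f"
      % (per_person.mean(), per_person.min(), per_person.max()))
```

Under the heterogeneous condition the per-person totals spread widely because hubs and high-propensity individuals accumulate more contacts, while the homogeneous condition keeps expected totals roughly equal across individuals.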
Calibrating the DE Model: In real-world applications the parameters determining the basic reproduction rate R0 are often poorly constrained by biological and clinical data. For emerging diseases such as vCJD, BSE and avian flu, data are not available until the epidemic has already spread. Parameters are usually estimated by fitting models to aggregate data as an outbreak unfolds; SARS provides a typical example (Dye and Gay 2003; Lipsitch, Cohen et al. 2003; Riley, Fraser et al. 2003). Because R0 also depends on contact networks that are often poorly known, models of established diseases are commonly estimated the same way (e.g., Gani and Leach 2001). To mimic this protocol we treat the AB model as the "real world" and estimate the parameters of the SEIR DE model to yield the best fit to the cumulative number of cases. We estimate the infectivities (iES and iIS) and the incubation time (ε) by nonlinear least squares in 200 AB simulations randomly selected from the full set of 1000 in each network and heterogeneity condition, a total of 2000 calibrations (see the technical appendix (2-1)). The results assess whether a calibrated DE model can capture the stochastic behavior of individuals in realistic settings with different contact networks.
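The calibration protocol just described can be sketched as follows, assuming Python with scipy. The SEIR equations below are a standard formulation consistent with the reproduction number R0 = cES*iES*ε + cIS*iIS*δ used in Table 2; the parameter values, population size, and the choice to fit cumulative cases (N - S) are illustrative assumptions rather than the dissertation's implementation, which is documented in technical appendix 2-1.

```python
# Minimal sketch of the calibration protocol: integrate a standard SEIR model,
# then estimate the infectivities (iES, iIS) and incubation time (eps) by
# nonlinear least squares against a cumulative-case series taken from one AB run.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

N, cES, cIS, delta = 200.0, 4.0, 1.25, 10.0   # assumed values for illustration

def seir(t, y, iES, iIS, eps):
    S, E, I, R = y
    infections = (cES * iES * E + cIS * iIS * I) * S / N
    return [-infections, infections - E / eps, E / eps - I / delta, I / delta]

def cumulative_cases(params, t_grid, y0):
    iES, iIS, eps = params
    sol = solve_ivp(seir, (t_grid[0], t_grid[-1]), y0, t_eval=t_grid,
                    args=(iES, iIS, eps), rtol=1e-8)
    return N - sol.y[0]          # everyone who is no longer susceptible

def fit(t_grid, observed, y0, start=(0.05, 0.06, 15.0)):
    resid = lambda p: cumulative_cases(p, t_grid, y0) - observed
    return least_squares(resid, start, bounds=(1e-6, [1.0, 1.0, 60.0]))

# usage: fit(t, ab_cumulative_cases, y0=[N - 2, 2, 0, 0])
```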
Results: For each of the ten experimental conditions we compare the simulated patterns of diffusion on four measures relevant to public health. The maximum symptomatic infectious population (peak prevalence, Imax) indicates the peak load on public health infrastructure, including health workers, immunization resources, hospitals and quarantine facilities. The time from initial exposure to the maximum of the infected population (the peak time, Tp) measures how quickly the epidemic spreads and therefore how long officials have to deploy those resources. The fraction of the population ultimately infected (the final size, F) measures the total burden of morbidity and mortality. Finally, we calculate the fraction of the epidemic duration in which the infectious population in the DE model falls within the envelope encompassing 95% of the AB simulations (the envelopment fraction, E95). A low envelopment fraction indicates that the trajectory of the DE model frequently falls outside the range of outcomes caused by stochastic variations in the behavior of individual agents.20

20. We calculate E95 for the first 125 days, when cumulative cases in the base DE model settle within 2% of their final value; longer periods further reduce the discrepancy between the two models.

To illustrate, Figure 1 compares the base case DE model with a typical simulation of the AB model (in the heterogeneous scale-free case). The sample scale-free epidemic grows more slowly than the DE (Tp = 60 vs. 48 days), has similar peak prevalence (Imax = 25% vs. 27%), and ultimately afflicts fewer people (F = 89% vs. 98%). Figure 2 shows the symptomatic infectious population (I) in 1000 AB simulations for each network and heterogeneity condition. Also shown are the mean of the ensemble and the trajectory of the DE model with base case parameters. Table 2 reports the results for the fitted DE models; Tables 3-5 report Tp, Imax, F and E95 for each condition, and compare them to the base and fitted DE models. Except for the lattice, the AB dynamics are qualitatively similar to the DE model. Initial diffusion is driven by the positive contagion feedbacks as exposed and infectious populations spread the infection, further increasing the prevalence of infectious individuals. The epidemic peaks when the susceptible population is sufficiently depleted that the (mean) number of new cases generated by contagious individuals is less than the rate at which they are removed from the contagious population.

Figure 1. Left: the DE model with base case parameters (Table 1), showing the prevalence fraction for each disease stage over the first 150 days. Right: a typical simulation of the equivalent AB model (the heterogeneous condition of the scale-free network) over the same period.

Table 2. Median estimated basic reproduction number, R0 = cES*iES*ε + cIS*iIS*δ, for the DE model fit to the recovered population in 200 randomly selected runs of the AB model for each cell of the experimental design.
Parameter Connected R Small World H, H. H* H. 2.96 3.09 2.55 2.90 2.43 1.71 0.62 0.58 0.52 0.67 0.61 Median 0.999 0.999 0.999 0.999 0.999 a 0.025 0.049 0.017 0.050 0.042 Median cES*iES*ε+cIS*iIS*δ Scale-free 4.19 H= Implied R0 = Random H, H= Lattice H* H= 3.33 2.52 1.56 1.35 0.88 0.66 0.81 0.55 0.999 0.998 0.998 0.985 0.987 0.071 0.040 0.059 0.056 0.043 H,

Departures from the DE model, while small, increase from the connected to the random, scale-free, small world, and lattice structures (Figure 2; Tables 3-5). The degree of clustering explains these variations. In the fully connected and random networks the chance of contacts in distal regions is the same as for neighbors. The positive contagion feedback is strongest in the connected network because an infectious individual can contact everyone else, minimizing local contact overlap. In contrast, the lattice has maximal clustering. When contacts are localized in a small region of the network, an infectious individual repeatedly contacts the same neighbors. As these people become infected, the chance of contacting a susceptible and generating a new case declines, slowing diffusion, even if the total susceptible population remains high.

In the deterministic DE model there is always an epidemic if R0 > 1. Due to the stochastic nature of interactions in the AB model, it is possible that no epidemic occurs, or that it ends early as the few initially contagious individuals recover before generating new cases. As a measure of early burnout, Table 3 reports the fraction of cases in which cumulative cases remain below 10%.21 Early burnout ranges from 1.8% in the homogeneous connected case to 5.9% in the heterogeneous lattice. Heterogeneity raises the incidence of early burnout in each network since there is a higher chance that the first patients will have low contact rates and will recover before spreading the disease. Network structure also affects early burnout. Greater contact clustering increases the probability that the epidemic burns out in a local patch of the network before it can jump to other regions, slowing diffusion and increasing the probability of global quenching. The impact of heterogeneity is generally small, despite a factor of seven difference between the lowest and highest contact frequencies. Heterogeneity yields a slightly smaller final size, F, in all conditions: the mean reduction over all ten conditions is 0.09, compared to a mean standard deviation in F across all conditions of 0.19; the expected difference in F is small relative to the expected variation in F from differences due to stochastic events during the epidemic. Similarly, heterogeneity slightly reduces Tp in all conditions (by a mean of 4.5 days, compared to an average standard deviation of 25 days).
Maximum prevalence (Imax) also falls in all conditions (by 1.5%, compared to a mean standard deviation of 5%). In heterogeneous cases the more active individuals tend to contract the disease sooner, causing a faster take-off compared to the homogeneous case (hence earlier peak times). These more active individuals also recover sooner, reducing the mean contact frequency, and hence the reproduction rate among those who remain, faster than in the homogeneous case. Subsequent diffusion is slower, peak prevalence is smaller, and the epidemic ends sooner, yielding fewer cumulative cases. However, these effects are small relative to the variation from simulation to simulation within each network and heterogeneity condition caused by the underlying stochastic processes for contact, emergence and recovery.

Figure 2 (next page): Comparison of the DE SEIR model to the AB model. Graphs show symptomatic cases as % of population (I/N). Each panel shows 1000 simulations of the AB model for each network and heterogeneity condition. The thin black line is the base case DE model with parameters as in Table 1. The thick red lines show the mean of the AB simulations. Also shown are the envelopes encompassing 50%, 75%, and 95% of the AB simulations.

21. Except for the lattice, the results are not sensitive to the 10% cutoff. The technical appendix (2-1) shows the histogram of final size for each network and heterogeneity condition.

[Figure 2 panels: columns labeled Homogeneous and Heterogeneous; vertical axes show symptomatic cases as a percentage of the population (0-40%); horizontal axes show Time (Day), 0-300.]

Table 3. (1) Mean and standard deviation of final size (F) for the AB simulations; */** indicates the base DE value of F (0.98) falls outside the 95/99% confidence bound defined by the ensemble of AB simulations. These distributions are typically nonnormal (e.g., F is bounded by 100%); assuming normality would further reduce the significance of the differences between the AB and DE models. (2) The percentage of AB simulations with F < 0.10 indicates the prevalence of early burnout. (3) Mean and standard deviation of F for the fitted DE models.
Scale-free Random Connected H. H. H* H. H, 1 AB Mean 0.97 0.90* (a) 0.13) (0.19) 0.92* (0.15) 0.86** (0.17) 2 % F < 0.10 3 Fitted DE p 1.8 0.98 (0.07) 4.4 0.91 (0.17) 2.7 0.92 (0.15) 3.8 0.86 (0.18) (a) Small World H. H* 0.90** 0.84** (0.18) 3.9 (0.20) 5.6 0.90 (0.18) 0.82 (0.22) Lattice H. 0.65 H* 0.51** (0.29) (0.26) H* 0.92 (0.17) 0.83** (0.21) 2.6 0.92 (0.17) 4.8 0.83 (0.22) 3.1 5.9 0.62 (0.28) 0.50 (0.26)
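The ensemble statistics reported in Tables 3 and 4 can be computed along the following lines; this is a minimal sketch assuming Python/numpy and AB output stored as arrays of daily prevalence and cumulative-case fractions (array names and shapes are assumptions of the sketch, not the dissertation's data structures).

```python
# Illustrative sketch: ensemble statistics for final size F, peak time Tp,
# peak prevalence Imax, and the early-burnout fraction.
# Assumed shapes: I is (runs, days) with symptomatic prevalence as a fraction
# of N; cumulative is (runs, days) with the cumulative-case fraction.
import numpy as np

def ensemble_metrics(I, cumulative, burnout_cutoff=0.10):
    F = cumulative[:, -1]                    # fraction ultimately infected
    Tp = I.argmax(axis=1)                    # day of peak prevalence per run
    Imax = I.max(axis=1)                     # peak prevalence per run
    burnout = (F < burnout_cutoff).mean()    # share of runs that fizzle early
    summary = {name: (vals.mean(), vals.std(ddof=1))
               for name, vals in [("F", F), ("Tp", Tp), ("Imax", Imax)]}
    summary["early_burnout_fraction"] = burnout
    return summary
```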
Table 4. Mean and standard deviation of peak time Tp and peak prevalence Imax in 1000 simulations of the AB model for each experimental condition. */** indicates that the values of the base DE model (Tp = 48 days and Imax = 27.1%) fall outside the 95/99% confidence bounds defined by the ensemble of AB simulations. Also reported are the mean and standard deviation of Tp and Imax for the fitted DE models. (Columns: Connected, Random, Scale-free, Small World, and Lattice networks, under the H= and H≠ conditions.)
AB Mean AB a Fitted DE p Fitted DE 49.8 10.8 51.3 9 44.9 12.4 49.6 21.3 52.8 13.4 56.6 21.3 49.5 14.2 58 37.7 45.2 13.1 61.2 56.5 42.7 13.3 49.1 29 86.5 31.5 AB Mean 29.1 27.1 26.5 25.1 26.8 AB a Fitted DE I 4.9 26.9 6.3 26.7 5.2 24.6 5.6 24.2 Fitted DE a 2.8 13.2 9.6 18.1 Lattice Ho H. H 83.6 36.4 75.2 50.9 90.5 79.7 82.1 83.2 25.2 34.3 84.4 57 102.8 77.8 25.3 18.1* 16.5* 9.3** 8.5** 5.9 33.6 6.6 23.4 4.8 5.3 3.4 3.4 17 14.9 11 7.8 42.9 10.8 4.4 5 19.9 9.9

Table 5. Envelopment fraction E95: the fraction of the first 125 days (the time required for cumulative cases to settle within 2% of their final value in the base DE) in which the infectious population of the DE falls within the 95% confidence interval created by the ensemble of AB simulations in each network and heterogeneity scenario, reported for both the base and fitted DE models (fitted values are means, with standard deviations in parentheses).
Connected H=: base DE 1.00, fitted DE 0.99 (0.07); Connected H≠: 1.00, 0.98 (0.09)
Random H=: 1.00, 0.99 (0.04); Random H≠: 1.00, 0.97 (0.05)
Scale-free H=: 1.00, 0.95 (0.15); Scale-free H≠: 1.00, 0.96 (0.10)
Small World H=: 0.71, 0.98 (0.05); Small World H≠: 0.70, 0.99 (0.05)
Lattice H=: 0.55, 0.91 (0.16); Lattice H≠: 0.54, 0.94 (0.11)

Consider now the differences between the DE and AB cases by network type.

Fully Connected: The fully connected network corresponds closely to the perfect mixing assumption of the DE. As expected, the base DE model closely tracks the mean of the AB simulations, never falling outside the envelope capturing 95% of the AB realizations (E95 = 100% for both H= and H≠). The differences in peak time and peak prevalence between the base DE and AB models are small and insignificant in both H= and H≠ conditions. Final size is not significantly different in the H= condition. In the H≠ condition F is significantly smaller (p < 0.05), but the difference, 8%, is small relative to the 19% standard deviation in F within the AB ensemble.22

Random: Results for the random network are similar to the connected case. The trajectory of the base DE model never falls outside the 95% confidence interval defined by the AB simulations. Differences in Tp and Imax are not significant. The difference in cumulative cases is statistically significant for both heterogeneity conditions, but small relative to the variation in F within each condition. The incidence of early burnout rises somewhat because the random network is sparse relative to the fully connected case, increasing contact overlap.

Scale-Free: Surprisingly, diffusion in the scale-free network is also quite similar to the base DE, even though the scale-free network departs significantly from perfect mixing. The envelopment fraction E95 is 100% for both H= and H≠. Peak prevalence and peak time are not significantly different from the base DE values. Though most nodes have few links, once the infection reaches a hub it quickly spreads throughout the population. However, as the hubs are removed from the infectious pool, the remaining infectious nodes have lower average contact rates, causing the epidemic to burn out at lower levels of diffusion. Yet while final size is significantly lower than the DE in both heterogeneity conditions, the differences are small relative to the standard deviation in F caused by random variations in agent behavior within the ensemble of AB simulations.

22. The reported significance levels are based on the actual distribution of 1000 AB simulations in each experimental condition. These distributions are typically nonnormal (e.g., F is bounded by 100%). Hence some differences in key metrics are significant even though they are small relative to the standard deviation for the AB simulations. The conventional assumption of normality would further reduce the significance of the differences between the AB and DE models.
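A minimal sketch of the envelopment fraction defined in Table 5 follows, assuming Python/numpy and that the 95% envelope is taken as the pointwise 2.5th to 97.5th percentile band of the AB ensemble; the exact construction of the envelope is not specified above, so that choice is an assumption of the sketch.

```python
# Illustrative sketch of the envelopment fraction E95: the share of the first
# 125 days on which the DE infectious population lies inside the band covering
# 95% of the AB runs.  Assumed shapes: I_ab is (runs, days), I_de is (days,).
import numpy as np

def envelopment_fraction(I_ab, I_de, horizon=125, coverage=0.95):
    lo = np.percentile(I_ab[:, :horizon], 100 * (1 - coverage) / 2, axis=0)
    hi = np.percentile(I_ab[:, :horizon], 100 * (1 + coverage) / 2, axis=0)
    inside = (I_de[:horizon] >= lo) & (I_de[:horizon] <= hi)
    return inside.mean()
```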
Small World: Small world networks are highly clustered: most contacts are with immediate neighbors, and they lack highly connected hubs. Consequently diffusion is slower than in the DE and in the connected, random, and scale-free networks. Slower diffusion relative to the DE causes the envelopment fraction to fall to about 70%, and peak prevalence is significantly lower than in the DE. Nevertheless, the few long-range links are sufficient to seed the epidemic throughout the population. Thus in the H= condition, cumulative cases and peak time are not significantly different from the values of the base DE model. The main impact of heterogeneity is greater dispersion and a reduction in final size.

Lattice: In the one-dimensional ring lattice individuals can only contact their k immediate neighbors. Hence diffusion is far slower than in the DE (E95 is only about 55%). The epidemic advances roughly linearly in a well-defined wave front of new cases trailed by symptomatic and then recovered individuals. Such waves are observed in the spread of plant pathogens, where transmission is mostly local, though in two dimensions more complex patterns are common (Bjornstad, Peltonen et al. 2002; Murray 2002). Because the epidemic wave front reaches a stochastic steady state in which recovery balances new cases, the probability of burnout is roughly constant over time. In the other networks early burnout quickly becomes rare as long-range links seed the epidemic in multiple locations (the technical appendix (2-1) shows distributions of final size). Final size is much lower in the lattice than in the base DE, though interestingly, the variance is higher as well, so that in the H= condition the differences in F and Tp are not significant (Imax is significantly lower in both heterogeneity conditions).

Calibrated DE Model: In practice key parameters such as R0 and incubation times are poorly constrained and are estimated by fitting models to aggregate data. Table 2 summarizes the results of fitting the DE model to 200 randomly selected AB simulations in each experimental condition. The differences between the AB and best-fit DE models are very small. The median R² for the fit to the recovered population exceeds 0.985 in all scenarios. The infectious population nearly always remains inside the envelope containing 95% of the AB simulations (Table 5). Across all ten experimental conditions the values of F, Tp, and Imax in the fitted models are not statistically different from the means of the AB simulations in 94.8% of the cases (range 89.0% -- 99.5%). The DE model fits extremely well even though it is clearly misspecified in all but the homogeneous fully connected network. As the network becomes increasingly clustered and diffusion slows, the estimated parameters adjust accordingly, yielding a trajectory nearly indistinguishable from the mean of the AB simulations. The estimated parameters, however, do diverge from the mean of those characterizing agents in the underlying model. As expected from the relationship between R0 and final size,23 for more heterogeneous, and more clustered, networks, where final size is smaller, the estimated R0 is also smaller.
For example, the fitted parameters for small-world networks show longer incubation times (ε) and lower infectivity of exposed individuals (iES) compared to the base DE values, yielding slower diffusion and a smaller R0 and final size than the base DE values.

Sensitivity to Population Size: To examine the robustness of the results to population size we repeated the analysis for N = 50 and 800.24 The results change little over this factor of 16. For most network types (Random, Small World, Lattice), the final fraction of the population infected is slightly larger (and therefore closer to the value of the DE) in larger populations (it is similar in the other two networks), and the rate of early burnout falls. Population size has little impact on other key metrics due to two offsetting effects. As expected, the impact of discrete individuals and stochastic state transitions is greater in small populations, increasing the variability in the realizations of the AB model. But greater variability reduces the ability to distinguish between the mean of the AB results and the DE. Details are reported in the technical appendix (2-1).

Discussion: Agent-based simulation adds an important new method to the modeling toolkit. Yet computational and cognitive limits require all modelers to make tradeoffs. Agent-based models allow detailed representation of individual attributes and networks of relationships, but increase computational requirements. Traditional deterministic compartment models are computationally efficient, but assume perfect mixing and homogeneity within compartments. By explicitly contrasting agent-based and deterministic differential equation models of epidemic diffusion we test the importance of relaxing the perfect mixing and agent homogeneity assumptions. With the exception of the ring lattice, where there are no long-range contacts, there are few statistically significant differences in key public health metrics between the mean of the AB trajectories and the deterministic DE model. Mean peak times, determining how long public health officials have to respond to an epidemic, are not significantly different from the base case DE in any network or heterogeneity condition. Similarly, mean peak prevalence, determining the maximum load on health infrastructure, does not differ significantly from the base DE for the connected, random, and scale-free networks. The difference in final epidemic size between the mean of the AB simulations and the base DE model is statistically significant in 7 of the 10 experimental conditions, but the magnitude of the difference is less than one standard deviation in all but the lattice. (While a pure lattice is unrealistic in modeling human diseases due to the high mobility of modern society, it may be appropriate in other contexts such as the spread of diseases among immobile plant populations.) The AB representation does offer insight unavailable in the DE: the epidemic fizzles out in a small fraction of cases even though the underlying parameters yield an expected value of the basic reproduction rate greater than one. The more highly clustered the network, the greater the incidence of early burnout. The deterministic DE model cannot generate such behavior.

23. R0 = -ln(1 - F)/F for SEIR models (Anderson and May 1991).
24. Mean links per node (k) must be scaled along with population. We use the scaling scheme that preserves the average distance between nodes in the random network (Newman 2003), yielding k = 6 and k = 18 for N = 50 and N = 800.
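Footnote 23 is the standard final-size relation for SIR/SEIR-type models without births, deaths, or loss of immunity (Anderson and May 1991); written out as a sketch in LaTeX, with F the fraction ultimately infected:

```latex
% Final-size relation for closed SIR/SEIR-type epidemics (Anderson and May 1991):
1 - F \;=\; e^{-R_0 F}
\quad\Longrightarrow\quad
R_0 \;=\; -\,\frac{\ln(1 - F)}{F}.
```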
Overall, however, differences between the DE and the mean trajectory of the AB simulations are small relative to the variation in realizations of the AB model with a given topology and heterogeneity characteristics. While clustering increases the departure of the mean trajectory of the AB model from the DE, it also increases the variability of outcomes. Intuitively, the location of infectious individuals makes no difference in the homogeneous connected case, but matters more in highly clustered networks with greater heterogeneity. The increase in variability with clustering and heterogeneity means that the ability to identify network topology from a given trajectory does not grow as fast as the differences in means might suggest. The differences between the DE and the mean of the AB simulations are also small relative to the uncertainty in key parameters. For example, R0 for smallpox is estimated to be in the range 3 - 6 (Gani and Leach 2001). Varying R0 over that range in the DE (by scaling both contact frequencies, cES and cIS, proportionately, while holding all other parameters constant) causes F to vary from 94.1% to 99.7%, peak prevalence to vary from 21.7% to 31.8%, and peak time to vary from 36 to 64 days. Variations in epidemic diffusion due to parameter uncertainty are comparable to the differences caused by relaxing the perfect mixing and homogeneity assumptions.

Compared to the homogeneous case, heterogeneity in transmission rates causes slightly earlier peak times as high-contact individuals rapidly seed the epidemic, followed by lower diffusion levels as the high-contact individuals are removed, leaving those with lower average transmission probability and a smaller reproduction rate. These results are consistent with the analysis of heterogeneity for SI and SIS models (Veliov 2005). Such dynamics were also observed in the HIV epidemic, where initial diffusion was rapid in subpopulations with high contact rates. Such heterogeneity is often captured in DE models by adding compartments to distinguish the highly connected and the hermit types.

Since parameters characterizing emerging (and some established) diseases are poorly constrained, epidemiologists typically fit models to aggregate data. We tested the impact of this protocol by treating the realizations of the AB model as the "real world" and fitting the DE model to them. Surprisingly, the deterministic DE model replicates the behavior of the stochastic agent-based simulations very well, even when contact networks are highly clustered and agents are highly heterogeneous. The calibration results also highlight an important methodological issue. The parameter values obtained by fitting the aggregate model to the data from an AB simulation (and therefore from the real world) do not necessarily equal the mean of the individual-level parameters governing the micro-level interactions among agents. Aggregate parameter estimates capture not only the mean of individual attributes such as contact rates but also the impact of network structure and agent heterogeneity. Modelers often use both micro and aggregate data to parameterize both AB and DE models; the results suggest caution must be exercised in doing so, and in comparing parameter values in different models (Fahse, Wissel et al. 1998).
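As an illustrative cross-check (not a calculation reported in the dissertation), the final-size relation in footnote 23 approximately reproduces the range of F quoted above for R0 between 3 and 6; a minimal sketch in Python:

```python
# Solve 1 - F = exp(-R0 * F) for the nontrivial root F; this yields roughly
# F = 94% at R0 = 3 and F = 99.7% at R0 = 6, close to the range quoted above.
from scipy.optimize import brentq
import math

def final_size(R0):
    return brentq(lambda F: 1 - F - math.exp(-R0 * F), 1e-9, 1 - 1e-12)

for R0 in (3.0, 6.0):
    print(R0, round(final_size(R0), 4))
```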
The results inform phenomena beyond epidemics. Analogous processes of social contagion (imitation, word of mouth, etc.) play important roles in many social and economic phenomena, from innovation diffusion to crowd behavior to marketing (Strang and Soule 1998; Rogers 2003). Moreover, modelers tackling policy issues related to innovation and product diffusion face tradeoffs in the choice of modeling assumptions similar to those facing students of epidemics (Gibbons 2004). Therefore, the results may provide insight into the costs and benefits of agent-level disaggregation in models of social contagion.

How, then, should modelers choose between the AB and DE frameworks? The model purpose, time, and resources available condition the choice. If the network structure is known, stable, and unaffected by the diffusion dynamics themselves, an AB model can be used to examine the effect of creating and removing nodes and links, for example to simulate random failures or targeted attacks. Such questions cannot be answered with aggregate models. Data availability is a major concern. Data on the contact network and the distribution of agent attributes are often hard to obtain and highly uncertain, requiring extensive sensitivity analysis to ensure robust results. Disaggregating the model into an AB representation without data to support the formulation of network structure and agent heterogeneity increases the computational cost and the dimensionality of sensitivity analysis but in many cases adds little value. The purpose of the model also determines the risks of different methods. The costs of avoidable morbidity and mortality due to inadequate public health response are high. Some researchers therefore suggest it is prudent to use the well-mixed DE model, where diffusion is fastest, as a worst-case limit (Kaplan and Lee 1990; Kaplan 1991). The same argument may suggest greater attention to network structure in studies of innovation diffusion and new product launch, where policymakers seek to promote, rather than suppress, diffusion.

Model complexity can be expanded in different directions. Modelers can add detail, disaggregating populations by attributes such as location, decision rules, and relationship networks. Alternatively, they can expand the model boundary to include feedbacks with other subsystems. For example, the results reported here assume fixed parameters, including network structure, contact rates and infectivities. All are actually endogenous. As prevalence increases, people change their behavior. Self-quarantine and the use of safe practices disrupt contact networks, cut contact frequencies, and reduce the probability of transmission given a contact. From staying home, increased hand washing, and use of masks (for SARS) to abstinence, condom use, and needle exchange (for HIV), endogenous behavior change can have large effects on the dynamics (Blower, Gershengorn et al. 2000). In some cases behavior change tends to be stabilizing, for example as self-quarantine and the use of safe practices reduce R0. In others behavior change may amplify the epidemic, for example as people fleeing a bioterror attack seed outbreaks in remote areas and make contact tracing more difficult, or when the development of more effective drugs increases risky behavior among AIDS patients (Lightfoot, Swendeman et al. 2005). Such feedback effects may swamp the impact of network structure, heterogeneity, and random events and should not be excluded in favor of greater detail.
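As one illustration of the kind of boundary extension discussed above, the sketch below adds a simple behavioral feedback to the DE model: both contact rates fall as symptomatic prevalence rises. The functional form, the parameter values, and the Python/scipy implementation are assumptions of this sketch, not a formulation used in the dissertation.

```python
# Illustrative sketch: an SEIR DE model in which contacts decline endogenously
# with symptomatic prevalence via the (assumed) damping factor 1/(1 + alpha*I/N).
import numpy as np
from scipy.integrate import solve_ivp

N, iES, iIS, eps, delta = 200.0, 0.05, 0.06, 15.0, 10.0   # assumed values
cES0, cIS0, alpha = 4.0, 1.25, 20.0

def seir_behavior(t, y):
    S, E, I, R = y
    damp = 1.0 / (1.0 + alpha * I / N)      # contacts fall as prevalence rises
    infections = (cES0 * damp * iES * E + cIS0 * damp * iIS * I) * S / N
    return [-infections, infections - E / eps, E / eps - I / delta, I / delta]

sol = solve_ivp(seir_behavior, (0, 300), [N - 2, 2, 0, 0],
                t_eval=np.linspace(0, 300, 301), rtol=1e-8)
print("final size =", round((N - sol.y[0, -1]) / N, 3))
```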
Before concluding, we consider limitations and extensions. The experiments considered the classic SEIR model. Further work should consider the robustness of the results to common elaborations such as loss of immunity, other distributions for emergence and recovery, recruitment of new susceptibles, non-human disease reservoirs and vectors, and so on. Note, however, that by using the classic SEIR model, with only four compartments, we maximize the difference between the aggregation assumptions of the DE and AB representations. In practical applications, DE models often disaggregate the population more finely to account for heterogeneity arising from, e.g., sex, age, behavior, location, contact frequencies, assortative mixing and other attributes that vary among different population segments and cause clustering in contact networks. Such disaggregation further reduces the differences between the DE and AB representations. A major challenge, however, is choosing the number and definitions of additional compartments optimally. The results suggest AB simulations may be used effectively to design compartment models that capture important heterogeneity and network effects using the fewest additional compartments -- if the network structure is reasonably well known and stable. While we examined a wide range of network structures and heterogeneity in agent attributes, the agent models contain many parameters that could be subject to sensitivity analysis, including the mean number of links per node, the probability of long-range links (in the small world network), and the scaling exponent (in the scale-free case). Other dimensions of agent heterogeneity and other networks could be examined, including networks derived from field studies (Ahuja and Carley 1999). The boundary could be expanded to include endogenously the many effects that alter parameters such as contact rates and network structure as an epidemic progresses, e.g., through panic or policies such as quarantine and immunization. Finally, while diffusion is an important process in physical and social systems, our study does not address the differences between AB and DE models in other contexts.

Conclusion: The declining cost of computation has led to an explosion of agent-based simulations, offering modelers the ability to capture individual differences in attributes, behavior, and networks of relationships. Yet no matter how powerful computers become, limited time, budget, cognitive capabilities, and client attention mean modelers must always choose whether to disaggregate to capture the attributes of individual agents, expand the model boundary to capture additional feedback processes, or keep the model simple so that it can be analyzed thoroughly. When should agent-based models be used, and when should traditional models be used? We compared agent-based and differential equation models in the context of contagious disease. From SARS to HIV to the possibility of bioterrorism, understanding the diffusion of infectious diseases is critical to public health and security. Further, analogous processes of social contagion play important roles in many social and economic phenomena, from innovation diffusion to crowd behavior. We first compared the classic SEIR model to equivalent agent-based models with identical (mean) parameters. We tested network structures spanning a wide range of clustering, including fully connected, random, scale-free, small world, and lattice, each with homogeneous and heterogeneous agents.
Surprisingly, the DE model often captures the dynamics well, even in networks that diverge significantly from perfect mixing and homogeneity. The presence of even a few long-range links or highly connected hubs can seed an epidemic throughout the population, leading to diffusion patterns that, in most cases, are not significantly different from the dynamics of the differential equation model. Second, in practical applications, parameters governing diffusion are poorly constrained by independent data, for both emerging and established diseases (and for new products). Hence model parameters are typically estimated from data on incidence and prevalence as diffusion occurs. We assess this practice by treating individual realizations of the agent-based simulations as the 'real world' and estimating key parameters of the DE model from these 'data'. The calibrated DE model captures the dynamics extremely well in all network and heterogeneity conditions tested. Overall, the differences between the agent-based and differential equation models caused by heterogeneity and network structure are small relative to the range of outcomes caused by stochastic variations in the behavior of individual agents, and relative to the uncertainty in parameters. The results suggest extensive disaggregation may not be warranted unless detailed data characterizing network structure are available, that structure is stable, and the computational burden does not prevent sensitivity analysis or inclusion of other key feedbacks that may condition the dynamics. 61 References: Ahuja, M. K. and K. M. Carley. 1999. Network structure in virtual organizations. Organization Science 10(6): 741-757. Anderson, R. M. and R. M. May. 1991. Infectious diseases of humans : Dynamics and control. Oxford; New York, Oxford University Press. Axelrod, R. 1997. The dissemination of culture - a model with local convergence and global polarization.Journal of ConflictResolution41(2): 203-226. Axtell, R. L., J. Axelrod, J. M. Epstein and M. Cohen. 1996. Aligning simulation models: A case study and results. Computational and Mathematical Organization Theory 1(2): 123-141. Axtell, R. L., J. M. Epstein, J. S. Dean, G. J. Gumerman, A. C. Swedlund, J. Harburger, S. Chakravarty, R. Hammond, J. Parker and M. Parker. 2002. Population growth and collapse in a multiagent model of the kayenta anasazi in long house valley. Proceedings of the NationalAcademy of Sciences of the UnitedStates ofAmerica 99: 7275-7279. Ball, F., D. Mollison and G. Scalia-Tomba. 1997. Epidemics with two levels of mixing. The Annals of Applied Probability7(1): 46-89. Barabasi, A. L. 2002. Linked: How everything is connected to everything else and what it means. Cambridge, Massachusetts, Perseus. Barabasi, A. L. and R. Albert. 1999. Emergence of scaling in random networks. Science 286(5439): 509-512. Bass, F. 1969. A new product growth model for consumer durables. Management Science 15: 215-227. Bjornstad, O. N., M. Peltonen, A. M. Liebhold and W. Baltensweiler. 2002. Waves of larch budmoth outbreaks in the european alps. Science 298(5595): 1020-1023. Black, F. and M. Scholes. 1973. Pricing of options and corporate liabilities. Journal of Political Economy 81(3): 637-654. Blower, S. M., H. B. Gershengorn and R. M. Grant. 2000. A tale of two futures: Hiv and antiretroviral therapy in san francisco. Science 287(5453): 650-654. Carley, K. 1992. Organizational learning and personnel turnover. Organization Science 3(1): 2046. Chen, L. C., B. Kaminsky, T. Tummino, K. M. Carley, E. 
Casman, D. Fridsma and A. Yahja. 2004. Aligning simulation models of smallpox outbreaks. Intelligence and Security Informatics, Proceedings 3073: 1-16. Davis, G. 1991. Agents without principles? The spread of the poison pill through the intercorporate network. Administrative Science Quarterly 36: 583-613. Dye, C. and N. Gay. 2003. Modeling the sars epidemic. Science 300(5627): 1884-1885. Edwards, M., S. Huet, F. Goreaud and G. Deffuant. 2003. Comparing an individual-based model of behavior diffusion with its mean field aggregate approximation. Journal ofArtificial Societiesand Social Simulation6(4). Epstein, J. M. 2002. Modeling civil violence: An agent-based computational approach. Proceedingsof the NationalAcademy of Sciences of the UnitedStates ofAmerica 99: 7243-7250. Epstein, J. M., D. A. T. Cummings, S. Chakravarty, R. M. Singa and D. S. Burke. 2004. Toward a containmentstrategyfor smallpox bioterror:An individual-basedcomputational approach, Brookings Institution Press. 62 Erdos, P. and A. Renyi. 1960. On the evolution of random graphs. Publications of the MathematicalInstitute of the HungarianAcademyof Sciences 5: 17-61. Eubank, S., H. Guclu, V. S. A. Kumar, M. V. Marathe, A. Srinivasan, Z. Toroczkai and N. Wang. 2004. Modelling disease outbreaks in realistic urban social networks. Nature 429(13 May 2004): 180-184. Fahse, L., C. Wissel and V. Grimm. 1998. Reconciling classical and individual-based approaches in theoretical population ecology: A protocol for extracting population parameters from individual-based models. The American Naturalist 152(6): 838-852. Ferguson, N. M., M. J. Keeling, J. W. Edmunds, R. Gani, B. T. Grenfell, R. M. Anderson and S. Leach. 2003. Planning for smallpox outbreaks. Nature 425: 681-685. Gani, R. and S. Leach. 2001. Transmission potential of smallpox in contemporary populations. Nature 414(6865): 748-751. Gibbons, D. E. 2004. Network structure and innovation ambiguity effects on diffusion in dynamic organizational fields. Academy of Management Journal 47(6): 938-951. Halloran, E. M., I. M. Longini, N. Azhar and Y. Yang. 2002. Containing bioterrorist smallpox. science 298: 1428-1432. Halloran, M. E. and I. M. Longini. 2003. Smallpox bioterror response - response. Science 300(5625): 1503-1504. Hethcote, H. W. 2000. The mathematics of infectious diseases. Siam Review 42(4): 599-653. Jacquez, G. and P. O'Neill. 1991. Reproduction numbers and thresholds in stochastic epidemic models. I. Homogeneous populations. Mathematical Biosciences 107: 161-186. Jacquez, J. A. and C. P. Simon. 1993. The stochastic si model with recruitment and deaths .1. Comparison with the closed sis model. Mathematical Biosciences 117(1-2): 77-125. Jacquez, J. A. and C. P. Simon. 2002. Qualitative theory of compartmental systems with lags. Mathematical Biosciences 180: 329-362. Kaplan, E. H. 1991. Mean-max bounds for worst-case endemic mixing models. Mathematical Biosciences 105(1): 97-109. Kaplan, E. H., D. L. Craft and L. M. Wein. 2002. Emergency response to a smallpox attack: The case for mass vaccination. Proceedings of the National Academy of Sciences 99(16): 10935-10940. Kaplan, E. H. and Y. S. Lee. 1990. How bad can it get? Bounding worst case endemic heterogeneous mixing models of hiv/aids. Mathematical Biosciences 99: 157-180. Kaplan, E. H. and L. M. Wein. 2003. Smallpox bioterror response. science 300: 1503. Keeling, M. J. 1999. The effects of local spatial structure on epidemiological invasions. 
Proceedings of the Royal Society of London Series B-Biological Sciences 266(142 1): 859-867. Koopman, J. S. 2002. Contolling smallpox. science 298: 1342-1344. Koopman, J. S., S. E. Chick, C. P. Simon, C. S. Riolo and G. Jacquez. 2002. Stochastic effects on endemic infection levels of disseminating versus local contacts. Mathematical Biosciences 180: 49-71. Koopman, J. S., G. Jacquez and S. E. Chick. 2001. New data and tools for integrating discrete and continuous population modeling strategies. Population health and aging. 954: 268294. Levinthal, D. and J. G. March. 1981. A model of adaptive organizational search. Journal of EconomicBehavior and Organization2: 307-333. 63 Lightfoot, M., D. Swendeman, M. J. Rotheram-Borus, W. S. Comulada and R. Weiss. 2005. Risk behaviors of youth living with hiv: Pre- and post-haart. American Journal of Health Behavior 29(2): 162-171. Lipsitch, M., T. Cohen, B. Cooper, J. M. Robins, S. Ma, L. James, G. Gopalakrishna, S. K. Chew, C. C. Tan, M. H. Samore, D. Fisman and M. Murray. 2003. Transmission dynamics and control of severe acute respiratory syndrome. Science 300(5627): 19661970. Lomi, A. and E. R. Larsen. 2001. Dynamics of organizations: Computational modeling and organization theories. Menlo Park, Calif., AAAI Press/MIT Press. Mahajan, V., E. Muller and Y. Wind. 2000. New-product diffusion models. Boston, Kluwer Academic. Murray, J. D. 2002. Mathematical biology. New York, Springer. Newman, M. E. J. 2002. Spread of epidemic disease on networks. Physical Review E 66(1). Newman, M. E. J. 2003. Random graphs as models of networks. Handbook of graphs and networks. H. G. Schuster. Berlin, Wiley-VCH: 35-68. Newman, M. E. J. 2003. The structure and function of complex networks. Siam Review 45(2): 167-256. Riley, S., C. Fraser, C. A. Donnelly, A. C. Ghani, L. J. Abu-Raddad, A. J. Hedley, G. M. Leung, L. M. Ho, T. H. Lam, T. Q. Thach, P. Chau, K. P. Chan, P. Y. Leung, T. Tsang, W. Ho, K. H. Lee, E. M. C. Lau, N. M. Ferguson and R. M. Anderson. 2003. Transmission dynamics of the etiological agent of sars in hong kong: Impact of public health interventions. Science 300(5627): 1961-1966. Rogers, E. M. 2003. Diffusion of innovations. New York, Free Press. Samuelson, P. 1939. Interactions between the multiplier analysis and the principle of acceleration. Review of Economic Statistics 21: 75-79. Schelling, T. C. 1978. Micromotives and macrobehavior. New York, Norton. Strang, D. and S. A. Soule. 1998. Diffusion in organizations and social movements: From hybrid corn to poison pills. Annual Review of Sociology 24: 265-290. Tesfatsion, L. 2002. Economic agents and markets as emergent phenomena. Proceedings of the NationalAcademy of Sciences of the UnitedStates ofAmerica 99: 7191-7192. Urban, G. L., J. R. Houser and H. Roberts. 1990. Prelaunch forecasting of new automobiles. Management Science 36(4): 401-421. Veliov, V. M. 2005. On the effect of population heterogeneity on dynamics of epidemic diseases. Jouranl of MathematicalBiology Forthcoming. Watts, D. J. 1999. Small worlds. Princeton, Princeton University Press. Watts, D. J. 2004. The "new" science of networks. Annual Review of Sociology 30: 243-270. Watts, D. J. and S. H. Strogatz. 1998. Collective dynamics of "small-world" networks. Nature 393(4 June): 440-442. 64 Essay 3- Dynamics of Multiple-release Product Development Abstract Product development (PD) is a crucial capability for firms in competitive markets. 
Building on case studies of software development at a large firm, this essay explores the interaction among the different stages of the PD process, the underlying architecture of the product, and the products in the field. We introduce the concept of the "adaptation trap," where intendedly functional adaptation of workload can overwhelm the PD organization and force it into firefighting (Repenning 2001) as a result of the delay in seeing the additional resource need from the field and underlying code-base. Moreover, the study highlights the importance of architecture and underlying product-base in multiple-release product development, through their impact on the quality of new models under development, as well as through resource requirements for bug-fixing. Finally, this study corroborates the dynamics of tipping into firefighting that follows quality-productivity tradeoffs under pressure. Put together, these dynamics elucidate some of the reasons why PD capability is hard to build and why it easily erodes. Consequently, we offer hypotheses on the characteristics of the PD process that increase its strategic significance and discuss some practical challenges in the face of these dynamics. Introduction Firm performance and heterogeneity is a central topic of interest for researchers and practitioners alike. According to the resource-based view of strategy, it is important to look inside the firm for capabilities that distinguish it from its competitors (Wernerfelt 1984; Barney 1991; Peteraf 1993) and enable the firm to gain rents. A capability should be valuable, rare, inimitable, and un-substitutable (Barney 1991) so that it can contribute to sustained competitive advantage. Researchers have suggested several broad explanations for why some capabilities elude imitation and replication. Barney (1991) offers three main factors that result in inimitability: history dependence, causal ambiguity, and social complexity. Dierickx and Cool (1989) note that capabilities are stocks, and consequently discuss time compression diseconomies, asset mass efficiencies, interconnectedness of stocks, asset erosion time constants, and causal ambiguity as the main reasons why capabilities are hard to imitate. Finally, Amit and Schoemaker (1993) 65 suggest uncertainty, complexity, and organizational conflict as the main factors that create heterogeneity in managerial decision-making and firm performance. While these general explanations are important starting points, the resource-based literature has not provided a detailed and grounded understanding of the capability evolution that it suggests underlies the barriers to imitation (Williamson 1999). Opening the black-box of capability is required for recognizing a capability independent of performance measures it is supposed to explain, and for proposing testable hypotheses about the relative importance of different capabilities. Such understanding would not only strengthen the resource-based view theoretically; it is also needed to enhance the practical usefulness of the framework. A few theoretical perspectives elaborate on barriers to imitation by discussing the factors that hinder formation of successful strategies through adaptive processes. Leonard-Barton (1992) introduces the concept of core rigidity, where routines underlying a firm's capabilities resist change when the environment demands adoption of new capabilities -- for example, in the face of architectural innovations (Henderson and Clark 1990). 
The organizational learning literature highlights a similar barrier to learning, competency traps (Levitt and March 1988), where accumulated experience with old routines reduces their operating cost and makes the new, potentially superior, practices less rewarding. Finally, the complexity of imitation is elaborated on through theoretical models of adaptation on rugged fitness landscapes (Levinthal 1997; Rivkin 2001). This view of adaptation suggests that interdependency and interaction among the different elements of the firm strategy (Milgrom and Roberts 1990) create multiple local peaks in the landscape of fit between the organization's strategy and the environmental demands, making it hard for the firm to achieve new combinations of strategic fit through incremental adaptation. More recently, a few empirical studies have enriched the understanding of the nature of capabilities and their dynamics. Henderson and Cockburn (1994) discuss how component and architectural competence contribute to research and development productivity of pharmaceutical firms. An intra-firm evolutionary perspective has been developed (Burgelman 1991; Lovas and Ghoshal 2000) which views the evolution of strategies as processes of random variation and selection inside a firm, and several case studies have elaborated on this perspective. (See the special Summer 1996 issue of Strategic Management Journal on evolutionary perspectives on strategy.) Finally, a few studies explore the nature and dynamics of organizational capabilities in 66 different industries, including banking, semi-conductors, automobiles, communications, pharmaceuticals, and fast food (Dosi, Nelson et al. 2000). These studies highlight that the complexity of capabilities resides in their evolution through time and the path dependence in these dynamics. In this essay, drawing on two in-depth case studies of the software development process, we propose a grounded theory of why successful product development (PD) capability is hard to establish, and easily erodes when established. Using the case study method (Eisenhardt 1989), combined with dynamic simulation modeling (Forrester 1961; see Black, Carlile et al. 2004 for a detailed description of the method), we develop a fine-grained and internally consistent perspective on multiple-release PD capability: how it operates through time, and why it is hard to sustain. Through this theorybuilding practice we follow a tradition of using the micro-foundations of behavioral decision theory to explain a phenomenon of interest in strategy (Zajac and Bazerman 1991; Amit and Schoemaker 1993): operation and erosion of capabilities. Our results show 1) how the long-term consequences of quality and productivity adjustment under pressure can deteriorate product development capability, 2) how the adaptation trap makes it hard for organizations to sustain an efficient product development process, and 3) how continuity in product base and architecture reinforces these dynamics. Context of Study: Multiple-release PD in the Software Industry Product development is often highlighted as the prime example of a dynamic capability (Eisenhardt and Martin 2000; Winter 2003). By creating innovative products that fulfill unmet market needs, firms can create competitive advantage. 
Moreover, moving into new product markets through successful product introduction enables firms to change their strategic orientation; for example, Hewlett-Packard changed from an instruments company to a computer company through product development (Burgelman 1991). In addition, since dynamic markets demand continuous introduction of new products, companies in these markets cannot find a substitute for the product development capability. The practice of new product development has changed in many ways. While new products were once developed separately in independent releases, development of product families in multiple releases is gaining prominence in high-speed competitive markets 67 (Sanderson and Uzumeri 1997; Cusumano and Nobeoka 1998; Gawer and Cusumano 2002)25. Multiple-release product development entails building a central set of components around which new products in a product family are developed. Each new model of the product adds a few new features to the existing architecture and product base. This process has several benefits, since it allows the firm to get the first product to the market faster and with fewer resources; it provides for learning about market and internal processes from one model to another; it allows for other players in the market to build around the central platform, thereby creating positive externalities for a product line (Gawer and Cusumano 2002); and it enhances competitiveness by increasing control over the pacing of product introduction (Eisenhardt and Brown 1998). For example, Palm currently has three major product families for the PDA market: Zire, Tungsten, and Treo. Each product family has had different product releases/models shaped around a specific platform. For example, the Treo platform combines features of cell phone and classical PDA, and each new model has included new features, e.g., still camera, MP3 player, and video camera. Most software products have traditionally followed the principles of the multiple-release development process, even though using a different vocabulary. In the software industry each new release (=model2 6) of a product (= product family) builds on the past releases by adding new features to the current code base (= product base). This multiple release development strategy operates across most common software products we use, from operating systems (e.g., Mac OS, Windows, DOS) to different applications (e.g., MS Excel, Matlab, Stata). Compared to single-product development, multiple-release product development brings more continuity across releases of a product, since not only do different releases share design and development resources, but they also have similar pools of customers, the same architecture, and a similar product base. These connections among different models highlight the importance of an integrative view of product development which encompasses the dynamics across multiple releases of a product family. While there is a rich literature looking at product development (PD) in general (see Brown and Eisenhardt (1995) and Krishnan and Ulrich (2001) for reviews) 25 Multiple-release product development conceptually has overlap with platform-based PD in definition and its applications. We use the former term because the main focus of this study is the interaction between different releases of a project and not the development of a common platform. 26Throughout the text we use these terms interchangeably. 
We tend to use the software vocabulary when talking specifically about the two case studies, and use the general terms when discussing the implications for PD in general. 68 and the recent body of work on development of products based on a common platform is growing (e.g., Sanderson and Uzumeri 1995; Meyer, Tertzakian et al. 1997; Cusumano and Nobeoka 1998; Muffatto 1999; Krishnan and Gupta 2001), there has been little research on the dynamics of the multiple-release product development process. This gap is also observable in the literature on project dynamics. There is a rich set of studies discussing single-project dynamics (e.g., Cooper 1980; AbdelHamid and Madnick 1991; Ford and Sterman 1998). These studies have introduced the concept of the rework cycle and have discussed several different processes, including the effect of pressure on productivity and quality, morale, overtime, learning, hiring and experience, phase dependency, and coordination, which endogenously influence the performance of different projects. However, with the exception of Repenning's work (2000; 2001), little attention has been spent on the dynamics when consequent product models are developed by the same development organization. Repenning (2000; 2001) looks at multiple-project R&D systems and discusses tipping dynamics that follow resource allocation between the concept design and development phases of a PD process. He shows that there potentially exist two equilibria with low and high efficiency for the PD organization, and a tipping point (a state of the system at which the behavior changes significantly) between the two. These results, however, rely on the assumption that development gets higher priority and that concept design adds enough value through avoiding future errors in development to justify it on efficiency grounds. Moreover, this work fails to consider the quality considerations of the product development when the product is out of the PD organization. In this study we focus on the dynamics of multiple-release product development with an emphasis on the effects of products in the field and the architecture and product-base. As a result, we take into account the continuity between different releases of a product. Moreover, we focus our analysis on dynamics that make it hard to build and sustain the efficient PD capability that is crucial to the success of a firm in competitive markets. The software industry is well suited as the setting for this study. In the absence of production and distribution barriers, product development is the major part of the software business. On the other hand, the rapid pace of change in the software industry makes it a great example of a dynamic market. Therefore product development is a crucial dynamic capability for a software firm. To differentiate the firm's performance from that of its competitors, a software 69 firm needs a strong PD capability that can produce products with more features, faster, and more cheaply than the competitors. Moreover, despite the importance of PD capability in this industry, software development projects often cost more than budgeted, stretch beyond their deadlines, and fall short of incorporating all their desired features (for a meta-survey see Molokken and Jorgensen 2003). 
Cost and schedule overruns in software projects have persisted for over two decades, despite significant technological and process improvements, such as object-oriented programming, iterative and agile development, and product line engineering, among other things. The challenge of learning to successfully manage software development is therefore non-trivial. Yet this challenge is not insurmountable: modest improvements in project measures (see Figure 1) and development productivity (Putnam 2001) are observable over a long time horizon. Furthermore, the size and complexity of software development projects are considered medium-range (average size of about 100 person-months [Putnam 2001; Sauer and Cuthbertson 2004]), allowing for an in-depth look at all the components of the PD process. Finally, there is a rich literature on the software development process, including some of the pioneering work in applying simulation models to project dynamics (AbdelHamid and Madnick 1991). We can therefore build on this literature for the research at hand.

[Figure 1 chart: IT Project Performance, showing the percentage of projects succeeded, challenged, and failed, 1994-2002. Source: Standish Group surveys.]

Figure 1 - The performance of IT projects in the U.S., based on the bi-annual survey of the Standish Group. The data cover 8,380 projects, mostly software development. The succeeded group includes projects that were on-time and on-budget; the failed group includes projects that were never finished; and the challenged group includes the rest. The data are compiled from different Standish Group surveys, including their 1995 and 2001 surveys, which are available at their website: http://www.standishgroup.com/index.php.

Data Collection
This study uses interviews and archival data from two product development teams in a single software organization (hereafter called "Sigma") to build a dynamic simulation model of multiple-release product development. Sigma is a large IT company with over 16,000 employees and several lines of products in hardware and software, which are distributed in multiple locations around the world. In recent years the company management has decided on a strategic shift toward the software side of the business, and therefore strong emphasis is placed on the success of a few promising software products. The study was supported by the research organization of the company, which was interested in learning about the factors important to the success or failure of software projects. We were given complete access to internal studies and assessments conducted within Sigma. Moreover, one of the research personnel who was familiar with several software projects in the company helped us, both with connecting with the two cases we studied and with the data-gathering effort. We studied closely the development of two products, Alpha and Beta. The two cases were selected because they had significantly different performance on the quality and schedule dimensions, despite similarities in size and location. Despite a promising start, Alpha had come to face many quality problems and delays, while Beta's releases were on-time and had high quality. Comparing and contrasting the processes in effect in the two cases provided a rich context to build a model capturing the common processes across two projects with different outcomes, thereby helping shed light on how similar structures can underlie different outcomes.
Seventy semi-structured interviews (by phone and in person, ranging from 30 to 90 minutes) and archival data gathered in three months of fieldwork in Sigma inform this study. Moreover, we participated in a few group meetings in the organization that focused on the progress of the products under study. The study is part of a longer research involvement with the company; therefore, we continued to gather additional data and corroboration on different themes after the three months of major fieldwork. These efforts included 20 additional interviews and a group session which elicited ideas from 26 experienced members of the organization. The interviews included members of all functional areas involved in development, services, and sales, specifically architects and system engineers, developers, testers, customer service staff, sales support personnel, and marketing personnel, as well as managers up to two layers up in several of these areas. Interviewees were given a short description of the project, were asked about their role in the company and the process of work, and then they discussed their experiences in the development of their respective products. The details of the data collection process are documented in the technical appendix 3-1.
Data Analysis
Interviews were recorded and a summary of each interview was created soon after the interview session. We used this information to build a simple model of how the mechanics of the development process work. Moreover, in each interview we searched for factors contributing to quality and delay issues and looked for corroboration of, or disagreement with, previously identified factors. Based on these factors and the observations on the site, we generated several different hypotheses about what processes contribute to, or avoid, delay and quality problems in the development of Alpha and Beta. The complete set of these dynamic hypotheses is documented in the technical appendix 3-2. We then narrowed down the list of hypotheses into a core set based on what themes were most salient in the interviews and in the history of the products. This core set of dynamic hypotheses created a qualitative theory to be formally modeled using the system dynamics modeling framework (Forrester 1961; Sterman 2000). The modeling process included first building a detailed technical model (documented in the technical appendix 3-3) and then extracting the main dynamics in a simple simulation model reported in this essay.
The model allows us to understand the relationship between the process of product development and the behavior of a PD organization. Using sensitivity analysis and different simulation experiments, we learned about the dynamic consequences of interaction between the mechanics of product development and the decision-rules used to allocate resources and manage the system. In the rest of the essay, we use the simple simulation model to explain why creating, managing, and sustaining multiple-release product development processes is difficult. More specifically, we explain why PD capability in these settings can erode through random environmental shocks and adaptation, and describe the role of product architecture and feedback from the product in the field in the erosion of PD capability. Next, we provide a brief description of the software development process in the two cases studied. In the following section, the simple model of multiple-release product development is introduced and the dynamics of interest are discussed in detail.
Overview of Development Process for Alpha and Beta
In Sigma, we studied the development of two products closely. Both products are customer relationship management (CRM) solutions (or part of a CRM solution) and follow similar general processes of software development, as described below. The development process described here is for one release of a product; however, usually a new release is well underway when the current release is launched into the market. Therefore, the different phases of development overlap with each other for different releases.
The software development process includes three main phases. These phases can be followed in a serial manner (waterfall development) or iteratively. Sigma's development organization followed a largely waterfall approach, with some iterative elements. In the concept design phase, the main features of the product are determined, the product architecture is designed, and general requirements for developing different pieces of code are created. For example, a new feature for the next release of a call-center software can be "linking to national database of households and retrieving customer information as the call is routed to a customer service agent." Software architects decide the method by which this new capability should be incorporated into the current code, which new modules should be written, and how the new code should interact with current modules of code. Subsequently a more detailed outline of the requirements is developed that describes the inputs, outputs, and performance requirements for the new modules, as well as modifications of the old code.
The next step of the software development process is developing the code. This is usually the most salient and the largest share of the work. In this step, software engineers (developers) develop the new code based on the requirements they have received from the concept design phase. The quality of the development depends on several factors, including the skills of individual developers, complexity of the software, adherence to good development practices (e.g., writing detailed requirements and doing code reviews), quality of the code they are building on, and quality of the software architecture (MacCormack, Kemerer et al. 2003). Some preliminary tests are often run in this stage to ensure the quality of the new code before it is integrated with other pieces and sent to the quality assurance stage.
Quality assurance includes running different tests on the code to find problems that stop the software from functioning as desired. Often these tests reveal problems in the code that are referred back to developers for rework. This rework cycle continues until the software passes most of the tests and the product can be released. It is important to note that not all possible errors are discovered during testing. Often tests can cover only a fraction of possible combinations of activities for which the software may be used in the field. Therefore, there are always some bugs in the released software. When the software is released, new customers buy and install it and therefore require service from the company. The service department is responsible for answering customer questions and helping them use the software efficiently, as well as for following up on bugs reported by customers and referring these bugs to the research and development organization. These bugs need to be fixed on the customer site through patches and ad hoc fixes if they are critical to customer satisfaction.
Current-Engineering (CE) is the term used for this activity of creating short-term fixes for bugs on a specific customer site. CE is often undertaken by the same part of the organization that has developed the code, if not by the same pool of developers. If the bugs are not critical, they can be fixed in later releases of the software. In this study Bug-Fixing refers to the activity of fixing problems discovered from old releases in the new release of the product and thus is different from CE. Consequently bug-fixing usually competes with the addition of new features to the next release. The development, launch, and service activities discussed above focus on a single release (version) of the software. However, different releases are interconnected. First, different versions of software build on each other and hence they carry over the problems in code and architecture, if these problems are not remedied through bug-fixes or refactoring of the architecture. (Refactoring is improving the design and architecture of the existing code.) Moreover, the sharing of resources between development and CE means that the quality of past releases influences the resources available for the development of the current release.
Products Alpha and Beta share the general processes discussed above. Product Alpha is a Customer Relationship Management (CRM) software which has been evolving for over 8 years. The research and development team working on Alpha has fluctuated at about 150 full-time employees who work in five main locations. This product has gone through a few acquisitions and mergers; its first few releases led the market; and it has been considered a promising, strategic product for the company. However, in the past two years, long delays in delivery and low quality have removed it from its leadership position in the market. In fact, the recent low sales have cast doubt on its viability. Product Beta is part of a complicated telecommunication switch that is developed largely independent of the parent product but is tested, launched, and sold together with the parent product. Beta includes both software and hardware elements and the majority of its over-80-person R&D staff focuses on the software part. Beta's R&D team is also spread across multiple locations with four main sites, two of which overlap with the Alpha product. Beta has had exceptionally good quality and timely delivery consistently through its life. Nevertheless, in two cases in its history, Beta had to delay a release to adhere to a synchronized release plan with the parent product, which was delayed.
Model and Analysis
In this section we build a simple model of the multiple-release product development process, in which the interactions among quality, productivity, and resource requirements for products in the field and for maintenance of the old code are analyzed. We build and analyze the model in three steps. First, a very simple model is developed to look at the productivity and quality tradeoff in the development phase of the PD process. After analyzing this simple model, we build into it a simple structure to represent what happens when the product is introduced into the field and the dynamics that follow. Finally, we add the architecture and underlying code base and analyze the full model. The model is built based on the observations of PD processes described in the previous section.
Therefore the decision-rules are modeled based on the observation of individuals' decision-making and action in practice, rather than on theoretical or normative assumptions such as rationality. Consequently the formulations of the model follow the tenets of behavioral decision theory (Cyert and March 1963) and bounded rationality (Simon 1979) as applied to simulation models (Morecroft 1983; Morecroft 1985).
Development Sector: Tipping point in productivity and quality tradeoff
Development of a new release of a software product entails adding new features to an existing release, or developing all the new features for the first release. In the same fashion, a new car based on an existing model has some new features added or modified on the existing platform, and a new version of a personal organizer builds on the last version with a few modifications and additional features. Demand for New Features (the names of variables that appear in the model diagrams are italicized when they are first introduced) usually comes from market research, customer focus groups, benchmarking with competitors, and other strategic sources, with an eye on the internal innovative capabilities of the firm. In the case of the software products studied, new features are proposed by sales, service, and marketing personnel, as well as product managers who are in charge of finding out what is offered by the competition and what is most needed by the potential customers, and of prioritizing these features to be included in the future releases. Therefore the demand for new features largely depends on competitive pressures and factors outside the direct control of the R&D department, even though the product managers have some flexibility in selecting and prioritizing among hundreds of features that can be included in a new release (see Figure 2).
The proposed features to be added in future releases accumulate in the stock of Features Under Development until they are developed, tested, and released in the market to become part of the Features in Latest Release of the software. The Feature Release rate determines how fast the PD organization is developing and releasing new features and therefore how often new releases are launched. The quality of the new release depends on the Defects in Latest Release, which accumulates the Defect Introduction Rate as defects in the designed product go into the market. Figure 2 shows how these variables represent the flow of tasks and the defects in the product. Formally, the stocks (boxes) represent the integration of different flows (thick valves and arrows); for example, Features Under Development is the integral of Feature Addition minus Feature Release. Appendix A details the equations of the model.
Figure 2- The basic stock and flow structure for multiple-release new product development. Demand for New Features accumulates in the stock of Features Under Development until they are developed and transferred into the Features in Latest Release through Feature Release.
Strong product development capability in this setting translates into getting the highest rate of Feature Release given a fixed level of development resources. Companies that can sustain such high rates of feature release will have a competitive advantage through having some combination of more features in their new releases, faster release times, and lower cost of product development.
In its simplest form, Feature Release (FR) depends on the amount of resources available for development (RD), the productivity of these resources (P), and the quality of their work (i.e., the fraction of accomplished work that passes the quality criteria). This fraction depends on two main factors: the quality of work, represented by the fraction of developed features that are defective (Error Rate: e), and the comprehensiveness of testing and quality assurance practices, represented by the fraction of errors discovered in testing (frT). Therefore the fraction of work that is accepted is one minus the fraction of work rejected, or frA = 1 - e*frT. Consequently Feature Release can be summarized as:
FR = RD * P * (1 - e*frT) (1)
Here we measure the development resources (RD) as the number of individuals working on development of new features. Productivity (P) represents how many features a typical developer can create in one month and depends on several factors, including the skill and experience of individual developers, the quality of design and requirements for the development, the complexity of the code to be developed, the development process used (e.g., waterfall vs. iterative), and the availability of different productivity tools. Similarly, Error Rate (e) depends on several factors, including the quality of design and detailed requirements, complexity of architecture, the scope of testing by developers (before the official testing phase), code reviews and documentation, the development process used, and the pressure developers face, among other things. Finally, the fraction of errors caught in testing (frT) depends both on the comprehensiveness and coverage of test plans as well as on the quality of execution of, and follow-up on, testing.
What creates the dynamics in a product development process is the fact that resources, productivity, quality, and testing are not constant, but often change through the life of a product development organization working on multiple releases of a product. While exogenous changes in these variables (e.g., an economic downturn results in a lay-off and a reduction in available resources) are important, they add little insight into the internal dynamics of the PD processes and offer little leverage for improvement. Therefore here we focus on those changes which are endogenous to the PD process, for example, how resource availability changes because of the quality of past releases. In what follows we expand the model to capture some of the main endogenous effects (feedbacks) and analyze the results.
A central concept for understanding the capability erosion dynamics is the balance between resources available and resources required for finishing a release on schedule. This can be operationalized as the Resource Gap (RG), the ratio of the typical person-months of work required to finish the development of pending Features Under Development by a scheduled date, to the available person-months until that date. Therefore the Resource Gap is a function of three variables: how many features are under development, how much time the PD organization is given to develop these features, and how many development resources are available to do this job. (See the exact equations in Appendix A.) In the analysis of the determinants of the resource gap, we can distinguish between Features Under Development, which changes endogenously due to model dynamics, and the time given for development and the resources available, which are mainly managerial policy levers.
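As a concrete illustration of this operationalization, the following minimal sketch computes the Resource Gap from these three determinants. The numbers and variable names are hypothetical and chosen only for illustration; they are not taken from the Appendix A formulations.

# A minimal sketch of the Resource Gap operationalization described above.
# The parameter values and names are hypothetical, for illustration only.

def resource_gap(features_under_development, normal_productivity,
                 dev_resources, months_to_deadline):
    """Ratio of person-months of work required to person-months available."""
    required = features_under_development / normal_productivity   # person-months of pending work
    available = dev_resources * months_to_deadline                # person-months until the scheduled date
    return required / available

# Example: 120 pending features, 1 feature per person-month,
# 20 developers, and 5 months to the scheduled release date.
print(resource_gap(120, 1.0, 20, 5))  # -> 1.2, i.e., 20% more work than the team can absorb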
Therefore we keep the resources available and the time to develop the pending features constant when discussing the model dynamics in the absence of policy interventions. The Resource Gap is a measure of pressure on the development team and therefore we use the two terms interchangeably. The resource gap can be managed through controlling the amount of resources available or the scheduled finish date. If pressure is consciously controlled, the desired impact of increasing schedule pressure is to increase the productivity of developers so that the gap between available resources and desired resources can be closed. In fact, in the case of software engineering, it is estimated that under pressure people can work as much as 40% faster (AbdelHamid and Madnick 1991). All else being equal, this balancing loop (Work Harder loop; see Figure 3) enables development teams to accomplish more work than in the absence of pressure and potentially meet a challenging schedule. Managers commonly use this lever to increase the productivity of the development process, as reflected in the comments of a program manager:
"The thinking that [a higher manager] had, and we think we agree with this, is that if your target is X and everybody is marching for that target, you fill the time for that target. If you are ahead of the game, people are probably going to relax a little bit: because we aim ahead of the game, we don't need to work so hard, we can take a longer lunch, we can do whatever; and guess what: Murphy's law says you now won't make that target. So [...] you shoot for X minus something, you drive for X minus something so that you build in on Murphy's law to a point."
However, there is a negative side to increasing work pressure. As pressure increases and developers try to work faster to meet the schedule, they start to make more errors. This happens because under pressure they take shortcuts, for example, less detailed requirements development, little documentation of the code, lack of code review, and poor unit testing (see MacCormack, Kemerer et al. (2003) for a quantitative analysis of the effects of different practices on quality and productivity). Moreover, the stress induced by pressure can increase their error rate and further erode the quality. In the words of a technical manager in Alpha:
"There is a lot of pressure put on people to say how can you do this as fast as you can, and people fall back on the ways that they used to do it, so instead of going through all this process stuff [they would say] 'I know what to code, I can just code it up.'"
Higher Error Rates surface in a few weeks when the code goes into the testing phase, resulting in the need for more rework. Consequently rework increases the amount of work to be done and therefore the pressure, closing a potentially vicious, reinforcing cycle (Rework loop - Figure 3). The error rate-productivity trade-off discussed above is well documented for single projects in different domains, from software development to construction (Cooper 1980; AbdelHamid and Madnick 1991). These effects were also salient in Alpha and Beta; for example, developers in Alpha found little time to engage in process work that was supposed to enhance the quality of the product, such as doing code reviews and making detailed requirement plans. The strength of the trade-off between pressure and quality depends not only on individual skills and experience, but also on the development process and incentive structure in place at the PD organization.
For example, in Alpha, the quality of development was hardly transparent to management at the time of development; therefore, under acute pressure, developers had more incentive to forgo good practices and sacrifice quality to increase the speed of development. A developer observes:
"Nobody really sees what we do. You get rewarded when your name is next to a dollar sign. If we wanted to get recognition, we could put faults into [the software] so we know exactly where they are, and then wait until a big customer puts a lot of money on it, and then go fix it quickly... This is not what we do [but] jumping and saving the customer is important [for our reviews]. Nobody knows what actually the quality is and what actually we do."
We aggregate the effects of pressure on productivity and quality in two simple functions. Fp(RG), the effect of Resource Gap (RG) on productivity, changes the productivity around a normal level, PN, so that P = PN * Fp(RG). Similarly, Fe(RG) represents the effect of Resource Gap on Error Rate, and therefore Error Rate changes around its normal value eN: e = eN * Fe(RG). Substituting in Equation (1), we can see how Feature Release depends on Resource Gap:
FR = RD * PN * Fp(RG) * (1 - eN * Fe(RG) * frT) (2)
The exact behavior of this function depends on the shape of Fp and Fe and therefore on the strength of the two effects, but in general both functions are upward sloping (an increase in Resource Gap increases both Productivity and Error Rate) and both saturate at some minimum and maximum levels (Productivity and Error Rate cannot go to infinity or below zero). The qualitative behavior of the system, however, depends only on the general shape of the Feature Release with respect to different levels of Resource Gap. An upward sloping function suggests that the higher the pressure, the higher the output of the development process (although this can be at odds with quality if a poor testing system lets defects go through). A downward sloping function suggests that the least pressure is needed to get the best performance. Finally, a more plausible shape is an inverse-U shape, where up until some level of Resource Gap people work harder and make more progress, but above that level the increase in pressure is actually harmful, since people take shortcuts that harm the quality significantly; developers then end up reworking their tasks and releasing fewer acceptable features on average. This shape in fact proves to be dominant for a wide variety of plausible Fe, Fp, and frT values.
Figure 3 illustrates the causal relationships between Resource Gap, productivity, Error Rate, and Feature Release. Two feedback loops, Work Harder and Rework, are highlighted in this picture. The Work Harder loop suggests that an increase in the Features Under Development increases the demand for resources and therefore the Resource Gap. As a result, productivity increases and the PD process releases more new features, reducing the stock of Features Under Development (hence the Work Harder loop is balancing (B)). The Rework loop highlights the fact that in the presence of more features to be developed and higher pressure, people take shortcuts and work with lower quality, therefore increasing the Error Rate and consequently the amount of rework needed to fix those errors. This results in slower Feature Release and therefore potentially higher Features Under Development (hence the Rework loop is reinforcing (R)).
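To make Equation (2) concrete, the following minimal sketch evaluates Feature Release over a range of Resource Gap values. The functional forms and parameter values are our own illustrative assumptions (saturating S-shaped effects), not the Appendix A formulations; for a wide range of such plausible shapes the resulting curve is inverse-U shaped.

import math

# Illustrative, assumed effect functions: both rise with the Resource Gap and saturate.
def f_productivity(rg):
    # effect of Resource Gap on productivity, rises from about 0.6 to about 1.4
    return 0.6 + 0.8 / (1 + math.exp(-4 * (rg - 1.0)))

def f_error(rg):
    # effect of Resource Gap on Error Rate, rises from about 0.5 to about 2.5
    return 0.5 + 2.0 / (1 + math.exp(-5 * (rg - 1.4)))

def feature_release(rg, rd=20, p_n=1.0, e_n=0.4, fr_t=0.9):
    """Equation (2): FR = RD * PN * Fp(RG) * (1 - eN * Fe(RG) * frT)."""
    return rd * p_n * f_productivity(rg) * (1 - e_n * f_error(rg) * fr_t)

for rg in [0.6, 0.8, 1.0, 1.2, 1.4, 1.6]:
    print(rg, round(feature_release(rg), 1))
# The printed values first rise with the Resource Gap and then fall:
# the inverse-U shape discussed in the text.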
Assuming the more plausible inverse-U shape for the outcome of the tradeoff between these two loops, the total effect of Resource Gap on the output of PD can be categorized into two regions. For lower levels of Resource Gap and pressure, an increase in Features Under Development (and therefore pressure) has a desirable net gain, namely, the PD organization will produce more. However, above some threshold of Resource Gap, the relationship inverses, that is, the PD organization will produce fewer new features as a result of an increase in the pressure.
Figure 3- The basic structure of the development process and the tradeoffs between the effects of Resource Gap on Error Rate and productivity. The graphs show the shape of these two effects, both increasing as more pressure is added. The Feature Release vs. Resource Gap graph shows the aggregate result of pressure on Feature Release for eN=0.4 and frT=0.9. The values on the axes are not shown since the qualitative shapes of the functions are what matter in the analysis.
To understand the dynamics of this system, consider a PD organization in equilibrium, i.e., with a fixed, constant demand for new features coming in, which the development team is able to meet, resulting in equal Feature Release and demand for new features. A perturbation of the PD organization with a transient increase in workload surfaces the important dynamics. An example of such a perturbation is a set of unplanned, new features that higher management asks the development team to incorporate into the next release, beyond the current workload. Figure 4 portrays the two possible outcomes of such a perturbation. In Figure 4-a (left), the PD organization is working by developing six new features each month, until time ten, when the demand increases to ten features per month and remains there for four months. During this time, the additional demand results in more Features Under Development and therefore higher levels of Resource Gap. This additional pressure in fact increases the overall Feature Release over the equilibrium value, since the increase in the productivity is bigger than the negative effect of the higher Error Rate. Consequently, after the short-term additional demand is removed (feature demand goes back to six), the PD organization continues to release more new features than demanded, and therefore is successful in bringing the stock of Features Under Development back to its equilibrium value.
Figure 4- The reaction of the PD process under a pulse of increased demand for features. The pulse lasts for 4 months and increases the feature demand (red); Features Under Development (blue) represents the features demanded by the market but not yet developed. Feature Release (green) represents the output of the PD process. Figure 4-a, on the left hand side, represents a case where the additional pressure introduced by the pulse is managed through faster work and the Features Under Development is moved back to its equilibrium level. Figure 4-b shows how the PD organization fails to accommodate a slightly stronger pulse, resulting in erosion of PD capability as the Feature Release drops down and stays low.
Now consider the same experiment with a slightly stronger perturbation (e.g., demand going to 13 rather than 10). This time we observe a dramatic shift in the performance of the simulated PD organization.
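A minimal simulation sketch of this pulse experiment is given below. The functional forms and parameter values are again our own illustrative assumptions, calibrated only to reproduce the qualitative pattern of Figure 4 rather than the Appendix A equations: a pulse to ten features per month is absorbed, while a pulse to thirteen tips the simulated organization.

import math

# Illustrative, assumed effect functions (same shapes as in the earlier sketch).
def fp(rg):
    return 0.6 + 0.8 / (1 + math.exp(-4 * (rg - 1.0)))

def fe(rg):
    return 0.5 + 2.0 / (1 + math.exp(-5 * (rg - 1.4)))

def simulate(pulse_demand, base_demand=6.0, months=60.0, dt=0.25,
             rd=9, p_n=1.0, e_n=0.4, fr_t=0.9, time_allotted=4.0):
    """Euler integration of Features Under Development under a 4-month demand pulse."""
    fud = 29.0                                   # Features Under Development, near equilibrium
    t = 0.0
    while t < months:
        demand = pulse_demand if 20.0 <= t < 24.0 else base_demand
        rg = (fud / p_n) / (rd * time_allotted)  # Resource Gap: required / available person-months
        fr = rd * p_n * fp(rg) * (1 - e_n * fe(rg) * fr_t)  # Feature Release, Equation (2)
        fud += (demand - fr) * dt                # the stock accumulates demand minus release
        t += dt
    return round(fud, 1), round(fr, 1)

print(simulate(10))  # modest pulse: the backlog drains back toward its equilibrium value
print(simulate(13))  # slightly larger pulse: backlog grows and Feature Release collapses (firefighting)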
In the new state, which, following Repenning (2001), we call Firefighting, the Feature Release fails to keep up with demand and in fact goes down, the Features Under Development grow higher and higher since demand continues to exceed release, and, in short, the PD organization moves into a mode of being excessively under pressure, being behind the market demand, and developing fewer new features than it used to in the initial equilibrium. What creates such a dramatic shift in behavior, from the addition of just a few features to the workload?
Figure 5- Schematic inverse-U shape curve of Feature Release as a function of Resource Gap. The picture highlights the stable equilibrium (where the feature demand meets the upward sloping part of the curve), the tipping point (where the feature demand meets the downward sloping part of the curve), and the maximum capacity (where the curve reaches its maximum).
A graphical representation of the dynamics helps illuminate this question better. Figure 5 shows an inverse-U shape curve that relates Resource Gap to Feature Release. The equilibrium level of feature demand is represented by a horizontal line. Initially the system is in equilibrium at the point marked as the stable equilibrium: the Resource Gap is at a level at which Feature Release equals feature demand, and therefore the Features Under Development does not change and the system remains in equilibrium. If the system is perturbed by a small pulse (reduction) in demand, the Features Under Development increase (decrease). Hence the Resource Gap increases (decreases) to a value right (left) of the stable equilibrium point, and that results in values of Feature Release higher (lower) than feature demand. As a result, the outflow from Features Under Development exceeds (falls short of) the inflow, the stock goes down (up), and the Resource Gap is reduced (increased) until the system gets back to the stable equilibrium. The arrows on the curve represent the direction towards which the system moves: for small perturbations in work or schedule pressure the system adjusts itself back into the stable equilibrium.
However, there is a threshold of Resource Gap, the Tipping Pressure (see Figure 5), beyond which the system's behavior becomes divergent. An increase in the Resource Gap over the tipping pressure results in Feature Release values lower than feature demand. Therefore the stock of Features Under Development keeps growing, exerting further counterproductive pressure; hence the Rework loop dominates in its vicious direction and the PD capacity goes down with no recovery (for a more comprehensive treatment of similar dynamics in a different setting see Rudolph and Repenning (2002)). In the first experiment, the initial perturbation brought the Resource Gap to the right of the stable equilibrium but just short of passing the tipping point. Therefore the simulated PD organization was able to recover by working faster, producing more, and reducing the pressure. However, the perturbation in the second experiment pushed the Resource Gap over the edge and beyond the tipping point. The PD organization was caught in a cycle of working harder, making more errors, spending more time on rework, and sensing more pressure under the unrelenting market demand. Note that the tipping point is different from the point at which the maximum development capacity is realized (see Figure 5).
The tipping point is where the feature demand and Feature Release meet on the downward sloping part of the release-pressure curve, while the maximum Feature Release is achieved at the pressure which maximizes this curve. As a result, the Tipping Pressure, as well as the stable Equilibrium Pressure, depends on the feature demand. Changing the feature demand shifts the tipping point and stable equilibrium because it shifts the horizontal demand line on the curve, bringing them closer to (with more demand) or further from (with less demand) each other and the maximum capacity. The shift in tipping point and stable equilibrium highlights an important tradeoff between a PD organization's operating capacity and its robustness. We can design more new features by accepting a higher demand rate (e.g., through taking on more tasks in the planning for a new release), but that brings the stable equilibrium and the tipping point closer, and therefore the PD organization can tip into the firefighting mode more easily. For example, an unexpected demand by higher management, a technical glitch, or the departure of a few key personnel can tip the system into firefighting mode. In the extreme, we can get the maximum output from the PD organization by demanding features at the maximum capacity rate. In this case we bring the stable equilibrium to the tipping point, and therefore the system is at the top of the slippery slope into firefighting. While highly productive, such a system can easily fall into the firefighting trap in the face of small unexpected perturbations in the environment of the PD work.
The dynamics discussed in this section look only at the development part of the PD process. Therefore they do not account for the quality consequences of the designed features once they leave the PD organization. This assumption is unrealistic and leads to unrealistic conclusions, e.g., that the PD organization is better off reducing its testing goals, since that way it does not find the defects in the designed features, and consequently does not need to do any rework, avoiding firefighting and increasing capacity (in simple terms, ignorance is bliss). In the next two sections we expand the model to account for two important consequences of the quality of PD work, and the dynamics that ensue.
Current engineering: shifting capacity and creating an adaptation trap
New features introduced into the market have important consequences for the product development organization. First, there is a demand for customer support and services. The service organization, which is often different from the R&D unit (as is the case in Sigma), is in charge of direct interaction with these customers. However, if service people are unable to solve a customer issue, the customer call is transferred to the R&D department. These transfers (called escalations in Sigma) can arise because of the complexity of the software, but also because some defect in the code has created an unexpected issue at the client site. When such defects are found, the development team often needs to decide on their fate. If the problem is interfering with important needs of the customer, R&D personnel are allocated to fix the problem, usually through quick, customized patches at the customer site. These activities are referred to as current engineering (CE) and can take a significant share of R&D resources.
In fact, with product Alpha, 30-50 percent of development resources were allocated to CE work, while successful management of quality in Beta had kept this number below 10 percent.
Current engineering is not the only connection between the PD organization and the product in the field. Initial installation of the product also requires PD resources; therefore a product design that facilitates quick, smooth installation can save the PD organization from installation challenges. However, improving the installability and serviceability of the product gets lower priority when important new features should be added to keep up with the market demand. A more subtle feedback from field performance is through the sales level and its influence on resource allocation across a portfolio of products. Delays and quality problems that characterize a PD organization in trouble often reduce customer satisfaction and sales for a product. Since resource allocation across different product lines in the portfolio often correlates with the success of the product in the field, i.e., its sales pattern, poor quality and poor sales can cost the PD organization its resources for the development of future releases.
This section focuses on modeling and analyzing the effect of current engineering work. Figure 6 shows the causal structure of the model that includes CE. Here the features flow into the field as new releases are launched, and Defect Introduction captures the flow of defects into the field along with these features. These defects create a demand for PD resources to be allocated for current engineering (Resource Demand for Current Engineering). This demand further increases the Resource Gap and can aggravate the Error Rate for the current Features Under Development, since developers must fix customer-specific bugs rather than spend time on quality work for the next releases (Current Engineering loop in Figure 6). The demand for CE resources depends not only on the quality of the product in the field (operationalized as Defects in Latest Release), but also on the sales, the fraction of defects that create noticeable problems, and the productivity of doing CE work. Details of formulations appear in Appendix A. The analysis that follows holds sales and the fraction of noticeable problems constant and focuses on the dynamics in the interaction of the Development and the Product in the Field sectors.
Figure 6- The structure of the model with the addition of products in the field. The new structures are highlighted through black variable names and blue, thick causal links.
Two major dynamic effects follow the introduction of CE into the analysis. First, the additional resources needed to serve the CE reduce the effective capacity for developing new features. The magnitude of this effect depends on several factors that determine the demand for CE resources, some of which are endogenous to the dynamics of development and products in the field. For example, very good quality in the development phase can significantly reduce the CE resource demand since the product has few noteworthy bugs when introduced into the market.
Another factor is sales, which increases the CE demand, since it increases the number of customers who may face problems. The strength of the sales effect also depends on the nature of the product. For customizable products, different defects show up in different customer sites because each customer uses a different set of features and those combinations reveal different hidden problems in the design of the product. However, most defects in off-the-shelf products are common across different customers and therefore, by developing one patch, the PD organization serves several customers, reducing the burden of CE significantly. Other relevant factors include the modularity of the architecture and the complexity of the code, since these determine how resource-consuming the CE work is; the priority of CE work relative to new development; the fraction of defects that surface in typical applications; and the power of the customers in their relationship with the focal organization.
In the case of Alpha and Beta, these factors added up to a potentially strong CE effect: the products are customizable and the CE work gets a high priority because large customers command special attention in their relationship with Sigma (partly because future sales depend both on word-of-mouth and direct feedback from current customers). However, Alpha and Beta faced different levels of pressure with regard to CE: Alpha had to allocate 30 to 50 percent of its resources (at different times) to CE work, while this figure remained below 10 percent in Beta. The difference between the two can be explained by the internal dynamics of the two cases. Alpha had slipped into a firefighting mode, it faced significant quality problems, and the poor quality of past releases had increased the complexity of the code base and architecture, increasing the time needed to figure out and fix a problem. On the other hand, Beta had largely succeeded in avoiding the firefighting trap and therefore, with good quality, had very few escalations and little demand for CE work.
The second effect of bringing CE into the picture highlights an adaptation trap for the PD organization. Imagine the adaptive path that management can take to maximize the feature release of the PD organization. Increasing the pressure (through additional demand or tighter schedules), management can monitor the performance of the system to see whether incremental increases in pressure pay off by higher feature release, and stabilize where the marginal return on additional pressure is zero. Such an adaptation process is analogous to hill-climbing on the Resource Gap-Feature Release curve in Figure 7, and is discussed as one of the main ways through which organizational routines adapt and create better performance (Nelson and Winter 1982).
Figure 7- The perceived (blue) and sustainable (red) curves for Feature Release vs. Resource Gap. In the perceived curve, the demand for CE is kept at its initial value despite the adaptive increase in feature demand. The sustainable curve comes from the same experiment, where the CE resource demand is given at its equilibrium value for each level of Error Rate and feature release.
An important distinction is that the demand for CE does not increase at the time that the pressure (and consequently the output and error rates) is increased.
In fact, it takes at least a few months before the product is out and then some more time before the defects show up and are referred back to the PD organization. During this period, increases in pressure create unrealistic boosts in the feature release rate, since the CE consequences are not yet realized. Therefore the adaptation is not on the sustainable payoff landscape, which takes into account the future CE demands; rather, it is on a temporary one with unrealistic returns to additional pressure (see Figure 7). As a result, the adaptation process points to an optimum feature demand that is in fact beyond the capacity of the PD organization to supply in the long term, and when the CE work starts to flow in, the PD organization finds itself on the slippery slopes of firefighting. Pushed this far, not only does CE demand increase the pressure, but the tighter schedule and higher demands that were justified in light of the adaptive process also keep up the pressure through an increasing backlog of features which should be incorporated in the new releases.
The strength of the adaptation trap depends on the magnitude of the decoupling between the sustainable adaptation path and the perceived path. Decoupling increases the extent to which demand on the PD organization could increase beyond the sustainable level. Therefore higher decoupling increases the magnitude of the consequent collapse of the PD process performance and strengthens the trap. The extent of decoupling depends on two main factors: the length of the delay in observing the CE resource demand, and the speed of adaptation. A longer delay in perceiving the demand for CE resources, e.g., because of a longer release cycle, will result in a longer time horizon in which the adaptive processes work in the absence of the CE feedback. Therefore the risk of overshooting in adaptation into the unsustainable region increases. Moreover, faster adaptation, e.g., through faster learning cycles of pressure adjustment and performance monitoring, increases the speed by which the PD organization expands the pressure and feature demand before observing the CE demand.
Architecture and code base: dynamics of bug-fixing and building on error
In this section we expand the simple model of the PD process to include the effects of the architecture and code base in the dynamics of PD. In this discussion, we focus on two main features of architecture and code base. First, we include fixing the bugs found in the past releases when creating a new release, as well as improving the architecture to accommodate further expansion of the code. Second, we discuss the effect of the problems in architecture and code base on the error rate for working on the new features. Releases which are on the market become old when newer products are launched. Therefore Features in Latest Release move into the Underlying Code category, while the defects that existed in them are removed from the field (the stock of Defects in Latest Release) and become part of the Defects in Underlying Code (see Figure 8). In this conceptualization we are lumping together the underlying code base and the architecture of the software. Though these two are not the same, they have important similarities in their dynamic effects on PD processes. Bug-fixing is the process of fixing defects that have been found in the field (or found during testing, but ignored under work pressure) in the underlying code.
It is different from current engineering since CE happens for specific customers who face a common issue, and the resulting fixes are not integrated with the next release of the code, while bug-fixing incorporates improvements into the main body of the code and will apply to all future customers. The parallel process for architecture is refactoring, which means improving the design of existing code.
The quality of the underlying architecture and code base both depend on the quality of past development work. Low-quality development not only reduces the quality of the code base for future work, but also impacts architecture, since taking shortcuts in development could spoil the designed relationships between the different modules of the code. Moreover, although we do not model the concept design phase explicitly in this model, the quality of architecture design depends on similar pressures that impact the development phase. Therefore one can expect the quality of the code base and the architecture to be positively correlated.
Figure 8- The model structure after the addition of architecture and code base. New loops are highlighted in thick blue causal links.
Bug-fixing and refactoring need resources from the PD organization, and the demand for these resources depends on the quality of the code base and architecture. This demand for resources can further increase the Resource Gap, leading to more quality problems in a reinforcing cycle we call Bug Fixing (Figure 8). The additional resources needed for bug-fixing are often not separate from those working on the development of new features, since bug fixes are part of new releases. In fact, most software products have a release schedule with minor and major releases, in which minor releases mainly focus on fixing bugs. Consequently, despite lower visibility, bug-fixing can consume a considerable share of PD resources. For example, one of the recent "minor" releases of Alpha included fixing over 1000 bugs, and proved to be as time- and resource-intensive as any major release.
Including the bug-fixing activity in the picture of the PD process has consequences similar to the addition of current engineering. First, the bug-fixing resource demand takes away resources for the development of new features, shifting the effective, long-term capacity of the PD process further down. The magnitude of this shift depends on several factors, including the quality of the code developed (Did we create a lot of bugs?), the strength of testing (Did we fail to catch those bugs initially?), the priority given to fixing bugs (Do we care about fixing bugs?), as well as the relative productivity of bug-fixing (Does it take more resources to fix a feature than to develop it correctly in the first place?). In Appendix B we derive the mathematical formula for the sustainable capacity of the PD process under different parameter settings. Second, the adaptation trap is made stronger through the bug-fixing feedback. The bug-fixing activities entail an even longer delay between the original development and the observation of demand for bug-fixing resources. This delay includes the release of new features, the reporting of the bugs, and the scheduling of these bugs in the work for future releases. The longer delay suggests that there is even more time during which adaptive processes fail to observe an overshoot of demand beyond the sustainable PD capacity.
Moreover, the downward shift in the sustainable capacity of the PD process suggests that the gap between the short-term feature release and what can be sustained in the long term is made wider. Consequently, the chances of adaptation into an unsustainable region increase, and when the feedback of CE and bug-fixing arrives, the PD organization finds itself further down the pressure-Feature Release curve.
If allocating resources to bug-fixing reduces the PD capacity and increases the chances of slipping into firefighting mode, why not ignore this activity altogether? The problem is two-fold. First, doing so will generate further customer dissatisfaction and demand for CE when old problems surface in the new features. Moreover, ignoring bug fixes results in the accumulation of defects in the underlying code and architecture, which can significantly increase the chances of making errors in the current development work and trigger another vicious cycle (the Error Creates Error loop, Figure 8).
The Error Creates Error loop, when active, is quite detrimental to the product line. On the one hand, the effect of the quality of the underlying code and architecture on the error rate comes on top of the effect of pressure and can drive quality further down. Such lower quality in fact strengthens the rework, current engineering, and bug-fixing loops in the vicious direction. On the other hand, given the longer delays before the architecture and code base effects surface, firefighting typically starts with the rework and current engineering loops driving the dynamics, followed by the domination of Error Creates Error, which seals the PD organization in firefighting. Consequently, the activation of the Error Creates Error loop suggests that things have been going wrong for quite a while; therefore reversing the trend requires a significant investment in fixing bugs and refactoring the architecture, and in many cases may prove infeasible. In short, it is hard to save a product line that is so far down the firefighting path that it faces significant quality problems because of the poor quality of its old architecture and code base.
An important factor in determining the propensity of a product to face the Error Creates Error loop is the modularity of its architecture. The modularity of the architecture can be represented by the average number of other modules each module interacts with. In a highly modular product, each part interacts with only a few neighboring parts. In a very integrated product, however, every piece can be interacting with every other piece. In general, new pieces can safely be added to the old code, as long as the pieces they are interacting with are not defective. However, if the interacting pieces are defective, the Error Creates Error feedback becomes important, since all modules connected to the new one can impact its quality. Therefore, in a highly modular architecture, we only need a few interacting pieces to be fixed to avoid the feedback. In aggregate this entails keeping the density of errors (the percentage of modules that are defective) low. However, in the case of an integral architecture, things can go wrong if any of the interacting pieces is defective; therefore we need to control the absolute number of defects in the code. Controlling the density of errors in the code base is much easier than controlling the absolute number.
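The contrast between density and absolute number can be illustrated with a simple calculation (a hypothetical sketch, not part of the thesis model), under the assumption that a new module is at risk whenever at least one of the modules it interacts with is defective:

# Hypothetical illustration of why a modular architecture only requires controlling
# defect density, while an integral architecture requires controlling the absolute number.
# Assumption: a new module is at risk if any of the modules it interacts with is defective.

def prob_new_module_at_risk(defect_density, interacting_modules):
    """Probability that at least one interacting module is defective."""
    return 1 - (1 - defect_density) ** interacting_modules

density = 0.02                         # 2% of existing modules are defective
for code_base_size in [100, 1000, 10000]:
    modular = prob_new_module_at_risk(density, 5)                # interacts with ~5 neighbors
    integral = prob_new_module_at_risk(density, code_base_size)  # interacts with (almost) everything
    print(code_base_size, round(modular, 2), round(integral, 2))

# With constant density, the modular risk stays near 0.10 regardless of code-base size,
# while the integral risk approaches 1 as the absolute number of defects grows.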
In fact, as long as the density of errors in the old releases is constant, the density in the underlying code base does not increase, while the absolute number increases proportionally with the new features released. Therefore, the more we move from a modular to an integral architecture, the more important the feedback of past errors on the error rate becomes, and the more important it becomes to spend resources on bug-fixing. As a senior researcher in Sigma puts it: "A good architecture helps control entropy."
The dynamics of bug-fixing and error-creates-error also highlight the importance of an initially robust architecture to reap the benefits of multiple-release PD. Just as these dynamics can be detrimental when they are activated in the negative direction, a robust, modular architecture would allow for reliable and high-quality development of several releases of a product, with far fewer resources than would be needed to develop similar products independently.
Discussion
In the model and analysis part of this essay, we built a simple model of multiple-release product development that focuses on the interactions among the development of the product and the product in the field, as well as the architecture and underlying code the new releases are built on. When looking only at the development process, the tradeoff between quality and productivity in the face of pressure can result in tipping dynamics. The level of feature demand determines where the stable equilibrium and the tipping point for the system are, and additional demand brings these two points closer together, as well as closer to the maximum capacity of the PD organization. Consequently, there is a tradeoff between the robustness and the capacity of the PD process; increasing the demand on the system can be beneficial in terms of additional output but increases the chances of tipping into firefighting mode as a result of unexpected changes in the PD work environment. Moreover, beyond the tipping point, an increase in pressure results in less output, driving the PD organization into increasing firefighting.
When we expand this framework to include the products in the market, we observe the current engineering dynamics. These dynamics further highlight the importance of quality, since resources need to be allocated to CE when product quality is low. This consideration reduces the level of output a PD organization can sustain across multiple releases. Moreover, the delay between changes in quality and the observation of their effect on the CE workload creates a trap for the PD organization in which adaptation to increase the output of the process can actually take the system into the firefighting region before the problem is known through feedbacks on the ground. Finally, inclusion of the underlying architecture and code base strengthens the continuity of multiple-release product development. First, bug-fixing requires resources and further reduces the effective capacity of the PD process for developing new features. Moreover, the delays are longer for bug-fixing, which makes the adaptation trap deeper and more significant. Finally, the effect of the code base and architecture quality on the quality of current development strengthens the rework, current engineering, and bug-fixing dynamics and can seal the system in firefighting mode by increasing the costs of change.
Multiple-release product development is continuous in nature since different releases of the product are based on the same platform, share resources for development activity, trade features from one release to another, and share resources for current engineering and bug-fixing. Consequently, the firefighting dynamics discussed cannot be contained in a single development project; rather, they pass on from one release to another in a contagious fashion. Once the PD organization tips into firefighting, it is increasingly difficult for future releases to avoid these dynamics. This is why these dynamics can erode the PD capability: they reduce the performance of the PD organization in multiple consecutive development projects. In fact, once in the firefighting mode of behavior, new norms can form around low-quality practices, making future improvements harder. In contrast, if these dynamics are successfully avoided, the PD organization will be able to stay in an equilibrium with a low error rate, improve its processes, and build a culture of high-quality practices. A Beta manager reflects on his experience:
"[T]he error rate drives a lot of what the overall dynamics would be. [In my group] if we detect an error, not only do we go and fix the error, but I ask where in the process did we break down... we investigate 2 or 3 field escalations a month here, and not many of those result in a change... if this error rate goes high, then you are constantly battling the new investigation and you can't take the time to step back and say why exactly it was introduced. When my team goes back and looks at it and says, aha, we probably should have seen this in our code inspection, we really take that to heart and when we focus on the next round of development, they would say, 'remember that thing, we are going to make sure that we are watching for that kind of stuff.' So it increases awareness. Again, we can do that because the error rate is low."
In short, the Rework, Current-Engineering, Bug-fixing, and Error Creates Error dynamics can erode PD capability through sustained firefighting. The PD organization enters firefighting not only as a result of unexpected demands when the organization is critically loaded, but also following intendedly functional adaptation to get the maximum utility out of the PD resources. Moreover, the dynamics pass on from one release of the product to another, thereby eroding PD capability. These dynamics degrade the architecture and code base of the product line to the extent that the PD organization has difficulty making high-quality products any more, sealing the product line in the firefighting mode.
We developed the model based on the case study of two software products. However, the basic dynamics rest on a few boundary assumptions that can hold in domains beyond the software industry and therefore can inform understanding of multiple-release product development processes. The first critical assumption is that there is a level of resource gap beyond which the average output of the organization per resource unit decreases, rather than increases. When this assumption is true, we have an inverse-U-shaped Resource Gap-Feature Release curve and the rework dynamics exist.
In fact, if this assumption is not true, the best option for a PD organization may be to overload the group completely with an unmanageable feature demand so that ever-increasing pressure maximizes the output; the practices of successful PD organizations do not suggest the feasibility, not to mention the optimality, of such a policy. Another assumption in the model is that quality problems are not lost, i.e., there is a cost for a low-quality product that the company will shoulder through some mechanism, including fixing errors, warranty and lawsuit costs, lost revenues, lower customer loyalty, and loss of brand name, among other costs. This might be close to a truism across the board, but the cost of quality problems differs from one case to another. The higher the cost of finding errors in later stages (e.g., when the product is in the field vs. when the product is under development), the stronger are the firefighting and contagion dynamics. Moreover, there needs to be a link between the cost of quality and the resources available to develop future releases. This link can be direct, as is the case for sharing current-engineering and development resources in software, or indirect, as implied by resource allocation policies that give more resources to more successful products in a company's portfolio. In the latter case, future quality costs influence the amount of resources available to future releases as long as they are accounted for in evaluating the product's success. The existence of quality costs after launch and the effect of these costs on current development resources are enough to generate the Current-engineering and Bug-fixing dynamics (or their equivalents in other settings) and therefore the contagion of issues from one release to another. The other dynamic (Error Creates Error) rests on a third assumption: that the quality of the current release impacts the quality of work for the next release. This is usually true for the architectural aspects of the product, which have a strong impact on the quality of work as well as on future concept design and architecture. In software, this assumption also holds for code quality, since the documentation of code and its seamless integration with other pieces influence the quality of future code to be built on top of it. Appendix C discusses the boundary conditions of the model in more detail and elaborates on the conditions under which these dynamics hold and when they are not important.

Erosion of organizational capabilities is theoretically important. Capability erosion can take away the competitive edge of organizations. This effect is especially salient in the case of the PD capability, which is the cornerstone of building other capabilities in dynamic markets (Teece, Pisano et al. 1997; Eisenhardt and Martin 2000). In other words, it is not only important to build a capability, but it is also important to be able to keep it. Moreover, the processes that erode organizational capabilities overlap with those that inhibit imitation and replication of these capabilities. Tipping dynamics and adaptation traps not only can erode existing capabilities, but can also stand in the way of building a successful PD capability. These dynamics create multiple ways in the evolutionary path of capabilities for things to go wrong, and therefore reduce the chance that the PD organization can navigate its way into a high-performance arrangement.
For example, the tipping dynamics suggest that fluctuations in the level of workload can potentially tip the PD organization into firefighting. As a new firm attempts to build its PD capability, there are many instances where an unexpected surge of workload can tip the PD organization into a degrading mode of working fast and doing low-quality work. Furthermore, attempting to get the most out of the PD process early on, when the resource demand from the field is not yet realized, can result in overloading the organization through the adaptation trap dynamics. Therefore the tipping dynamics, the adaptation trap, and the contagion of firefighting dynamics highlighted in this study complement the literature exploring the mechanisms that make capabilities hard to build and sustain (including core rigidities, competency traps, and rugged payoff landscapes). The sources of capability erosion discussed in this essay add to the literature on barriers to imitation of capabilities by highlighting the role of time in the operation of adaptive processes. Curiously, in the case of the adaptation trap, not only does the organization fail to realize better opportunities (as is the case with landscape complexity, core rigidity, and the competency trap), but it may also actually degrade its performance through adaptation.

By elaborating on some of the challenges in the way of successful PD processes, this study identifies factors that determine when PD capability is hard to imitate and under what conditions it is easy to build. If the dynamics discussed here play a visible role in the larger picture of product development across different industries, one would expect the strength of these dynamics to partially determine the difficulty of building and maintaining a successful PD organization. Following the resource-based line of argument, one expects PD to explain a larger fraction of firm heterogeneity in industries in which PD is hard to build and sustain. Therefore, our study offers a few hypotheses on the strategic importance of PD across different industries:

H1: All else being equal, PD is strategically more important in industries where multiple-release PD is dominant.

This first hypothesis suggests that the move towards multiple-release product development increases the interconnection between different products and therefore increases the significance of the firefighting dynamics. As a result, PD capability is harder to build and has a higher competitive value in such industries.

H2: Flexibility of PD processes increases their performance heterogeneity across firms.

If the quality of PD is highly dependent on flexible processes that individuals should follow, the effect of pressure on the error rate will be stronger. The logic behind this assertion is that flexible processes give individuals more latitude to mitigate the pressure by cutting corners, taking shortcuts, and using different practices that lead to lower quality in the long run. As a result, the Resource Gap-Feature Release curve will have a sharper inverse-U shape and firefighting becomes more salient.

H3: PD capability is more heterogeneous in industries where the quality of the products launched influences the resources available to current development.

The strength of the adaptation trap depends on how much the resources available to a project are influenced by the quality of the products that are launched.
The stronger the feedback of past product quality on PD resources (e.g., through CE and bug-fixing), the stronger are the capability erosion dynamics and the higher is the competitive value of PD in an industry.

H4: The delay between development and observing the quality feedback on PD resources increases the heterogeneity of PD organizations across an industry.

The decoupling between the perceived and sustainable Resource Gap-Feature Release curves, and therefore the risk of falling into the adaptation trap, depends on the delay in the effect of old development quality on current PD resources. The longer the delay between development and the feedback of quality on resources, the stronger is the adaptation trap and the higher is the competitive value of PD in the industry.

H5: Quality consequences of architecture and product base increase the heterogeneity of PD organizations' performance in an industry.

In industries where architecture and product base have a strong influence on the quality of current PD work, the Error Creates Error and Bug-fixing dynamics are more salient and make it harder to sustain a successful PD process. Therefore we expect that the stronger the effect of current development on future architecture and product base, the higher the value of PD capability.

Implications for Practice

This study highlights important practical challenges for companies that rely on their product development capability. First, an important concern for start-up companies is to get their product to the market as fast as possible, with the fewest resources they can afford. This is both because of the limited availability of resources and because of the importance of early presence in a competitive market. Consequently, the quality of their PD work receives lower priority than time to market, and maximizing the feature release rate becomes the highest priority. However, this is a recipe for the adaptation trap: at the early stages, when current engineering, bug-fixing, and other multiple-release dynamics are not yet active, the start-up approach yields a very high output that is not sustainable for future releases. During this period, however, the norms of the PD organization are set and routines take shape based on potentially unsustainable practices aimed at getting the product out as fast as possible. These routines make it much harder during later releases to shift the organizing patterns of the PD process to sustainable arrangements that emphasize better process and higher quality.

The challenge of managing a transition from the norms of quick-and-dirty design to sustainable quality work is especially salient when companies expand through acquisitions. In these settings, a large firm enters a new product market by acquiring a start-up in that market. If the product is multiple-release in nature, e.g., software, the parent company will need to support and encourage a transition from the start-up mindset to that of a successful multiple-release PD process. In the absence of such a transition, the new product line will slip into firefighting as soon as the current engineering and bug-fixing feedbacks activate. In fact, the challenges that one product faces can then spill into other lines of development because resources are shared among different PD groups. Another important challenge in light of the dynamics discussed in this study is the strategy for managing customer demands and expectations.
One of the main customer demands is good product support after the sale, which includes current engineering. CE is the activity with the least value added, compared to development and bug-fixing, because it only solves the problems of one (or a few) customers and does not significantly influence the quality of future releases. However, demand for CE is very salient in the face of customer interaction. Doing CE work solves an urgent problem, so it receives high visibility, and it can easily be rewarded because it draws attention from different parts of the company, including the service organization and higher management, which want to see customer complaints addressed quickly. Therefore, in practice, the PD organization often gives CE higher priority than is warranted by long-term considerations, thereby compromising the quality of future releases to fix the current problems in the field. This prioritizing challenge is ironic because a policy of ignoring customers' requests for firefighting may make them happier in the long run.

A different type of tradeoff with regard to customer demand is in the inclusion of new features. It is generally assumed that including more features in future releases of the product benefits customers. Contrary to this belief, however, beyond the tipping point, trying to incorporate more features will in fact reduce the output of the PD process. Under these conditions, canceling features may in fact be a favor to customers, since it helps the PD process to recover, to develop more features in total, and to get the next release out sooner and with better quality. Therefore the relationship between customer satisfaction and efforts to add new features is contingent on the state of the product development organization: after some point, pushing harder harms the employees as well as the customers.

Finally, this study highlights the importance of the evolution of the architecture and underlying product base. If the resource gap in early releases compromises the quality of architecture and product base, the platform cannot last for long and will lose viability soon after the Error Creates Error feedback is activated. From a behavioral perspective, this is a noteworthy risk: the feedback from the quality of architecture and product base comes with long delays, and therefore the immediate payoff to investing in architecture design, bug-fixing, and refactoring is low. Under resource gap pressure, the firm is more likely to skip these activities than CE, even though in the long run this can prove counterproductive.

By elaborating on the sources of erosion of PD capability, we can gain insight into some practical ways to avoid the detrimental dynamics and to identify them before it is too late. The following points detail these practical implications:

Setting the priorities in resource allocation between different activities: The allocation of resources between different stages becomes critical when these different stages can be used to achieve similar objectives. In a software setting, one can improve quality by having better architecture design, better development, better testing, improved current engineering, and additional bug fixing. However, the costs per unit of error avoided/fixed at each stage are very different (Boehm 1981) and increase as we go towards the later phases. The tipping-point and adaptation-trap dynamics are reinforced by the mismatch between the priorities given to these phases and their benefits.
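A simple back-of-the-envelope calculation illustrates why this mismatch matters. The cost ratios in the sketch below are purely hypothetical placeholders (the essay and Boehm 1981 only establish that the cost per error rises sharply in later phases); the point is the arithmetic consequence for priority setting: the same quality budget removes far fewer defects when it is spent downstream.

# Illustrative arithmetic only: the relative fix costs below are assumed, not measured
# at the case sites; they stand in for the escalation pattern documented by Boehm (1981).
relative_fix_cost = {
    "architecture/design review": 1,
    "development and inspection": 4,
    "system test": 10,
    "field (current engineering)": 50,
}

budget = 100.0  # hypothetical person-hours available for quality work
for phase, cost in relative_fix_cost.items():
    # Number of defects this budget can remove if all of it is spent in this phase.
    print(f"{phase:30s}: ~{budget / cost:5.1f} defects removable with the same budget")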
If priorities were set according to benefits, then under pressure the most productive phases (in terms of quality of work and rework in the next phases) would face the least resource shortage and therefore create the least harm, minimizing the amount of rework and the cost of fixing issues in later phases, and smoothing the inverse U-shaped curve. However, this often is not the case. For example, initial design often gets lower priority than it deserves according to its benefit in avoiding problems, and current engineering gets higher attention because of customer pressure. Therefore, setting priorities according to the respective effectiveness of the different phases in avoiding or solving problems offers high leverage. However, priorities often emerge from a set of norms, operating procedures, and incentive structures used in an organization. Changing them therefore requires reshaping some of the norms and procedures in the organization. The appropriate practices are context-specific, yet a few examples of useful or harmful norms can help clarify the idea further:

* Taking responsibility for the bugs one creates can be very effective in reducing corner-cutting and shortcuts, thereby increasing the priority given to developing the software with high quality in the first place. A good example is Microsoft's practice of having daily builds, in which the person whose code breaks the build is responsible for setting up the build the next day, as well as fixing his bugs immediately (Cusumano and Selby 1995). The practice is further reinforced by some nominal fines for the bugs, and even stronger social sanctions, such as putting on a special hat if one's code had bugs the day before. This example shows how a series of elements (in this case "daily builds," "norm of fixing the bugs immediately," "norm of being responsible for build setup if you broke the build the day before," "norm of social sanction in favor of quality") should come together to create a coherent practice that changes the priority given to one activity (in this case initial development).

* Root-cause analysis of problems referred from the field can tie the problem to the development practices of individuals and thereby increase the priority of developing good code in the first place. In Beta, this was applied to all important bugs that were not found until the product was in the field, and the results were used to determine whether the problem was a random mistake, a failure in the process used by the group, or a failure to use the correct process. The latter had the most negative effect on individual reviews, which sent a clear message to members of the team: don't skip the process. Note that this method is very hard to use for a product that is already in trouble, because root-cause analysis is time-consuming and cannot be applied in a project team that is under a lot of pressure and is receiving hundreds of bugs from the field every month.

* Considering customer feedback in evaluating developers increases the priority of customers and the current engineering activity, and therefore shifts priority to CE rather than to more efficient parts of the project. The current personnel evaluation practice in Alpha reinforces this problem: managers from different groups sit together and rank the project personnel. In this setting, only one manager is really aware of the quality of the work done by each developer.
The result is a process of alliance building and competition for attention, in which developers who have helped the services organization, or who have positive comments from customers, have a much better chance of success. In practice, the organization ends up with a very strong incentive for current engineering and firefighting to save the customers, and with little incentive to develop a high-quality product in the first place: a recipe for falling into firefighting.

Regulating the pressure: The pressure that results in cutting corners and shifting emphasis away from quality is central to the tipping and adaptation trap dynamics. Therefore, regulating that pressure is one of the ways to avoid the detrimental dynamics. Pressure is often regulated through subtle cues in the interactions within the organization. Examples include management reaction to requests for vacation, work hours (especially those of the managers), conversations around the need for more work and coming deadlines, whether people are eating lunch in their offices, and so on. What picture of pressure is painted in these interactions, and how this picture is perceived with respect to the importance of quality, are critical to the dynamics we discussed. Management's actions have a strong role in creating this picture, and managers' mental model of how pressure affects the output of the group is critical in shaping those actions. Therefore, discussing these dynamics explicitly can help increase awareness and modify the management stance on pressure. Moreover:

* Knowledge of the pressure is often available on the ground, but the information gets filtered out in communication with higher management. People fear being perceived as shirking responsibility, lazy, or weak if they complain about pressure. Removing sources of communication breakdown is potentially helpful for streamlining the communication of excessive pressure. Potential tools include anonymous feedback, group sessions with an external facilitator, and role-modeling of pushing back on extra pressure. Note that these tools can only help create trust if the management team does not believe that extra pressure is always helpful.

* Buffer time: Putting an explicit buffer period at the end of the project (or at the end of milestones in the middle of the project) can help remove the pressure and encourage people to spend time on quality. The period should not be assigned to any task; rather, it should be there to make sure that people don't feel overwhelmed by upcoming deadlines, that all problems are satisfactorily fixed, and that unexpected events don't catch the team off guard.

Portfolio management: The relationship between different releases, as well as different products, through resource sharing often makes the tipping and adaptation trap dynamics contagious. Therefore, if problems are at the portfolio level, the leverage points are also at this level rather than at the project level: if pressure persists at the portfolio level, most project-level efforts to alleviate pressure will be futile. A critical, and common, error at the portfolio level is to overwhelm the PD organization with too many projects. A good sense of the real capacity of the PD organization is hard to build, and it is easier to start a project than to kill it; accordingly, there is often a bias towards having too many projects.
A few ideas can help avoid or at least alleviate this problem:

* Measurement of the organization's capacity: Historical data can be a good starting point for measuring the overall capacity of the development organization. Such measurement should take into account not only the initial design and development, but also all the post-design activities of current engineering, bug fixing, refactoring, etc. In software, measures of work such as lines of code, number of changes to the code base (delta count), and number of new modules exist. These measures each have their own drawbacks, yet they are better than the alternative of having no measure of the work. Capacity can then be calculated based on the number of people used historically to finish some specific amount of work, as well as its corresponding post-development activities. A growth bias should be carefully accounted for in such measurement: if the organization is growing, the post-development activities are not at their equilibrium level; rather, they lag behind the development. Therefore a simple scaling of the past to the future will downplay the significance of these activities and will surprise the organization when CE work continues to grow while new development has come to equilibrium.

* Strong gates to start new initiatives: It is much easier and less costly not to initiate projects than to pull the plug on them later on. Having established gates to initiate new projects is therefore crucial for successful management of the portfolio of projects. The current and expected capacity/resource balance should weigh strongly in the decision to allow a new project to start.

* Having a resource buffer: A resource buffer helps avoid overwhelming the organization, can allow for timely introduction of critical projects, and can help identify the movement of the organization towards the tipping point. A resource buffer can best be implemented by adding a safety factor (on the order of 10-20%) to the bottom-up estimates of finish dates. Besides the pressure-regulation effects of such a buffer (discussed above), it allows the organization to avoid going over the tipping point as a result of an ordinary crisis in resource availability, customer demands, or technological change. Moreover, the buffer allows for acceptance of strategically important, time-sensitive projects before enough resources are released for them through closing other active initiatives or acquiring more resources. Finally, when the project-level buffers show signs of filling up, it is a timely signal that project loads are mounting beyond capacity.

Monitoring the practices on the ground: The tipping dynamics are created because the practices that help build quality are abandoned in favor of working faster. Therefore, monitoring the practices used on the ground is required for detecting any potentially detrimental practice shift and avoiding it before it is too late. Monitoring design and development practices is more useful than monitoring CE load or other downstream activities, because it is too late to find out about deterioration of quality when the product is out in the market, and the organization may already have tipped into firefighting.
The following issues need to be considered in the monitoring process:

* A subtle dimension of the dynamics we discussed is that quality practices may change only a little due to pressure, but the standards of acceptable practice also change based on the practices used, and therefore the eroding standards allow for poorer performance in the future. Such standard adjustment amplifies the effect of pressure on quality and makes it harder to detect. Empirical evidence suggests that this standard adjustment is asymmetrical, i.e., quality standards erode much faster than they build up; therefore practices can degrade following small, random changes in pressure (Oliva and Sterman 2001). Consequently, there is a need for external standards for the practices used in the development organization. The list of important practices is context-specific, yet such a list should be created, and explicit standards need to be set for each practice based on benchmarking and best-practice studies.

* The measurement of these practices is not straightforward. Often people feel unsafe and don't reveal deviations from formally advised practices. Therefore, providing a safe environment in which to discuss these issues is very important. A helpful consideration in creating such an environment is to separate the measurement and feedback/incentive processes. To avoid measurement biases, the workers need to be sure that their responses do not have any impact on their evaluation and incentives. For example, it is possible to assign the review of the practices to an outside team, e.g., the research arm of the organization. Of course, the usefulness of such monitoring depends on how seriously it is used by the management group in the organization.

This study has several limitations which open up room for further research. First, the model is built based on two case studies in the software industry, and therefore not all potential processes that matter to the dynamics of product development could be observed. For example, the CE effect is very strong in software but not so strong in automobile development, while manufacturability is central in the latter case but not important in software. Consequently, the generalizability of the conclusions cannot be decided until further empirical work with a larger number of PD organizations is conducted. The hypotheses developed in the discussion section offer one avenue for such studies. Another question that merits further attention is the persistence of the firefighting dynamics. These dynamics persist despite the time for learning, multiple chances to experience different policies (multiple releases), and high stakes for the PD organization. Some of the discussions in this essay allude to this persistence question -- for example, the contagion of dynamics from one release to another -- yet a deep understanding of this question is crucial if a PD organization wants to enhance its learning from experience. Finally, beyond generalizations about resource gap and quality, this study did not go into the details of remedies for the firefighting dynamics and what needs to be done to avoid them. For example, one may expect loosely structured development processes to increase the effect of pressure on quality. However, it is possible to combine flexibility and high quality by frequent synchronization of different development activities and quick feedback on the quality of recent work (Cusumano 1997).
Recognizing and designing such processes is of great interest to practitioners, who need realistic ways of avoiding the adaptation trap and getting out of firefighting, rather than a description of how one may fall into these dynamics.

References:

Abdel-Hamid, T. and S. Madnick. 1991. Software project dynamics: An integrated approach. Englewood Cliffs, NJ, Prentice-Hall.
Amit, R. and P. J. H. Schoemaker. 1993. Strategic assets and organizational rent. Strategic Management Journal 14(1): 33-46.
Barney, J. 1991. Firm resources and sustained competitive advantage. Journal of Management 17(1): 99-120.
Black, L. J., P. R. Carlile and N. P. Repenning. 2004. A dynamic theory of expertise and occupational boundaries in new technology implementation: Building on Barley's study of CT scanning. Administrative Science Quarterly 49(4): 572-607.
Boehm, B. W. 1981. Software engineering economics. Englewood Cliffs, NJ, Prentice-Hall.
Brown, S. L. and K. M. Eisenhardt. 1995. Product development: Past research, present findings, and future directions. Academy of Management Review 20(2): 343-378.
Burgelman, R. A. 1991. Intraorganizational ecology of strategy making and organizational adaptation: Theory and field research. Organization Science 2(3): 239-262.
Cooper, K. G. 1980. Naval ship production: A claim settled and a framework built. Interfaces 10(6).
Cusumano, M. A. 1997. How Microsoft makes large teams work like small teams. Sloan Management Review 39(1): 9-20.
Cusumano, M. A. and K. Nobeoka. 1998. Thinking beyond lean: How multi-project management is transforming product development at Toyota and other companies. New York, Free Press.
Cusumano, M. A. and R. W. Selby. 1995. Microsoft secrets: How the world's most powerful software company creates technology, shapes markets, and manages people. New York, Free Press.
Cyert, R. M. and J. G. March. 1963. A behavioral theory of the firm. Englewood Cliffs, NJ, Prentice-Hall.
Dierickx, I. and K. Cool. 1989. Asset stock accumulation and sustainability of competitive advantage. Management Science 35(12): 1504-1511.
Dosi, G., R. R. Nelson and S. G. Winter. 2000. The nature and dynamics of organizational capabilities. Oxford; New York, Oxford University Press.
Eisenhardt, K. M. 1989. Building theories from case study research. Academy of Management Review 14(4): 532-550.
Eisenhardt, K. M. and S. L. Brown. 1998. Time pacing: Competing in markets that won't stand still. Harvard Business Review 76(2): 59+.
Eisenhardt, K. M. and J. A. Martin. 2000. Dynamic capabilities: What are they? Strategic Management Journal 21(10-11): 1105-1121.
Ford, D. N. and J. D. Sterman. 1998. Dynamic modeling of product development processes. System Dynamics Review 14(1): 31-68.
Forrester, J. W. 1961. Industrial dynamics. Cambridge, The M.I.T. Press.
Gawer, A. and M. A. Cusumano. 2002. Platform leadership: How Intel, Microsoft, and Cisco drive industry innovation. Boston, Harvard Business School Press.
Henderson, R. and I. Cockburn. 1994. Measuring competence: Exploring firm effects in pharmaceutical research. Strategic Management Journal 15: 63-84.
Henderson, R. M. and K. B. Clark. 1990. Architectural innovation: The reconfiguration of existing product technologies and the failure of established firms. Administrative Science Quarterly 35(1): 9-30.
Krishnan, V. and S. Gupta. 2001. Appropriateness and impact of platform-based product development. Management Science 47(1): 52-68.
Krishnan, V. and K. T. Ulrich. 2001. Product development decisions: A review of the literature. Management Science 47(1): 1-21.
Leonard-Barton, D. 1992. Core capabilities and core rigidities: A paradox in managing new product development. Strategic Management Journal 13: 111-125.
Levinthal, D. A. 1997. Adaptation on rugged landscapes. Management Science 43(7): 934-950.
Levitt, B. and J. G. March. 1988. Organizational learning. Annual Review of Sociology 14: 319-340.
Lovas, B. and S. Ghoshal. 2000. Strategy as guided evolution. Strategic Management Journal 21(9): 875-896.
MacCormack, A., C. F. Kemerer, M. Cusumano and B. Crandall. 2003. Trade-offs between productivity and quality in selecting software development practices. IEEE Software 20(5): 78+.
Meyer, M. H., P. Tertzakian and J. M. Utterback. 1997. Metrics for managing research and development in the context of the product family. Management Science 43(1): 88-111.
Milgrom, P. and J. Roberts. 1990. The economics of modern manufacturing: Technology, strategy, and organization. American Economic Review 80(3): 511-528.
Molokken, K. and M. Jorgensen. 2003. A review of surveys on software effort estimation. IEEE International Symposium on Empirical Software Engineering, Rome, Italy.
Morecroft, J. D. 1983. System dynamics: Portraying bounded rationality. OMEGA 11(2): 131-142.
Morecroft, J. D. W. 1985. Rationality in the analysis of behavioral simulation models. Management Science 31(7): 900-916.
Muffatto, M. 1999. Introducing a platform strategy in product development. International Journal of Production Economics 60-61: 145-153.
Nelson, R. R. and S. G. Winter. 1982. An evolutionary theory of economic change. Cambridge, Mass., Belknap Press of Harvard University Press.
Oliva, R. and J. D. Sterman. 2001. Cutting corners and working overtime: Quality erosion in the service industry. Management Science 47(7): 894-914.
Peteraf, M. A. 1993. The cornerstones of competitive advantage: A resource-based view. Strategic Management Journal 14(3): 179-191.
Putnam, D. 2001. Productivity statistics buck 15-year trend. QSM Articles and Papers.
Repenning, N. P. 2000. A dynamic model of resource allocation in multi-project research and development systems. System Dynamics Review 16(3): 173-212.
Repenning, N. P. 2001. Understanding fire fighting in new product development. The Journal of Product Innovation Management 18: 285-300.
Rivkin, J. W. 2001. Reproducing knowledge: Replication without imitation at moderate complexity. Organization Science 12(3): 274-293.
Rudolph, J. W. and N. P. Repenning. 2002. Disaster dynamics: Understanding the role of quantity in organizational collapse. Administrative Science Quarterly 47: 1-30.
Sanderson, S. and M. Uzumeri. 1995. Managing product families: The case of the Sony Walkman. Research Policy 24(5): 761-782.
Sanderson, S. W. and M. Uzumeri. 1997. The innovation imperative: Strategies for managing product models and families. Chicago, Irwin Professional Pub.
Sauer, C. and C. Cuthbertson. 2004. The state of IT project management in the UK 2002-2003. Oxford, University of Oxford: 82.
Simon, H. A. 1979. Rational decision making in business organizations. American Economic Review 69(4): 493-513.
Sterman, J. 2000. Business dynamics: Systems thinking and modeling for a complex world. Irwin, McGraw-Hill.
Teece, D. J., G. Pisano and A. Shuen. 1997. Dynamic capabilities and strategic management. Strategic Management Journal 18(7): 509-533.
Wernerfelt, B. 1984. A resource-based view of the firm. Strategic Management Journal 5(2): 171-180.
Williamson, O. E. 1999. Strategy research: Governance and competence perspectives. Strategic Management Journal 20(12): 1087-1108.
Winter, S. G. 2003. Understanding dynamic capabilities. Strategic Management Journal 24(10): 991-995.
Zajac, E. J. and M. H. Bazerman. 1991. Blind spots in industry and competitor analysis: Implications of interfirm (mis)perceptions for strategic decisions. Academy of Management Review 16(1): 37-56.

Appendix A- Detailed model formulations

The following table lists the equations for the full model. To avoid complexity, all switches and parameters used to break feedback loops are removed from the equations.

Allocated Bug Fix Resources = Allocated Resources[BugFix]; Units: Person
Allocated Resources[Function] = Min(Desired Resources[Function]/SUM(Desired Resources[Function!])*Total Resources, Desired Resources[Function]); Units: Person
Desired Resources[Development] = Des Dev Resources
Desired Resources[CurrentEng] = Desired Resources for Current Engineering
Desired Resources[BugFix] = Desired Bug Fix Resources
Development Resources Allocated = Allocated Resources[Development]
CE Resources Allocated = Allocated Resources[CurrentEng]
Average size of a new release = 40; Units: Feature
Bug Fixing = Allocated Bug Fix Resources*Productivity of Bug Fixes; Units: Feature/Month
Des Dev Resources = Features Under Development / Desired Time to Develop / Productivity / (1-Normal Error Rate*Frac Error Caught in Test); Units: Person
Desired Bug Fix Resources = Errors in Base Code / Time to Fix Bugs / Productivity of Bug Fixes; Units: Person
Desired Resources for Current Engineering = Sales*(Errors in New Code Developed+Effective Errors in Code Base)*Fraction Errors Showing Up / Productivity of CE; Units: Person
Desired Time to Develop = 10; Units: Month
Dev Work Pressure = IF THEN ELSE(Resource Pressure Deficit < 0.99, Resource Pressure Deficit, ZIDZ(Des Dev Resources, Development Resources Allocated)); Units: Dmnl
Eff Architecture and Code Base on Error Rate = Modularity Coefficient*"T1 Eff Arc & Base on Error"(Fraction Old Features Defective)+(1-Modularity Coefficient)*Max(1, Errors in Base Code); Units: Dmnl
Eff Pressure on Error Rate = T1 Eff Pressure on Error Rate(Dev Work Pressure); Units: Dmnl
Effect of Pressure on Productivity = T1 Eff Pressure on Productivity(Dev Work Pressure); Units: Dmnl
Effective Errors in Code Base = Min(Effective Size of Old Release Portion, Old Features Developed)*Fraction Old Features Defective; Units: Feature
Effective Size of Old Release Portion = 0; Units: Feature
Error Becoming Old = Features Becoming Old*Fraction New Features Defective; Units: Feature/Month
Error Frac in Released Feature = Error Fraction*(1-Frac Error Caught in Test) / (1-Error Fraction*Frac Error Caught in Test); Units: Dmnl
Error Fraction = Min(1, Normal Error Rate*Eff Architecture and Code Base on Error Rate*Eff Pressure on Error Rate); Units: Dmnl
Error Generation = Feature Release*Error Frac in Released Feature; Units: Feature/Month
Errors in Base Code = INTEG(Error Becoming Old-Bug Fixing, Errors in New Code Developed*Time to Fix Bugs / Desired Time to Develop); Units: Feature
Errors in New Code Developed = INTEG(Error Generation-Error Becoming Old, Normal Error Rate*(1-Frac Error Caught in Test)*New Features Developed / (1-Normal Error Rate*Frac Error Caught in Test)); Units: Feature
External Shock = 0; Units: Feature
Feature Addition = Fixed Feature Addition+PULSE(Pulse Time, Pulse Length)*External Shock / Pulse Length; Units: Feature/Month
Feature Release = Rate of Development*Fraction of Work Accepted; Units: Feature/Month
Features Becoming Old = New Features Developed / Time Between Releases; Units: Feature/Month
Features Under Development = INTEG(Feature Addition-Feature Release, Fixed Feature Addition*Desired Time to Develop); Units: Feature
FINAL TIME = 100; Units: Month
Fixed Feature Addition = 4; Units: Feature/Month
Frac Error Caught in Test = 0.9; Units: Dmnl
Fraction Errors Showing Up = 0.1; Units: Dmnl
Fraction New Features Defective = ZIDZ(Errors in New Code Developed, New Features Developed); Units: Dmnl
Fraction of Work Accepted = 1-Error Fraction*Frac Error Caught in Test; Units: Dmnl
Fraction Old Features Defective = ZIDZ(Errors in Base Code, Old Features Developed); Units: Dmnl
Modularity Coefficient = 0.5; Units: Dmnl
New Features Developed = INTEG(Feature Release-Features Becoming Old, Average size of a new release); Units: Feature
Normal Error Rate = 0.4; Units: Dmnl
Old Features Developed = INTEG(Features Becoming Old, Initial Old Features); Units: Feature
Productivity = 0.1; Units: Feature/(Person*Month)
Productivity of Bug Fixes = Productivity*Relative Productivity of Bug Fixing; Units: Feature/(Person*Month)
Productivity of CE = Productivity*Relative Productivity of CE; Units: Feature/(Person*Month)
Pulse Length = 1; Units: Month
Pulse Time = 10; Units: Month
Rate of Development = Development Resources Allocated*Productivity*Effect of Pressure on Productivity; Units: Feature/Month
Relative Productivity of Bug Fixing = 0.5; Units: Dmnl
Relative Productivity of CE = 1; Units: Dmnl
Resource Pressure Deficit = SUM(Allocated Resources[Function!]) / Total Resources; Units: Dmnl
Sales = 10; Units: /Month
Time Between Releases = Average size of a new release / Feature Release; Units: Month
TIME STEP = 0.125; Units: Month
Time to Fix Bugs = 9; Units: Month [0,20]
"T1 Eff Arc & Base on Error" ([(0,0)-(1,2)], (0,1),(0.13,1.13),(0.28,1.34),(0.42,1.68),(0.54,1.84),(0.72,1.94),(1,2)); Units: Dmnl
T1 Eff Pressure on Error Rate ([(0.5,0)-(2,2)], (0.5,0.8),(1,1),(1.15,1.11),(1.29,1.33),(1.40,1.54),(1.53,1.74),(1.68,1.89),(1.99,2)); Units: Dmnl
T1 Eff Pressure on Productivity ([(0,0)-(2,1.5)], (0,0.6),(0.26,0.61),(0.45,0.64),(0.64,0.73),(0.83,0.84),(1,1),(1.22,1.2),(1.43,1.28),(1.77,1.32),(2,1.33)); Units: Dmnl
Total Resources = 100; Units: Person

Appendix B- The maximum PD capacity

With a few assumptions, we can derive analytical expressions for the long-term capacity of the product development organization as modeled in the essay. These assumptions include:
- Products remain in the field for a fixed time.
- Product modularity is not very high; therefore we need to keep the absolute level of problems in the code base and architecture low.

The logic for deriving the capacity is simple. We can write the total resources needed for development, CE, and bug-fixing in terms of Feature Release and the Resource Gap. Since in this model allocation is proportional to requests, the ratio of resources requested to those allocated is the same across the three functions and equals the resource gap. Therefore we can find the resources allocated to each activity in terms of the resource gap and Feature Release. Consequently, knowing that when the resource gap is above 1 the total resources allocated equal the total resources available, we can find Feature Release in terms of the resource gap, which is the equation we need to find the optimum capacity. Formally, we can write the resources allocated to each activity:
\[
R_D = \frac{FR}{f_p(RG)\, P_N \left(1 - e_N f_e(RG)\, fr_T\right)} \qquad (3)
\]
\[
R_{CE} = \frac{FR \cdot e_N f_e(RG) \left(1 - fr_T\right) k}{P_N \left(1 - e_N f_e(RG)\, fr_T\right) RP_{CE}} \cdot \frac{1}{RG} \qquad (4)
\]
\[
R_{BF} = \frac{FR \cdot e_N f_e(RG) \left(1 - fr_T\right)}{P_N \left(1 - e_N f_e(RG)\, fr_T\right) RP_{BF}} \cdot \frac{1}{RG} \qquad (5)
\]
And note the conservation of resources:
\[
R = R_D + R_{CE} + R_{BF} \qquad (6)
\]
Plugging equations 3-5 into equation 6 and solving for FR, we find the following relationship, which describes Feature Release as a function of the resource gap:
\[
FR = \frac{R\, P_N \left(1 - e_N f_e(RG)\, fr_T\right) RG}{\dfrac{RG}{f_p(RG)} + e_N f_e(RG)\left(1 - fr_T\right)\left(\dfrac{k}{RP_{CE}} + \dfrac{1}{RP_{BF}}\right)} \qquad (7)
\]
Note that, as expected, in the special case where bug-fixing and current engineering don't need any resources (e.g., by letting RP_CE and RP_BF go to infinity), equation 7 is the same as equation 2, as discussed in the text. To find the maximum FR rate, one needs to take the derivative of FR with respect to RG (for which we need specific f_e and f_p functions) and set it equal to zero. Finding the optimum RG (through numerical or analytical solution), we can plug it back into equation 7 to find the optimum level of capacity.

Appendix C- Boundary conditions of the model

Here we review the main assumptions on which the different dynamics in the simulation model outlined above depend. This allows us to define the boundary conditions within which the conclusions of the model are robust. We first discuss the assumptions to which each main dynamic is sensitive, and then elaborate on the effects of different resource allocation schemes that can be used for allocating resources between the development, current engineering, and bug fixing activities.

The tipping dynamics: In the tradeoff between productivity and quality, the tipping dynamics exist only if the aggregate function has an inverse U-shaped relationship between pressure and total output. In the case where there is no current engineering and bug fixing, the inverse U-shaped function exists if and only if the following function has a maximum inside its feasible region (rather than on the boundaries):
\[
f_p(RG)\left(1 - e_N f_e(RG)\, fr_T\right)
\]
where f_p is the function for the effect of pressure on productivity, e_N is the normal error rate, f_e() is the function for the effect of pressure on error rate, and fr_T is the fraction of errors caught in testing. Since the two functions f_e and f_p are not defined explicitly, there is no general rule one can derive to specify the boundary conditions for the existence of such a maximum. Yet a few general patterns are observable:

For the general class of functions f_p = 1 + a.(RG-1) and f_e = 1 + b.(RG-1), the expression simplifies to a second-order polynomial. Finding the maximum of that polynomial, and imposing the boundary condition that the error rate should remain below 1 (e_N.(1 + b.(RG-1)) < 1), we get the general condition for the existence of an inverse U-shaped function:
\[
e_N\left(1 - \frac{b}{a}\right) + \frac{1}{fr_T} < 2
\]
This is a relatively weak assumption, since a < 1 (a = 1 would mean we can completely compensate any resource gap by working harder) and fr_T is often not so small (we find a significant fraction of errors before releasing the product). Therefore, for this general class of functions, the inverse U-shape assumption is realistic for many real-world settings. Higher values of the normal error rate (e_N) and of the fraction of errors caught in testing (fr_T) increase the possibility that the shape of the function is an inverse U. In other words, if the process has a high error rate, and if we have a good testing capacity, we are more prone to the tipping dynamics. Note that the latter conclusion does not necessarily hold when we include the CE dynamics (see below).
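For completeness, the inverse-U condition stated above can be derived in a few lines. The sketch below is a worked illustration using only the linear forms just assumed, not a general proof.
\[
g(x) \;=\; f_p\left(1 - e_N f_e\, fr_T\right) \;=\; (1 + a x)\left(1 - e_N (1 + b x)\, fr_T\right), \qquad x \equiv RG - 1 .
\]
\[
g'(x) \;=\; a\left(1 - e_N fr_T\right) - e_N b\, fr_T - 2\, a\, b\, e_N fr_T\, x
\quad\Longrightarrow\quad
x^{*} \;=\; \frac{a\left(1 - e_N fr_T\right) - e_N b\, fr_T}{2\, a\, b\, e_N fr_T} .
\]
(An interior maximum above RG = 1 additionally requires a(1 - e_N fr_T) > e_N b fr_T, so that x* > 0.) Substituting x* into the feasibility requirement that the error fraction stay below one, e_N(1 + b x*) < 1, and simplifying yields
\[
e_N\left(1 - \frac{b}{a}\right) + \frac{1}{fr_T} \;<\; 2 .
\]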
The smaller the slope of f_p and the larger the slope of f_e(), the higher are the chances of having an inverse U-shaped function. In other words, if the effect of pressure on productivity is small, and the effect on quality is strong, the tipping dynamics are more applicable.

Tipping point dynamics are also impacted by the presence of CE and bug-fixing. Basically, the introduction of CE and bug-fixing increases the costs of low-quality work, so that not only does the cost of rework play a role in the output function (the (1 - e_N.f_e(RG).fr_T) term in the feature release equation becomes important), but the cost of fixing the bugs at customer sites or later in the code base also haunts the development organization (the e_N.f_e(RG).(1 - fr_T).[k/RP_CE + 1/RP_BF] term in the denominator of feature release; see Appendix A). As a result, the total amount of work that can be managed at higher levels of pressure further decreases. Moreover, this term introduces a channel through which poor testing reinforces the tipping dynamics; therefore we can no longer easily avoid tipping by having a poor test. In short, the introduction of CE and bug-fixing exacerbates the tipping dynamics and increases the chance that we observe an inverse U-shaped curve between pressure and output. However, the strength of the CE/BF effect on the tipping dynamics depends on the effect of pressure on quality; in other words, if quality is not impacted by pressure at all, the CE and BF dynamics CANNOT produce a tipping point alone.

The adaptation trap: The adaptation trap is created when the quality feedback of current development comes with some delay, in the form of extra work to be managed later. The result is that an increase in current development not only requires an initial increase in resources to manage such development, but will also require further resources to be allocated later (to do current engineering and bug fixing). This dynamic exists as long as we have a significant CE or bug fixing component in the work. Moreover, the strength of the adaptation trap depends on the following considerations:

- How fast the pressure can change, compared to how delayed the CE and BF feedback is. If the change in pressure is faster than the feedback, the adaptation trap is more important. Specifically, if the CE feedback arrives with significant delays (e.g., long testing and sales lead times separate initial development and observation of CE feedback), the adaptation trap is a significant danger to the organization. Alternatively, if CE/BF feedback comes fast, the management team will have little time to be misled by apparently productive pressure increases.

- The strength of the CE/BF effect influences the adaptation trap in two ways. First, if CE/BF is in general a large proportion of the work, then adaptive increases in pressure, and consequently in output, bring more significant surprises later, when the CE feedback becomes apparent. This proportion is captured by the term k/RP_CE + 1/RP_BF in the simple model. The second effect is that the proportion of CE/BF work is itself impacted by the quality of work, i.e., low-quality work takes more CE resources per feature released. Therefore adaptive increases in pressure not only increase the feature release, but also increase the error rate and therefore the relative importance of the CE/BF feedback. In short, the significance of the adaptation trap depends both on the magnitude of the CE/BF effects and on the strength of the effect of pressure on error rate.
While the nature of work in many industries limits the former to small levels and therefore reduces the importance of the discussed dynamics, the processes used to manage the work can impact the latter factor and create heterogeneity in the presence of the adaptation trap within an industry or even an organization.

Bug fixing and building on errors: The bug-fixing dynamic is at its core similar to the CE effect, in the sense that it is driven by the quality of work done in the past and takes resources away from working on the current release. Its strength, however, depends on somewhat different factors. The main value in doing the bug-fixing/architecture refactoring work is to avoid degradation of the quality of future features, and to some extent to avoid old bugs surfacing in new releases. Therefore the importance of BF activities depends on the extent to which such degradation matters:

- If a long life is predicted for a platform, the BF dynamics become more important, since the development organization needs a reliable and productive platform for future releases. However, if the platform is at the end of its life cycle, the BF effect becomes less of an issue for the last couple of releases. This consideration, however, does not have an impact on the "error creates error" dynamic.

- To the extent that CE activity can fix the bugs in the product base, the importance of BF activities goes down. In other words, if by fixing the problem for one customer we can also fix the problem for all future releases and customers, then BF activities remain limited to work on the overall architecture. However, if CE activity is very customized, then separate work needs to be done on the product base to avoid similar failures in future releases of a product, increasing the importance of BF.

- The modularity of the architecture has an important impact on the significance of the "error creates error" dynamics (where the quality of new modules is impacted by the quality of the product base). In a modular architecture, the new pieces have few interactions with old parts, and therefore their quality is largely independent of those parts. Consequently, the "error creates error" feedback does not play a significant role unless the density of errors in the product base increases. On the other hand, an integral architecture increases the chances that a new module's quality is impacted by the quality of any other module in the product base, and therefore increases the importance of the "error creates error" dynamics.

Resource allocation: In the model we have assumed a proportional resource allocation scheme between the development, CE, and BF activities. Different allocation strategies have an impact on the results. If we move towards increasing the priority of the development activity, the effects of CE and BF start to lose importance, and therefore:

- The tipping dynamics weaken (but don't go away; see the discussion above under the tipping dynamics).

- The adaptation trap weakens and can potentially go away if development gets all its required resources before we allocate anything to current engineering or bug fixing.

- The impact of the error creates error dynamics increases, because by spending fewer resources on bug fixing, we leave problems to accumulate and impact the quality of current development more significantly.

If we move towards increasing the priority of CE:

- The tipping dynamics are aggravated, since a bigger fraction of resources is sent to CE at higher error rates, increasing the downward slope of the inverse U-shaped curve.
- The adaptation trap becomes more significant.

- The error creates error dynamics can be impacted in either direction, depending on parameter settings: on the one hand, the fixing of bugs in the code base through CE can help alleviate the error creates error feedback; on the other hand, the increased pressure on the development phase as a result of CE priority will actually increase the error rate in the first place, and therefore the magnitude of errors that need to be fixed through BF. Finally, higher CE priority will take away resources from the BF activity.

In practice, organizations are unlikely to give higher priority to BF, because it is the least salient activity and is not directly productive in either producing new features or keeping the customers happy. Nevertheless, under such a practice, the effects on the tipping dynamics and the adaptation trap are similar to increasing CE priority. The effect on the "error creates error" dynamic is positive. In Sigma, product Alpha was tilting towards giving a higher priority to CE, and Beta was close to the proportional allocation scheme used in the model. In this essay we used the proportional formulation, which takes a conservative stance with respect to the significance of the main dynamics discussed in the study.

Appendix 1-1- Technical material to accompany essay 1: Effects of Feedback Delay on Learning

The following supplementary material includes the detailed formulations of the four learning algorithms discussed in the essay, an alternative method for measuring the speed of internal dynamics of the models, the statistics for parameters and results of the robustness analysis, formulations for the Q-learning algorithm tested for comparison with the four learning models discussed in the text, as well as instructions for inspecting and running the simulation learning models discussed in the essay and the Q-learning model.

Mathematics of the models

This appendix includes the formulation details for all four learning algorithms (Regression, Correlation, Myopic Search, and Reinforcement) as well as the exploration procedure. Please see the online material on the author's website for the simulation models and applets that allow you to simulate the models and run different experiments.

Table 1- Reinforcement Learning algorithm. The table defines the adjustment of the activity values V_j(t) from reinforcement gains G_j(t), which are driven by the payoff raised to the Reinforcement Power, together with forgetting of the accumulated values over the Reinforcement Forgetting Time and the resulting allocations A_j(t).

Table 2- Myopic Search algorithm:
dV_j(t)/dt = ((V*(t) - V_j(t)) / tau_Myo) * Direction(t), where tau_Myo is the value adjustment time constant
Direction(t) = 1 if P(t) > Historical Payoff(t); 0 if P(t) <= Historical Payoff(t)
Historical Payoff(t) = Smooth(P(t), Historical Averaging Time Horizon)
where the smooth function is defined by d(smooth(x, tau))/dt = (x - smooth(x, tau)) / tau, and the initial value of smooth(x, tau) is equal to the initial x unless explicitly mentioned.
V*(t) = Sum over j of Fraction of Actions_j(t) * V_j(t), where Fraction of Actions_j(t) = A_j(t) / Sum over k of A_k(t).

Table 3- Correlation algorithm:
dV_j(t)/dt = (V_j*(t) - V_j(t)) / 2
V_j*(t) = V_j(t) * f(Action-Payoff Correlation_j(t)), where f(.) is an increasing, logistic-shaped function of the correlation whose steepness is set by the Sensitivity of Allocations to Correlations
Action-Payoff Correlation_j(t) = Action-Payoff Covariance_j(t) / sqrt(Action Variance_j(t) * Payoff Variance(t))
Action-Payoff Covariance_j(t) = smooth(Change in Action_j(t) * Change in Payoff(t), Action Lookup Time Horizon)
Action Variance_j(t) = smooth(Change in Action_j(t)^2, Action Lookup Time Horizon)
Payoff Variance(t) = smooth(Change in Payoff(t)^2, Action Lookup Time Horizon)
Change in Action_j(t) = A_j(t) - smooth(A_j(t), Action Lookup Time Horizon)
Change in Payoff(t) = P(t) - smooth(P(t), Action Lookup Time Horizon)

Table 4- Regression algorithm:
dV_j(t)/dt = (V_j*(t) - V_j(t)) / 2
V_j*(t) = Max(a_j(s) / Sum over k of a_k(s), 0) if at least one a_j(s) is non-zero, and V_j(t) otherwise
s = E * INT(t/E), where E is the evaluation period (the decision maker runs the regression model every E time periods) and the INT function gives the integer part of t/E
a_j(s) = Max(0, a_j'(s))
The a_j'(s) are the estimates of a_j in the following OLS regression model: log(P(t)) = log(a_0) + Sum over j of a_j * log(A_j(t)) + e(t). The regression is run every Evaluation Period (EP) periods, using all data between time zero and the current time (every time step is an observation).

Table 5- Exploration procedure:
Explored value of activity j = V_j(t) + N_j(t) * V_j(t), where N_j(t) is the exploration noise applied to activity j
Standard deviation of N_j(t) = Minimum Action Standard Deviation + (Maximum Action Standard Deviation - Minimum Action Standard Deviation) * S.D. Selector(t)
S.D. Selector(t) = Min(1, Max(0, (1/Slope Inverse for S.D. Selection) * (Recent Improvement(t)/Min Improvement - 1)))
Recent Improvement(t) = Smooth(Max(0, P(t) - Historical Payoff(t)), Payoff Improvement Updating Time), with Initial Recent Improvement = 1

Speed of internal dynamics of the model

In the essay we used the convergence times of the different models in the base case as a proxy for the speed of the internal dynamics of the models. An alternative measure, which does not directly depend on our implementation of the convergence criteria, can be obtained by fitting a first-order negative exponential curve to the base case (no delay) average performances of the algorithms. Following the practice in control theory, we measure four times the time constant of the fitted curve, which gives the time needed for the model to settle within 2% of its final value. We use this as a sensible measure of the time needed for the organization to settle down into a policy, and therefore a good measure of the speed of the internal dynamics of the model. The following table shows the values of this measure for the different algorithms:

Table 6- The alternative measure for speed of internal dynamics.

These results are consistent with the premise that the delays tested in the model are not too long compared to the internal dynamics of the system.

Statistics for robustness analysis parameters and results

The independent parameters in the regressions included all those changed randomly for the Monte Carlo simulation analysis, drawn from uniform distributions, reported below. For each parameter we report the low, high, and base case values, as well as a description of the parameter. In Table 8 we also report the summary statistics for the dependent variables of the regressions.
Statistics for robustness analysis parameters and results
The independent parameters in the regressions included all those changed randomly for the Monte-Carlo simulation analysis, picked from uniform distributions, reported below. For each parameter we report the low, high, and base case values, as well as a description of the parameter. In Table 8 we also report the summary statistics for the dependent variables of the regressions.
Table 7- Parameter settings for sensitivity analysis (parameter; algorithm; low, high, base; description)
- Payoff Generation Delay[activity 1]; All; 0, 20, 0; How long on average it takes for resources allocated to activity 1 to become effective and influence the payoff.
- Perceived Payoff Generation Delay[activity 1]; All; 0, 20, 0; The decision maker's estimate of the delay between resources allocated to activity 1 and the payoff.
- Action Noise Correlation Time; All; 1, 9, 3; The correlation time constant for the pink noise used for model exploration.
- Value Adjustment Time Constant; Regression; 3, 20, 10; How fast the activity value system moves towards the indicated policy.
- Value Adjustment Time Constant; Correlation, Myopic; 1, 8, 2; How fast the activity value system moves towards the indicated policy.
- Maximum Action Standard Deviation; All; 0.05, 0.2, 0.1; The standard deviation of the exploration noise when exploration is most vigorous.
- Minimum Action Standard Deviation; All; 0, 0.05, 0.01; The standard deviation of the exploration noise when exploration is at its minimum.
- Minimum Improvement of Payoff; All; 0.005, 0.02, 0.01; Normalizing factor for changes in payoff; changes lower than this trigger the minimum standard deviation for exploration.
- Slope Inverse for Standard Deviation; All; 0, 20, 9; Inverse of the slope relating recent improvements to the standard deviation of exploration; higher values give a smoother transition from high exploration to low.
- Payoff Improvement Updating Time; All; 3, 20, 5; The time constant for updating the recent payoff improvements (which are then normalized to determine the strength of exploration).
- Action Lookup Time Horizon; Correlation; 2, 10, 6; The time horizon for calculating correlations between an activity and the payoff.
- Sensitivity of Allocations to Correlations; Correlation; 0.05, 0.8, 0.2; How aggressively the activity values (and therefore allocations) respond to the perceived correlations between each activity and the payoff.
- Reinforcement Forgetting Time; Reinforcement; 3, 20, 10; The time constant for forgetting past activity values.
- Reinforcement Power; Reinforcement; 4, 20, 12; How strongly differences in payoff are reflected in action value updates.
- Evaluation Period (E); Regression; 1, 9, 3; How often a new regression is conducted to recalculate the optimal policy.
Table 8- Summary statistics for dependent variables (3000 observations each; mean, standard deviation, median)
Achieved Payoff Percentage[Rgr]: 87.2, 17.8, 93.0
Achieved Payoff Percentage[Crr]: 75.6, 25.4, 83.3
Achieved Payoff Percentage[Myo]: 79.6, 20.2, 85.0
Achieved Payoff Percentage[PfR]: 81.7, 18.7, 86.7
Distance Traveled[Rgr]: 18.1, 11.5, 15.1
Distance Traveled[Crr]: 17.5, 11.0, 14.8
Distance Traveled[Myo]: 18.4, 12.2, 15.4
Distance Traveled[PfR]: 18.0, 12.3, 14.7
Convergence Probability[Rgr]: 0.236, 0.424, 0
Convergence Probability[Crr]: 0.122, 0.328, 0
Convergence Probability[Myo]: 0.178, 0.383, 0
Convergence Probability[PfR]: 0.200, 0.400, 0
A Q-learning model for the resource allocation learning task
Among the different learning algorithms we considered in the original design of this study, a Q-learning algorithm taken from the machine learning literature had interesting properties that made it a potential candidate for inclusion in the original essay. However, it was excluded from the final draft because of space constraints, its relative inefficiency compared with the other models for a comparable number of trials, and complexity considerations.
Here we briefly discuss this algorithm because it gives better insight into the optimization-oriented learning algorithms available in the machine learning and artificial intelligence literatures, it has been introduced into the organizational learning literature (Denrell, Fang et al. 2004), and it helps build a more concrete mathematical framework for relating the complexity of learning to action-payoff delays. In comparison with the other reported models, this algorithm has a relatively high level of information processing and rationality; however, it does not use any cognitive search, i.e., it assumes no knowledge of the payoff landscape. Consequently the algorithm is quite general, but it is not optimal for the payoff landscape at hand. In the development of this model we follow the general formulation of Q-learning (Watkins 1989) as discussed by Sutton and Barto (Sutton and Barto 1998).
Typical reinforcement learning algorithms in machine learning (the general category to which the Q-learning algorithm belongs) work with a discrete state space and use discrete time. Therefore, to adapt this algorithm to our task, which has both continuous time and a continuous action space, we break the state-action space of our problem into discrete pieces and use discrete time in the simulations reported here. Also, here we only report the development of the Q-learning model for the case with no delays. As the analysis reported here and the discussion of extending the model to delayed feedback show, such an extension is possible but increases the complexity of the model exponentially with the size of the delays considered; the number of experimental data points needed for successful performance under that condition therefore goes well beyond what is feasible in an organizational setting.
The state space in our setting depicts which resource allocation the organization is using at the current time, i.e., the resources allocated to each of the 3 activities, A_j (j = 1..3); however, since the sum of these resources equals a constant total resource (R = 100) at each time, we have a two-dimensional state space, A1 and A2, each changing between 0 and 100, with the constraint that A1 + A2 <= 100. We break each dimension into N possible discrete values. We choose N = 12 in the following analysis since that allows for observation of the peak of the Cobb-Douglas payoff function used in our analysis (P = A1^0.5 * A2^0.3333 * A3^0.1667); consequently, each A_j can take one of the values (R/N)*i (i = 0..N), provided that the sum of the A_j equals R. To define the action space, we assume that at each period the organization can only explore one of the neighboring states, i.e., by shifting R/N resources from one activity to another, or can stay at the last state. Consequently the action space includes 7 possible actions (no change, A1 to A2, A2 to A1, A1 to A3, A3 to A1, A2 to A3, A3 to A2). However, at the boundaries of the state space (where A_j = 0 for any of the j's) the feasible action set shrinks to exclude the possibility of exiting the feasible state space. Consequently our action-state space, (S, a), includes 559 possibilities, where each S consists of an (A1, A2) pair and "a" can take one of its feasible values. At each period the organization takes an action (goes from state S to state S', through action "a"), observes a payoff (P) from that move, and updates the value of the action-states Q(S,a) according to the following updating rule:
Q(S, a) <- (1 - alpha) * Q(S, a) + alpha * (P + gamma * Max_a' Q(S', a'))
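The update rule can be illustrated with a minimal tabular sketch on the discretized allocation grid. The value of gamma and the uniform random exploration used here are illustrative placeholders (the exploration scheme actually used is described next), so this is a sketch of the technique rather than a reproduction of the original model.

import numpy as np

N, R = 12, 100.0                 # grid resolution and total resources
alpha, gamma = 0.9, 0.5          # adjustment rate; gamma (state-value feedback) is illustrative
payoff = lambda a1, a2: (a1/R)**0.5 * (a2/R)**0.3333 * (max(R - a1 - a2, 0.0)/R)**0.1667

# 91 feasible states (i, j) with i + j <= N; 7 moves: stay, or shift one increment between activities
states = [(i, j) for i in range(N + 1) for j in range(N + 1 - i)]
state_set = set(states)
moves = [(0, 0), (1, -1), (-1, 1), (1, 0), (-1, 0), (0, 1), (0, -1)]
Q = {(s, m): 0.0 for s in states for m in moves}

def feasible(s):
    """Moves that keep the allocation inside the simplex (no exit from the state space)."""
    return [m for m in moves if (s[0] + m[0], s[1] + m[1]) in state_set]

rng = np.random.default_rng(1)
s = states[rng.integers(len(states))]
for _ in range(240):                               # roughly the 240-period comparison horizon
    options = feasible(s)
    m = options[rng.integers(len(options))]        # uniform random exploration (placeholder for the
                                                   # upsilon/omega scheme described in the text)
    s2 = (s[0] + m[0], s[1] + m[1])
    P = payoff(s2[0] * R / N, s2[1] * R / N)       # payoff observed after the move
    best_next = max(Q[(s2, m2)] for m2 in feasible(s2))
    Q[(s, m)] = (1 - alpha) * Q[(s, m)] + alpha * (P + gamma * best_next)
    s = s2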
Parameter alpha is used to avoid too fast an adjustment of values in the case of stochastic payoff functions. Since here the payoff is deterministic, we use a high adjustment rate, alpha = 0.9, in the analysis below. Central to the contribution of the Q-learning algorithm is that it not only takes into account the reward received from the last action-state, but also rewards how good a state that action has brought us to (the maximum value available from S' through any action a'). The algorithm therefore allows for taking into account the contributions of different states in leading us towards better regions of the payoff landscape, and creates a potential avenue through which one can capture the effects of time delays more efficiently, i.e., by rewarding not only the payoff we get at this step, but also the new state the action takes us to: a forward-looking perspective. Parameter gamma determines the strength of that feedback on model performance.
The action taken at each period depends on the last state, the value of the different actions originating from that state, and the tendency of the organization to explore different possibilities. Here we can have two types of exploration: one involves exploring actions that we have never taken from the current state (exploring new (S,a)'s); the other reflects the organization's propensity to try an action that has a lower value than the maximum, even though it has been tried before (exploring (S,a)'s that have already been explored). Both types of exploration are needed to lower the chances that the organization gets stuck exploiting a low-payoff location in the state space. Starting from a value of 0 for all Q(S,a)'s, and taking into account that all actions in this landscape, except for those on the borders, give a positive payoff, we use the following equations to define the probability of taking action "a" from state S, P(S,a), where Z and K denote the number of untried and already-tried actions available from S:
P(S,a) = upsilon / (upsilon*Z + (1 - upsilon)*K), if Q(S,a) = 0
P(S,a) = (1 - upsilon)*K * (Q(S,a)^omega / Sum_a' Q(S,a')^omega) / (upsilon*Z + (1 - upsilon)*K), if Q(S,a) > 0
Here upsilon represents the attention given to exploring new actions and omega represents the priority given to exploring already-tried state-action pairs. In the following analysis we use the values of 0.3 for upsilon and 4 for omega, which maximized the speed of learning for this model on the given task within a 3 by 3 ([0.1, 0.3, 0.5] x [1, 4, 7]) grid of values for upsilon and omega.28
28 To find these parameter values, we ran linear regressions with the dependent variable of average payoff over 10 simulations of 100 periods and the independent variable of TIME, to find the largest learning rate (the largest regression coefficient on TIME) within the given parameter settings.
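A short sketch of one way to implement this two-part exploration rule, under the reading that Z and K count the untried and already-tried actions available from the current state; the function name and the example values are illustrative.

import numpy as np

def action_probabilities(q_values, upsilon=0.3, omega=4.0):
    """Split probability between never-tried actions (Q == 0) and already-tried
    actions (Q > 0); tried actions are weighted by Q**omega."""
    q = np.asarray(q_values, dtype=float)
    untried = q == 0
    Z, K = untried.sum(), (~untried).sum()
    denom = upsilon * Z + (1 - upsilon) * K
    p = np.zeros_like(q)
    p[untried] = upsilon / denom
    if K:
        weights = q[~untried] ** omega
        p[~untried] = (1 - upsilon) * K / denom * weights / weights.sum()
    return p

print(action_probabilities([0.0, 0.0, 0.2, 0.4, 0.4, 0.1, 0.3]))  # probabilities sum to 1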
Since this model follows a different logic both in defining the state space and time steps and in its exploration and value-adjustment algorithms, we have to find the appropriate simulation length to make a comparison between this model and those reported in the essay meaningful. This is important in light of the fact that, given enough time and experimental data, the Q-learning algorithm is guaranteed to find the real values (what they pay off under an optimal policy) of the different state-action pairs, and therefore to make optimal allocations. The important question is then whether such convergence happens faster or slower than with the other learning models, and what the impact of introducing delays is on the time/trial requirements of this model. The basic idea behind such a comparison is the availability of a similar amount of data about the payoff landscape.29
Since such data arrives through the learning agent's exploration of the landscape, gathering comparable amounts of data requires similar speeds for the different algorithms when walking on the state space in a non-converging mode. That is, the distance traveled on the state space should be similar through the length of the simulation for Q-learning and the other four algorithms. In the case of the other four models, the total distance traveled remains below 20 for most of the cases (see Table 4 in essay 1) in which most simulations have not converged.30 Therefore a reasonable comparison allows the Q-learning algorithm enough time to walk on the state space for a similar distance. In the case of the Q-learning algorithm, where each activity dimension is divided into N discrete values, each step to a new position moves the organization on the state space by distance d: d = sqrt((1/N)^2 + (1/N)^2) = sqrt(2)/N. Taking into account the probability that the organization stays where it is in a random walk (which equals 91/559), the distance traveled with each step is approximately d' = 1/N. Consequently M, the total number of periods the Q-learning model needs to run to travel a distance of 20, is M = 20*N = 240. Therefore a reasonable comparison between the Q-learning and other models can be made when the Q-learning model is simulated for about 240 periods. Using the parameter values reported above, we run the Q-learning algorithm for as long as 600 periods in order to make sure that we observe the behavior of the learning algorithm outside the range of data availability allowed for the other models. We average the results over 400 simulations for higher confidence.
29 Note that because of the continuous-time nature of the four original algorithms, the autocorrelation of the exploration term, and the independence of their behavior from the simulation time step (for dt < 0.2), it is not realistic to count the total number of time steps in 300 periods of continuous simulation and make a direct comparison with the same number of periods for the Q-learning method. Such logic would give an increasing number of trial periods to Q-learning as we reduce the dt of the continuous model, which is obviously wrong.
30 The distance traveled on the state space was calculated in the four original models by summing the Euclidean distance between the points the algorithm visits each period on the 3-dimensional space of fractions of resources allocated. We follow the same logic here.
[Figure: average payoff percentage versus period (Q-learning scale) for the Correlation, Myopic, Reinforcement, Regression, and Q-Learning algorithms.]
Figure 9- The behavior of the Q-learning algorithm (average of 400 simulations) in the resource allocation task, no-delay condition, as compared to the other four algorithms (as reported in the essay). Note that the time scale is adjusted for the other algorithms to represent the equivalence of data availability. Also note that by making the state space discrete, the expected payoff of a random allocation has decreased (mainly because of the 0 payoffs at the boundaries); thus the lower starting point for the Q-learning algorithm.
As the graph shows, the Q-learning model does not find the optimum allocation even after receiving over two times the data used by the other models. The behavior of the Q-learning algorithm under these conditions suggests that in order to find good policies this algorithm requires far more data than the other four models, even under the no-delay condition. This should come as no great surprise, since the discrete state-action space of this algorithm includes many points at which a few trials need to be realized before a useful mental model of the space is built. Moreover, this algorithm does not use any information about the slope of the payoff landscape in moving to better allocation policies. Nevertheless, the amount of data needed by this algorithm puts it at a disadvantage in terms of applicability to typical organizational learning settings, where few data points are available. In fact, this is the main reason we did not include this algorithm in the analysis reported in the essay.
Extension of this model to capture action-payoff delays through an enhanced representation of the state space is possible; however, it is very costly in terms of the data requirements of the model. In the simplest case, where delays between action and payoff can only be 1 period long, our state space needs to include not only the current location on the payoff landscape (the number of its members is O(N^2)), but also the last action. This is necessary because both the current state and the state from the last period (which can be deduced from the last action) contribute to the payoff for the current period. This additional complexity expands the state space, and the corresponding data requirement, to O(N^2 * A), where A is the size of the potential action space (in our setting 7).31 Consequently, to be able to learn about processes with delays of length K the algorithm needs a state space of size O(N^2 * A^K), which grows exponentially with the length of the maximum delay, resulting in exponential growth of the data requirements to learn about such a payoff function.
In the very general case one can consider a maximum delay length of K with an arbitrarily distributed delay structure, an arbitrary m-dimensional state space where each dimension can take one of N possible values, exploratory actions, and an arbitrary payoff function. The "NK" type models (Kauffman 1993), widely used in the organizational learning literature, are a special case of this general model (where N = 2 in our terminology and no delay is present). For our general case the complexity of the state-action space is of the order O(N^m * (N^m)^K), where the first term, N^m, represents, in Levinthal's (2000) terms, the spatial complexity, and the latter term, (N^m)^K, represents the temporal complexity (see footnote 31 for the derivation). This clearly shows how temporal complexity quickly dominates spatial complexity for larger values of delays (K > 1 period).
31 In fact our model limits search to local exploration of neighboring states, and therefore the size of A is limited to 7 (boundary states have smaller action sets; for an m-dimensional allocation problem, the size of A becomes m*(m-1)+1; here m = 3). If long-distance explorative moves were possible, as most models of organizational learning assume, the set A would expand to all the states, and would itself be of size O(N^m), which increases the speed of complexity growth as a function of delays even further.
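A quick arithmetic illustration of how these bounds grow with the delay length K, using the counts discussed above; the script is a sketch for illustration only.

# 91 feasible allocation states (the (A1, A2) grid points with A1 + A2 <= 100), A = 7 local moves,
# and the general bound N**m * (N**m)**K for N = 12, m = 2, per the discussion above.
states, A = 91, 7
N, m = 12, 2
for K in range(0, 5):
    local_bound = states * A ** K                # delayed version of the local-search model, O(N^2 * A^K)
    general_bound = (N ** m) * (N ** m) ** K     # spatial times temporal complexity, O(N^m * (N^m)^K)
    print(f"K={K}: local-search bound ~{local_bound:,}   general bound ~{general_bound:,}")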
The exponential growth of learning complexity with the size of the delay further reinforces the main argument of this essay: delays have an important impact on the complexity of learning, and therefore should be considered explicitly and in more detail as they relate to organizational learning.
Simulating the Q-learning model
Two versions of the Q-learning model are posted with the online material on the author's website. Both are developed in the Anylogic (TM) software. One interface is a stand-alone Java applet with accompanying files and runs under any Java-enabled browser with no need to install additional software; the second includes the source code and equations of the model and can be used for detailed inspection of the model implementation, as well as for running the model, after installation of the Anylogic software.
To open the stand-alone applet, unzip "Q-Learning_Applet_Files.zip" into one folder (all three files need to be in the same folder). You can then open the "Q-Learning-Visual Applet.html" file in any web browser (e.g. Internet Explorer). You can change different model parameters and extend the simulation time as you desire.
To open the detailed Anylogic model, you need to first download and install the Anylogic software (you can get the 15-day free trial version from http://www.xjtek.com/download/). You can then open "Q-Learning-Visual.alp" in Anylogic. Different objects and modules of code are observable in the left-hand column, and you can navigate through them and inspect different elements of the model. You can run the model by clicking the run button (or F5), which compiles the model and brings up a page similar to the attached Java applet. You can inspect the behavior of the model both through the interface accompanying the model and applet, and by browsing different variables and graphing them as long as you are in the run mode in Anylogic. For the second purpose, go to the "root" tab in the run-time mode, where you can see all model variables and inspect their runtime behavior as well as their final values.
The implemented interface allows you to observe payoff performance as well as the path of the organization on the state space, and to change the following aspects of the model within a simulation (just click pause, make the desired parameter changes, and click run to continue the simulation):
1- Change the simulation time to allow more information for the learning algorithm.
2- Change the exploration parameters, upsilon and omega, to change the propensity of the model to explore new action-states or to re-visit those already tried.
3- Change the value adjustment parameters: the adjustment rate and the strength of the feedback to states that become stepping stones for better payoffs (parameter gamma in the algorithm above).
4- Change the shape of the payoff function by changing the exponents of the different activities' effects on the payoff. Note that the Cobb-Douglas function requires the exponents to add up to 1.
You can reload the model by pressing the refresh button of your browser. You can also change the speed of the simulation through the simulation control slider in the top left corner of the applet.
Simulating the four other learning models
The simulation model used to examine the other four learning algorithms is also posted with the online material. To open the model, download and install the Vensim Model Reader from http://vensim.com/freedownload.html. Then open "LearningModelPresent.vmf" from the Vensim file menu.
Navigate through the different views of the model using the "Page Up" and "Page Down" buttons, the tab for selecting views at the bottom left of the screen, or the navigation buttons enabled on most of the views. View the equation for each variable by selecting that variable and clicking the "Doc" button in the left toolbar. Note that this model includes more functionality than discussed in the essay. It allows you to introduce noise in payoff and perception delays, and to change the payoff function to linear. Moreover, you can change any of the model parameters and explore different scenarios with this model.
Simulating the model:
- First choose a name for your simulation and enter it in the field for the simulation name in the middle of the top toolbar.
- Click on the SET button to the left of this name.
- Now change the parameters of the model as desired. The best place to do so is the "control panel" view (press Page Down until you arrive at the second view), where you can change all the main parameters. The current value of each parameter is shown if you click on it.
- Simulate the model by clicking the Run button in the top toolbar.
Examining the behavior:
- The control panel includes graphs of the payoff, the distance traveled in the space, the convergence time, and the climbing time (the time during which an algorithm improves its performance), based on your latest simulation.
- You can also use the tools in the left toolbar to see the behavior of different variables. Select a variable by clicking on it and then click on the desired tool.
References:
Denrell, J., C. Fang and D. A. Levinthal. 2004. From T-mazes to labyrinths: Learning from model-based feedback. Management Science 50(10): 1366-1378.
Kauffman, S. A. 1993. The origins of order: Self-organization and selection in evolution. New York, Oxford University Press.
Levinthal, D. 2000. Organizational capabilities in complex worlds. In The nature and dynamics of organizational capabilities. S. Winter. New York, Oxford University Press.
Sutton, R. S. and A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, The MIT Press.
Watkins, C. 1989. Learning from delayed rewards. Ph.D. Thesis. Cambridge, UK, King's College.
Appendix 2-1- Technical material to accompany essay 2: Heterogeneity and Network Structure in the Dynamics of Diffusion: Comparing Agent-Based and Differential Equation Models
This supplement documents the DE and AB models, the construction of the different networks for the AB simulations, the distribution of final sizes, the details of the sensitivity analysis on final size, and the instructions for running the model. The model, implemented in the Anylogic simulation environment, is available on the author's website.
Formulation for the Agent-Based SEIR Model
We develop the agent-based SEIR model and derive the classical differential equation model from it. Figure 1 shows the state transition chart for individual j. Individuals progress from Susceptible to Exposed (asymptomatic infectious) to Infectious (symptomatic) to Recovered (immune to reinfection). (In the classical epidemiology literature the recovered state is often termed "Removed" to denote both recovery and cumulative mortality.)
[State transition diagram: Susceptible, Exposed, Infectious, and Recovered, connected by the corresponding transition rates.]
Figure 1. State Transitions for the AB SEIR model
The states S, E, I, and R are mutually exclusive; in the AB model the current state has value one and all others are zero.
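To make the state chart concrete, here is a minimal sketch of an individual agent with the four mutually exclusive states and exponentially distributed residence times (formalized below); the class name, default durations, and stepping scheme are illustrative assumptions, not the Anylogic implementation.

import random
from enum import Enum

class State(Enum):
    S, E, I, R = range(4)

class Person:
    """One agent of the AB SEIR model: exactly one of the four states is active."""
    def __init__(self, emergence_time=15.0, disease_duration=15.0):   # illustrative mean durations
        self.state = State.S
        self.emergence_time = emergence_time        # mean residence time in E
        self.disease_duration = disease_duration    # mean residence time in I
        self.clock = None                           # time remaining until the next scheduled transition

    def infect(self):
        if self.state is State.S:
            self.state = State.E
            self.clock = random.expovariate(1.0 / self.emergence_time)

    def step(self, dt):
        """Advance the scheduled E->I or I->R transition by dt."""
        if self.state not in (State.E, State.I):
            return
        self.clock -= dt
        if self.clock > 0:
            return
        if self.state is State.E:
            self.state = State.I
            self.clock = random.expovariate(1.0 / self.disease_duration)
        else:
            self.state = State.R
            self.clock = None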
We first derive the individual infection rate f[j], then the emergence and recovery rates e[j] and r[j].
Infection Rate:
Infectious and Exposed individuals come into contact with others at some rate, and some of these contacts result in transmission (infection). Disease transmission (the transition from S to E) for individual j occurs if the individual has an infectious contact with an individual in the E or I state. Contacts for an E or I individual are independent, and therefore the time between two contacts is distributed as Exponential(L), where:
L = c_j * H[j]    (1)
where c_j, the base contact rate, equals c_ES or c_IS depending on whether individual j is in the E or I state, and H[j] is the overall heterogeneity factor for individual j (defined below). The other side of each contact of individual j is chosen among the N-1 other individuals based on the following probability:
Probability(j contacting l | j has a contact) = P[j,l] = (lambda[l]*NW[l,j]/K[l]^tau) / Sum_i (lambda[i]*NW[i,j]/K[i]^tau)    (2)
where the variables are defined as follows:
lambda[j]: the individual heterogeneity factor. It equals 1 in homogeneous scenarios and is distributed as Uniform[0.25, 1.75] in the heterogeneous case.
K[j]: the number of links individual j has in the network.
NW[i,j]: 1 if there is a link between individuals i and j, and 0 if there is none or i = j. The sums run over the set of all individuals in the population.
tau captures the time constraint on contacts. In homogeneous scenarios it is set to 1, so that more links for an individual result in fewer chances of contact through each link, which compensates for the heterogeneity created by the degree distributions of the different networks. In heterogeneous scenarios tau is set to 0, so that the probability of contact through each link does not depend on how busy the people on the two sides of the link are.
The basic idea behind this probability formulation is that the contacts of E and I individuals are distributed among the other population members depending on their heterogeneity factors and their connectedness. The overall heterogeneity factor for individual j, H[j], is then defined to ensure that (1) individuals have contacts in direct proportion to their individual heterogeneity factors lambda; (2) individual contacts depend on the number of links in the heterogeneous condition, and are independent of it in the homogeneous condition; and (3) the total number of contacts in the population remains, in expectation, the same as in the simple DE model (see the section on deriving the DE model from the AB one). Consequently:
H[j] = lambda[j]*W[j]*N / (K[j]^tau * W_N)    (3)
where the variables are defined as follows:
W[j]: the weighted sum of links to individual j:
W[j] = Sum_i lambda[i]*NW[i,j]/K[i]^tau    (4)
W_N: the population-level weighted sum of the W's, a normalizing factor, defined as:
W_N = Sum_i lambda[i]*W[i]/K[i]^tau    (5)
Finally, a contact between individuals j and l only results in the infection of l with probability i_j, which equals i_ES or i_IS depending on the state individual j is in (E or I). Combining these three components (the contact rate of the individual, the contact probability with different individuals, and the infectivity), we can summarize the rate of infectious contacts between individuals j and k as:
Z[j,k] = Limit((Probability of an infectious contact between k and j between t and t+dt)/dt, dt -> 0)
Z[j,k] = c_j*i_j*N * NW[j,k]*lambda[j]*lambda[k] / ((K[j]*K[k])^tau * Sum_{i,l} NW[i,l]*lambda[i]*lambda[l]/(K[i]*K[l])^tau)    (6)
And since these contact rates are independent, for a susceptible individual k the rate of infection, f[k], is given by:
f[k] = Sum_{j in E or I} Z[j,k]    (7)
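A compact sketch of how the pairwise rates in equation (6) and the individual hazards in equation (7) can be computed from an adjacency matrix; the array names, the example network, and the numerical values are illustrative assumptions.

import numpy as np

def contact_rates(NW, lam, c, infectivity, tau):
    """Pairwise infectious-contact rates Z[j, k] per equation (6)."""
    N = len(lam)
    K = NW.sum(axis=1)                                    # number of links of each node
    pair_weight = np.outer(lam, lam) * NW / (np.outer(K, K) ** tau)
    W_N = pair_weight.sum()                               # population-level normalizing factor
    return (c * infectivity)[:, None] * N * pair_weight / W_N

# Tiny illustration: 4 nodes on a ring, homogeneous case (lambda = 1, tau = 1)
NW = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
lam = np.ones(4)
c = np.full(4, 1.25)             # illustrative contact rate
infectivity = np.full(4, 0.06)   # illustrative infectivity
Z = contact_rates(NW, lam, c, infectivity, tau=1.0)
f = Z[[0, 2]].sum(axis=0)        # infection hazard for each node if nodes 0 and 2 are infectious (eq. 7)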
Emergence and Recovery:
The formulations for emergence e[j] and recovery r[j] are analogous. Since we model the residence times in each state as exponential, in parallel with the DE model, these rates are implemented as events that take A and B time units to happen, where A and B are distributed as:
A = Exponential(1/epsilon)    (8)
B = Exponential(1/delta)    (9)
and consequently e[j] = 1/epsilon and r[j] = 1/delta.    (10)
Deriving the mean-field differential equation model:
Figure 2 shows the structure of the mean-field DE model.
[Stock-and-flow diagram showing the contagion loop through the contact frequencies for exposed (c_ES) and infectious (c_IS) individuals, the infectivities i_ES and i_IS, the emergence time, and the disease duration.]
Figure 2. The structure of the SEIR differential equation model.
To derive the DE formulation for the infection rate, f_DE, sum the individual infection rates f[j]:
f_DE = Sum_k Sum_j Z[j,k]    (11)
In the DE model, homogeneity implies that all individuals have the same number of links, K[j] = K[k] = K, and the same propensity to use their links, lambda[j] = lambda[k] = 1. Therefore:
Z[j,k] = c_j*i_j*N*NW[j,k] / Sum_{k,j} NW[j,k]    (12)
NW[j,k] depends on the network structure; however, in the DE model individuals are assumed to be well mixed. Therefore
NW[j,k] / Sum_{k,j} NW[j,k] = 1/N^2    (13)
and
Z[j,k] = c_j*i_j / N    (14)
Substituting into the equation for the infection rate, we have:
f_DE = Sum_{k in S} Sum_{j in E or I} c_j*i_j/N = (1/N) * Sum_{k in S} (Sum_{j in E} c_ES*i_ES + Sum_{j in I} c_IS*i_IS)    (15)
which simplifies into the DE infection rate discussed earlier in the essay:
f_DE = (c_ES*i_ES*E + c_IS*i_IS*I)*(S/N)    (16)
The emergence/recovery rate in the DE model is the sum of the individual emergence/recovery rates:
e_DE = Sum_{j in E} e[j] = E/epsilon  and  r_DE = Sum_{j in I} r[j] = I/delta    (17)
which is the formulation for a first-order exponential delay with time constants epsilon and delta. The full SEIR model is thus:
dS/dt = -f,  dE/dt = f - e,  dI/dt = e - r    (18)
f = (c_ES*i_ES*E + c_IS*i_IS*I)*(S/N)    (19)
e = E/epsilon    (20)
r = I/delta    (21)
Network Structure:
Connected: every node is connected to every other node.
Random: The probability of a connection between nodes i and j is fixed, p = k/(N-1), where k is the average number of links per person (10 in our simulations; 6 and 18 in our sensitivity analysis to population size) and N is the total size of the population (200 in our base simulations; 50 and 800 in the sensitivity analysis).
Scale-free: Barabasi and Albert (1999) outline an algorithm to grow scale-free networks. The algorithm starts with m0 initial nodes and adds new nodes one at a time. Each new node is connected to the previous nodes through m new links, where the probability that each existing node j receives one of the new links is m*K[j]/Sum K[j], where K[j] is the number of links of node j. The probability that a node j has connectivity smaller than k after t time steps is then:
P(K[j]_t < k) = 1 - m^2*t/(k^2*(t + m0))    (22)
We use this procedure with m = m0 = k (the average number of links per person). Following the original paper,32 we start with m0 individuals connected to each other in a ring, and then add the remaining N - m0 individuals to the structure. The model, available on the author's website, includes the Java code used to implement all the networks.
Small World: Following Watts and Strogatz (1998), we build the small-world network by ordering all individuals into a one-dimensional ring lattice in which each individual is connected to the k/2 closest neighbors on each side. We then rewire each of these links to a randomly chosen member of the population with probability
p = 0.05    (23)
Ring Lattice: In the ring lattice each node is linked to the k/2 nearest neighbors on each side of the node (equivalent to the small-world case with zero probability of long-range links).
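A compact sketch of the ring-lattice and small-world constructions described above, assuming an adjacency-matrix representation; it mirrors the procedure in the text but is not the Java implementation posted with the model.

import numpy as np

def ring_lattice(n, k):
    """Ring lattice: each node linked to the k/2 nearest neighbors on each side."""
    NW = np.zeros((n, n), dtype=int)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            NW[i, (i + d) % n] = NW[(i + d) % n, i] = 1
    return NW

def small_world(n, k, p=0.05, seed=0):
    """Watts-Strogatz style construction: start from the ring lattice and rewire
    each link to a randomly chosen member of the population with probability p."""
    rng = np.random.default_rng(seed)
    NW = ring_lattice(n, k)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if NW[i, j] and rng.random() < p:
                candidates = np.flatnonzero((NW[i] == 0) & (np.arange(n) != i))
                if candidates.size:
                    new_j = rng.choice(candidates)
                    NW[i, j] = NW[j, i] = 0
                    NW[i, new_j] = NW[new_j, i] = 1
    return NW

NW = small_world(200, 10)          # base case: 200 people, 10 links per person
print(NW.sum() // 2)               # roughly 1000 undirected links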
32 Albert (2005). Personal communication.
Histograms of Final Size for different network and heterogeneity conditions
The histograms of final sizes for the different network and heterogeneity conditions are shown below. The heterogeneous cases are graphed in blue and the homogeneous cases in red. Note that the x-axis is truncated in the middle for the Connected, Random, and Scale-free networks (which share the same scales), since for the majority of the middle points no data existed. The Small-World and Ring Lattice panels include the complete range. The final size for the DE model (0.983) is marked in each panel with a line and arrowhead.
[Histograms of final size (percent of simulations) for the Connected, Random, Scale-free, Small-World, and Ring Lattice networks, heterogeneous and homogeneous conditions.]
The Fitted DE Model
We calibrate the DE model to the trajectory of the infectious population in 200 simulations of each AB scenario. The best-fit parameters for each infectivity (i_ES and i_IS) and the emergence duration (epsilon) are found by minimizing the sum of squared errors between the mean AB behavior of the infectious population, I_AB, and the infectious population in the DE model, I_DE, through time T, subject to the DE model structure:
min over {i_ES, i_IS, epsilon} of Sum_{t=0}^{T} [I_DE(t) - I_AB(t)]^2    (24)
We use the Vensim (TM) software optimization engine for the estimation. Diffusion is slower in the small-world and lattice networks, so the model is calibrated on data for T = 300 days (Connected, Random, and Scale-free networks) and T = 500 days (Small World and Lattice) to include the full lifecycle of the epidemic. The estimated parameters are constrained such that 0 < i_ES, i_IS and 0 < epsilon < 30 days. Below you can find the complete results of the calibrated parameters (medians and standard deviations) as well as the goodness of fit between different metrics in the fitted DE and the AB simulations. The fitted simulations fall within the 90/95 percent confidence intervals of the AB simulations for over 75/89 percent of cases in all network and heterogeneity conditions and for all metrics.
Table 1. Median and standard deviation, sigma, of the estimated parameters for the DE model fitted to the symptomatic population. Reports results of 200 randomly selected runs of the AB model for each cell of the experimental design. Compare to the base case parameters in Table 1. The median and standard deviation of the basic reproduction number, R0 = c_ES*i_ES*epsilon + c_IS*i_IS*delta, computed from the estimated parameters, is also shown.
Parameter Infectivity Connected of H# H= H* H= H= H* H= Ho 0.044 0.049 0.053 0.059 0.025 0.026 0.025 0.040 0.021 0.088 0.023 0.171 0.124 0.252 0.055 0.135 0.280 0.310 0.100 0.025 0.047 0.072 0.046 0.075 0.054 0.011 0.023 0.015 a 0.115 0.067 0.068 0.058 0.068 0.053 0.076 0.060 0.037 0.032 Median 14.47 10.87 12.14 6.788 9.827 3.998 21.05 16.30 6.607 3.562 a 4.567 5.139 4.881 5.323 5.798 4.836 7.813 8.527 13.05 10.70 Median 4.189 2.962 3.090 2.546 2.900 2.425 3.326 2.524 1.561 1.350 Infectious is Incubation Time s Implied Ro = CES*iES*| R H, a 1.712 0.622 0.576 0.524 0.666 0.611 0.880 0.658 0.812 0.550 Median 0.999 0.999 0.999 0.999 0.999 0.999 0.998 0.998 0.985 0.987 a 0.025 0.049 0.017 0.050 0.042 0.071 0.040 0.059 0.056 0.043 +CIS*ilS* z Lattice 0.053 Median Average Small World H= a of Scale-free 0.045 Median Exposed iES Infectivity Random Table 2- The percentage of fitted DE simulations which fall inside the 95% and 90% confidence intervals defined by ensemble of AB simulation, for each of the three metrics of Final Size, Peak Time, and Peak Prevalence, under each network and heterogeneity condition. Metric Conf. Bound Connected Random Scale-free Small World Lattice Bound H H H H H. H, H. H# H. H* 95% 99.5 94 96 95.5 94.5 92 97.5 91.5 96 95 90% 95.5 89.5 88.5 89 89 87.5 89.5 87.5 90 89.5 Peak Time, 95% 97 93.5 95.5 91.5 89 93.5 96 93.5 89.5 93 Tp Peak Prev, 90% 95% 95 97.5 88.5 96.5 90 97.5 86.5 95.5 84.5 92.5 85 96.5 94.5 97 87.5 98 85.5 94 79.5 95.5 Imax 90% 92.5 93.5 89.5 83.5 86.5 94 92 91.5 75.5 85 Final Size Sensitivity analysis on population size: The sensitivity analysis repeats the AB simulations for populations of 50 and 800 individuals, with 1000 simulations for each network and heterogeneity scenario. We do not repeat the calibration operations for this analysis. The following tables report the statistics of the four 142 metrics as well as their comparison with the base case DE model for each population size. Values of metrics for base case DE model are reported with the metric name (left column). The fraction inside envelop is calculated for the time in which the final recovered population settles into 2% of its final value in base DE, which is 76 days for N=800 and 57 days for N=50. */**/*** indicates the DE simulation falls outside the 90/95/99% confidence bound defined by the ensemble of AB simulations. 
Population Connect N=800 Random ed Final Size (0.983) Homogeneous Heterogeneous Peak Time, Tp (57.2) Homogeneous Heterogeneous Homogeneous Peak Prev., Imax (27%) Heterogeneous Fraction Inside Homogeneous Heterogeneous %F<0.1 AB Mean AB a %F<0.1 AB Mean AB a AB Mean AB a AB Mean AB a AB Mean AB a AB Mean AB a Base DE Base DE Lattice Scale Small Free World 2.1 0.953 0.139 3.9 0.888*** 0.179 99.6** 22.4 87.6* 23.4 18.8*** 3.4 18.2*** 4.1 0.355 0.421 2.2 0.820 0.237 4.8 0.640*** 0.270 153.8 123.7 130.0 101.9 5.4*** 1.6 5.0*** 1.7 0.303 0.316 Lattice 3.1 0.952 0.170 3.9 0.902*** 0.181 58.2 12.4 51.9 12.4 27.0 5.0 25.9 5.4 1 1 2.4 0.942** 0.148 3.5 0.885*** 0.168 61.4 13.1 55.6 12.9 26.0 4.3 24.6 4.9 1 1 2.2 0.954 0.143 4.6 0.846*** 0.186 62.7 12.8 45.9 12.6 26.3 4.2 24.2 5.5 1 0.816 Connect Envelope Population N=50 Final Size Homogeneous (0.983) Heterogeneous Peak Time, Tp (37.8) Homogeneous Heterogeneous Peak Prev., Imax Homogeneous Random Scale Small %F<0.1 ed 2.1 3.2 Free 3.4 World 2.8 AB Mean AB a %F<0.1 AB Mean AB a AB Mean AB a AB Mean AB a AB Mean AB a 0.959 0.144 3.5 0.905 0.170 39.92 12.43 37.89 13.17 33.2 7.29 0.890* 0.166 5.5 0.814** 0.209 42.99 16.13 40.36 16.98 29.3 7.47 0.922 0.172 3.8 0.881** 0.176 39.8 14.9 37.2 14.1 31.7 7.80 0.803 0.251 4.7 0.698** 0.266 51.26 26.71 47.99 27.20 22.1 7.85 3 0.705 0.277 3.8 0.602** 0.267 48.52 28.68 46.86 30.05 18.9 7.11 33 The peak prevalence depends on initial susceptible fraction, besides infectivities, contact rates, and delays and therefore slightly differs for two population sizes where same initial susceptible creates different fractions. 143 Running the model Two versions of the model are posted on the web. Both are developed through Anylogic TM software used for mixed agent-based and differential equation models3 4 . One interface is a stand alone Java applet with accompanying files and runs under any Java enabled browser with no need to install additional software, the second includes the source code and equations of the model and can be used for detailed inspection of model implementation after installation of Anylogic software, as well as running the model. To open the stand-alone applet, unzip the "ABDE-Contagion-applet.zip" into one folder (all three files need be in the same folder). You can then open the "Dynamics of Contagion_AprOS_Paper Applet.html" file in any web browser (e.g. Internet Explorer). The interface is self-explanatory and has a short help file which can be opened by clicking the [?] button. To open the detailed Anylogic model, you need to first download and install the Anylogic software (you can get the 15-day free trial version from http://www.xjtek.com/download/) and then unzip the attached file "ABDE-Contagion-code.zip" in one folder. You can then open "Dynamics of Contagion_AprOS_Paper.alp" in Anylogic. Different objects and modules of code are observable on the left hand column and you can navigate through them and inspect different elements of the model. You can run the model by clicking on run button (or F5) which compiles the model and brings up a similar page as the attached Java applet. You can inspect the behavior of the model both through the interface accompanying the model and applet, or by browsing different variables and graphing them as long as you are in the run mode in Anylogic. For the second purpose, go to "root" tab in the run time mode, where you can see all model variables and can inspect their runtime behavior as well as final value. 
34 We thank the XJ Technologies development team for building the first version of the Anylogic model based on earlier drafts of this paper. The model was later revised and refined by the authors to precisely reflect the theoretical setting described in the paper.
The implemented interface has the following capabilities that you can use by either way of accessing the model:
1- Run the model for different scenarios in single or multiple runs.
2- Inspect the behavior of the AB model through the total number of people in different states, as well as the state transition rates and different confidence intervals for the behavior of the infected population in the AB model (you can change the size of the confidence intervals as well).
3- Visually inspect the spread of the disease in the population. The different nodes and the links between them are shown, and you can choose between 4 different representations of the network structure.
4- Observe the metrics of peak value, peak time, and final size.
5- Change all important model parameters, including contact rates (note that the contact rates of exposed and infected individuals are described in terms of fractions of the contact rate of the healthy) and infectivities, the duration of the different stages, the size of the population, the network type, heterogeneity, the number of links per person, and the probability of long-range links in the small-world network.
6- Save the results for the populations in different states at different times into a text file (this feature may have some problems under some browsers; however, it works fine if you install the model through the second method and open it from inside Anylogic).
Appendix 3-1- Reflections on the data collection and modeling process
In this document I briefly discuss the process I employed for data gathering, model building, and model analysis in the research for the essay Dynamics of Multiple-release Product Development. I focus mainly on a few tools and activities that I found useful and that I had not found discussed in detail in the literature. I therefore leave the comprehensive description of the qualitative research process and of model building to the relevant books and articles (Glaser and Strauss 1967; Eisenhardt 1989; Strauss and Corbin 1998; Sterman 2000) and keep this document short and operational. I make no claim about the efficiency of the processes I have used; rather, I simply describe my experience and highlight the process insights from it. The intended audience of this appendix is other researchers who want to build theory based on case studies using simulation, as well as model builders who are looking for ways of doing their testing and analysis faster and more efficiently.
This research started in June 2004 and has continued in multiple steps, outlined chronologically below:
- June-August 2004: Interviews, archival data gathering, and hypothesis generation
- July 2004: Building a qualitative model
- August 2004: Building a detailed simulation model and preliminary analysis
- September-October 2004: Model analysis, building a simple insight model
- November 2004: First write-up
- December 2004-June 2005: Further data collection (both interviews and quantitative data), and refinement of analysis and text
- March 2005 on: Definition of other research questions at the research site and data gathering for these questions.
Four main activities are evident in this process: data gathering, model building, model analysis, and model distillation/insight generation.
The process of moving between these activities was iterative, with the first iteration being the most intense.
Data gathering and qualitative model building
In the data gathering phase, the main source of information was interviews with individuals involved in the development of two software products, Alpha and Beta. I entered the site with a general impression of the problem I wanted to study: the quality, cost, and delay problems faced by many new product development projects when multiple projects are connected to each other. These were very real challenges for product Alpha, and product Beta was distinguishable for avoiding several of these problems.35 The contrast between the two was therefore promising for making sense of the guiding questions. I started the interviews with a set of questions (usually tailored to the specific person); however, in general the interviews were open-ended and explored different themes relevant to the problem at hand, as well as the mechanics of the development process and the relationships between the different groups involved. Interviews were often recorded and summarized at the end of the day in daily report files. In essence, these summaries were shortened transcripts of the interviews, which I wrote down while listening to the audio files at the end of each day. Marking each topic of conversation with a header and a time stamp (the corresponding minute and second of the interview) made these files easier to access for future reference, while keeping their size small and therefore their review more feasible. An example of these transcripts is reproduced below:
"13:50 Translating features
Product manager comes up with features, gives to system engineers, which for them have been fine in Siebel, since the same person has been doing system engineering across multiple releases and it was a well-understood area. So we got most of requirements well. We still find problems on customer expectation not being the same, but that is usually the problem of VPs thinking that we need a desktop application ([Person A] and [Person B]) and pushing this, while this ended up appearing not to be central at all.
17:00 Feature additions and drops
We flip-flop a little bit. QQQ it was driven mostly by thinking that we can sell it with higher price, but customers didn't want it. We have so much problem getting more than one application with [complementary software x] on the screen that is problematic."
As the example shows, each paragraph tracks 2-3 minutes of conversation in a couple of crude sentences; moreover, I did a quick coding for action items, data sources mentioned, good quotations, questions to ask, etc. while doing the transcription, by leaving marks such as QQQ in the text where relevant. These were easy to code automatically in the Atlas.ti software that I used for qualitative analysis, and they allowed me to quickly extract important action items, people to interview, etc., at the end of the week. This transcription and coding process took about 1.5 times the original interview length. In retrospect, coding for action items, quotations, and data sources is helpful, but too much coding of content at this early stage is not.
35 In fact I started with product Alpha and added product Beta to my study when, in mid-July 2004, I learnt about its significantly better practices.
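As an illustration of how such marker codes can be harvested at the end of the week, here is a small sketch that scans plain-text daily report files for codes like QQQ; the directory name and the additional marker codes are hypothetical.

import re
from pathlib import Path

MARKERS = {"QQQ": "good quotation", "AAA": "action item", "DDD": "data source"}  # illustrative codes

def extract_marked_lines(report_dir="daily_reports"):
    """Scan daily report text files for marker codes and collect the tagged passages,
    keeping the time-stamp header (e.g. '17:00 Feature additions and drops') for reference."""
    hits = []
    for path in sorted(Path(report_dir).glob("*.txt")):
        header = ""
        for line in path.read_text(encoding="utf-8").splitlines():
            if re.match(r"^\d{1,2}:\d{2}\b", line):        # lines like '13:50 Translating features'
                header = line.strip()
            for code, label in MARKERS.items():
                if code in line:
                    hits.append((path.name, header, label, line.strip()))
    return hits

for filename, header, label, text in extract_marked_lines():
    print(f"{filename} | {header} | {label}: {text}")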
The data-gathering and model building activities were intimately inter-related: interview data provided new hypotheses to be added to the loop-set, while the loop-set guided the questions to be asked for the confirmation of existing dynamic hypotheses and for the exploration of the new relevant themes. Moreover, often the loops played a valuable role as boundary objects to discuss the research with interviewees and to quickly elicit their comments on other dynamics relevant to the problem at hand. Appendix 3-2 outlines the main dynamic hypotheses generated from this part of the process through causal loop diagrams. The loops capture the theoretical story behind the case. A good complement to this theoretical view is the case history which allows the researcher to connect the theoretical model to empirical facts in a coherent fashion. Moreover, the case history reveals the issues not yet covered in the data-gathering process and helps orient further data collection. Finally, it can be used later on for assembling the datasets needed for model calibration and validation. I used an Excel database to build a case history for different releases of the Alpha project. Basically, a case history should follow the evolution of different dimensions of a project through time. A spreadsheet provides a good structure to do so: the rows of this spreadsheet represented the different dimensions of the project which were relevant to my research interest (e.g., resources, schedules, costs, scope of work, amount of work done, sales, etc.) and the columns included different months in the project history. While going through archival data or interview notes, I transferred every interesting piece of information to the relevant cell in this Excel file. This allowed for an easy and comprehensive way of capturing a lot of data from different files into a simple project history. Below you can see a picture of the structure used to capture the case history. 148 A few tips are helpful in constructing such a structure: - Include any piece of the relevant information and expand the categories (left hand side) to capture them. Time pressure would not allow you to iterate this process many times so it is better to capture all relevant information on the first run. - For each item entered into the cell, record which file or interview this data point has originated from. This will facilitate future recovery, elaboration, and replication of the work. - When you see tables or graphs with lots of useful information which are hard to capture in a single cell, copy the bitmap image of the table or graph (e.g., using PrintScreen key) and paste it into the Excel sheet. You can then minimize the picture's size to fit on a cell, and put a description of content inside the cell the picture is covering (you can see a few such tables in the example above). Finally, the organization of data in the data collection phase is crucial to using them effectively. A few tips on useful organizing schemes I used for the data collection phase follow: - A file/table tracked every document I received. I included the document name upon receipt, along with details on the source of the document. Later, when I studied the document in detail, I added some points on what might be the useful points in the document, as it relates to my research (besides extracting the data relevant to the case files). 149 - Another file/table tracked all the interviews. 
The main fields included: whom did I interview, when, where, for how long, did I record the interview, have I transcribed the interview, and have I coded the interview in detail.
- A separate file/table recorded the people whom I met in the field, their roles, their contact information, and any details I wanted to remember about them (e.g., their expertise and interests).
- Most of the numerical data were transferred into relational databases. This allows efficient and quick query generation and becomes handy for model calibration and validation work. Moreover, most organizations keep their data in such databases, so you can try to integrate such already-available data into your own databases.
These files helped me keep the work organized in the presence of the hundreds of documents, interview files, and individuals that I needed to manage. Moreover, they improve the traceability and reproducibility of the research process.
Building the simulation model
The qualitative model discussed above informed the formulation of a simulation model for multiple-release product development. I started with the core mechanics of product development for a single release, added some of the more salient feedback processes from the causal loop diagrams, and then expanded the model to include multiple releases and the interactions between these releases. Later refinements added more feedback loops and hypotheses to this model. The detailed model formulations are listed in appendix 3-3.
The simulation model resulting from this process was detailed (over 1500 variables and parameters) and complex. The model therefore required strict discipline to ensure its integrity and consistency after frequent additions of new pieces of structure. This need was further pronounced by the tight timeline of model building on which I was working (about 1 month to build the detailed model from scratch). A set of automated tests facilitated such robustness analysis. I describe these tests below. This testing process is quite similar in nature to the "Reality Check" feature of Vensim; however, I found the implementation of the latter relatively confusing and therefore followed a custom procedure. In short, I created a set of test cases, built them into a command script that executed the relevant simulations consecutively, and ran the command script upon each modification of the model.
- Whenwe start all releasesat the same time (0), nothingstrangeshould happen with the scheduled times to finish, etc. - When we take all the problematicfeedback loops out of thepicture, things should go smoothly. Each test case needs a condition that can be translated into parameter changes, and an expected behavior which is easily inspected in the simulation output. Building the command script- DSS version of the Vensim software has the capacities for writing command scripts, pieces of code for controlling and using the software based on pre-defined schemes. Therefore I wrote the command scripts that set up the model based on each test-case scenario, and simulated it with a specific run name. See an example below (the first line includes the test description). ! When we take all the problematic feedback loops out of the picture, things should go smoothly SIMULA TE>SETVAL ISW EffArchtcr Complexity on Productivity = 0 SIMULA TE>SETVAL ISW Eff Old Arch on Design Problems = 0 SIMULA TE>SETVALISWEff Old Code and Archtctr on Error = 0 SIMULA TE>SETVALISWEff Pressure on Error = 0 151 SIMULA TE>SETVAL ISW Eff Pressure on Productivity = 0 SIMULA TE>SETVAL ISW Eff Bug Fix on Scope =0 SIMULATE>SETVALSW Effect Replan on Replan Treshold=0 SIMULA TE>SETVAL ISW Fixing Bugs=O SIMULA TE>SETVALISW Endogneous Sales=O SIMULA TE>RUNNAMEI TL -NoFeedbackActive. vdf 0 MENU>RUNI 0 Running the command scripts and inspecting the model behavior- Testing often leads to finding problems in the model that need to be fixed. To avoid problems of version control (having multiple versions of the model which each have fixed some aspects of the problem), I copied the latest version of the model into a testing directory (which includes all the relevant command files), ran the tests on that version of the model, and edited the model according to the problems found. When finished with this round of testing, I returned this edited version of the model to the main directory and saved it under a new version name. I could then build on this model for future work. I used a control panel which included several of the critical variables of the model to track the behavior of the model in automated testing. Moreover, a few variables (such as the "minimum of all physical stocks") were added to enable the faster inspection of the test results. Meaningful run names help one quickly inspect each test run for potential flaws and go to the next in absence of any flaws. The concept of extreme condition testing is well established in SD literature (see Chapter 21 in Sterman 2000). However, time constraints often cut down on the extensiveness of their application. Automated testing is very quick in running a set of pre-defined tests on the model and therefore allows the modeler to do so very regularly and ensure the robustness of model to extreme conditions and different scenarios, with a little up-front time investment. In fact, for a relatively complicated model, the establishment of the test cases and the command script took me 2-3 hours, and afterwards each round of testing was very fast (a few minutes). Another useful practice in the modeling phase includes the color coding of different variables in the model. Color coding of variables helps the modeler navigate different views much faster, run different tests easier, and make sense of different structures quicker. The basic idea is to have 152 distinct colors for different types of variables (e.g., parameter, data source, switch, table function, etc). 
Another useful practice in the modeling phase is color coding the different variables in the model. Color coding helps the modeler navigate the different views much faster, run different tests more easily, and make sense of different structures more quickly. The basic idea is to have distinct colors for the different types of variables (e.g., parameters, data sources, switches, table functions). The following color codes were developed by MIT Ph.D. students in the summer of 2004 (thanks to Jeroen Struben and Gokhan Dogan for their contributions to this development) and account for the visibility of parameters in the sensitivity analysis modes of the Vensim software. Other modelers are encouraged to use a similar scheme to improve compatibility and avoid cognitive switching costs. The short prefixes mentioned below also help organize the variables (e.g., SW for switch). For parameters and table functions, it is possible to write a simple piece of code to automatically color code the text version of the model; I encourage anybody interested to write such code and share it with others.

Green    Exogenous parameters; Data: data series and parameters for table functions
Blue     TL: Table functions
Pink     SW: Switches
Purple   TST: Test parameters / structural parameters / exogenous inputs
Orange   Output variables; OPT: Payoff variables
Brown    IN: Initial conditions
Red      AF: Attention flags

Model analysis and simplification

Much has been written about the different facets of model analysis. Here I focus on a quick variation of behavioral loop dominance analysis (Ford 1999) which can be automated within the available capabilities of the Vensim software. While this method is not as rigorous as eigenvalue analysis and pathway participation methods (Forrester 1982; Eberlein 1989; Mojtahedzadeh, Andersen et al. 2004), it has the benefit of being operationally available for large models today, with some fruitful results. Moreover, the method is intuitive and most modelers already use some version of it in their work; therefore the cognitive costs of adopting it are low. I used this method to create a quick ranking of the more important feedback loops from a larger set of hypotheses in the detailed simulation model. The basic idea is to measure the sensitivity of a few basic metrics to the existence of different feedback loops in the model. Ranking the loops by their impact on these metrics determines which ones to focus on for further analysis. To do so we need to:
- Define a set of metrics we care about (for example cost and budget over-run) and implement these in the model as explicit variables.
- Run a sensitivity analysis switching off all the important loops in the hypothesis set, one at a time.
- Observe the sensitivity of the different metrics to these loops and rank-order the loops based on these sensitivities to find the more important hypotheses.

Vensim allows us to perform the above-mentioned sensitivity analysis very fast (in seconds). For this purpose you need a switch parameter defined for each loop in the hypothesis set, which makes that loop inactive in the simulation (see Ford 1999). It is helpful to name and parameterize these switch variables consistently, i.e., all of them should start with the same prefix and loops should be cut when their switch is 1 (or 0). With these settings in place, you can use the optimization option of Vensim, include your metrics in the payoff function, turn the optimizer off, set sensitivity to "All Constants=1," and simulate the model. Such a simulation automatically changes every parameter of the model, one at a time, to 1% above and below its base value, and reports the change in the payoff from the base case. The results are recorded in two .tab files, sensitiv.tab and sortsens.tab, one sorted alphabetically and the other sorted by the extent of sensitivity of the payoff to the parameters of the model. Since the switches in your model are set at 0, they will be changed to 1 and -1, the former of which is equivalent to cutting the feedback loop. Therefore the analysis automatically cuts every feedback loop (that has a switch variable) and observes the impact on the metric of interest. You can then extract the subset of results related to loops by copying those records from the alphabetically sorted sensitiv.tab (remember that the same prefix was used for all the switch variables) and analyze them in a spreadsheet. This process can be further automated with a command script that changes the payoff function to investigate the sensitivity of all the relevant metrics and extracts the results into a spreadsheet.
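As a sketch of the extraction step just described, the Python fragment below reads the sorted sensitivity output and ranks the loop switches. It assumes the switch parameters share the "SW " prefix recommended above, and that the .tab file lists one parameter per row with the payoff change in the second column; the exact column layout of sensitiv.tab/sortsens.tab should be checked against your version of Vensim.

import csv

def rank_loops(tab_file="sortsens.tab", prefix="SW "):
    """Rank loop switches by the payoff change reported in a Vensim
    sensitivity .tab file. Column layout is an assumption: parameter
    name in the first column, payoff change in the second."""
    loops = []
    with open(tab_file) as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2 and row[0].strip().startswith(prefix):
                try:
                    loops.append((row[0].strip(), float(row[1])))
                except ValueError:
                    continue  # skip header or malformed rows
    # Largest absolute impact on the payoff first.
    return sorted(loops, key=lambda x: abs(x[1]), reverse=True)

if __name__ == "__main__":
    for name, impact in rank_loops():
        print(f"{impact:+.3f}  {name}")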
The main goal of this method is to simplify the process of selecting the more important hypotheses to include in smaller models, in write-ups of the study, or in communication with the client team. This was a salient problem in my study, where I wanted to distill the most critical dynamic hypotheses from a list of over a dozen loops captured in the detailed model discussed above. The strength of the impact of different loops on the behavior of interest is an important consideration for such choices, and this method allows that consideration to be included more robustly. The graph below shows an example of the final results that can come from this analysis.

[Figure: example loop ranking, showing the percentage change in the metric of interest when each feedback loop is cut; loop labels include adding late features, cutting release quality criteria, finding more problems in rework later, starting new releases sooner, higher test coverage, and doing rework faster.]

Note that this procedure is not without problems. First, the set of feedback loops you choose is often not comprehensive. There are algorithms for selecting an independent and comprehensive loop set (Oliva 2004) that can relieve this shortcoming; nevertheless, the main goal of the above method is not to find the dominant loop in the structure, but to find the relative importance of a few different hypotheses (loops). Second, this analysis does not investigate the interactions between loops, because it cuts loops one at a time. It is possible to include interactions by doing a full-grid sensitivity analysis of the switch parameters with their values set at 0 or 1 (a sketch of generating such a grid is given at the end of this subsection). Such an analysis, however, increases the computational burden to 2^N (from N) and requires further data-reduction methods (e.g., regression) to be applied to the outcome data to allow a ranking of loops from such multi-dimensional results. Finally, this method requires the definition of relevant metrics, which can be a complicated task in some settings (for example, when we care about oscillatory behavior). In short, this method is a quick-and-dirty way of getting an impression of the relative importance of the main hypotheses in their impact on a few metrics of interest. It is neither comprehensive nor flawless; however, in real-world applications, where limited time often impacts the quality and thoroughness of the analysis, it can prove useful.

Another helpful practice in the analysis phase is to go through the results of a full sensitivity analysis on all model parameters (as described above) one by one, guessing the impact on each of the metrics, then observing the real impact, and trying to explain the discrepancies between our intuition and the observations. Doing this for the top (most important) 20% of parameters often results in a very good understanding of the model behavior and its origins.
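For the full-grid extension mentioned in the caveats above, the combinations of switch settings can be generated mechanically and written into a command script rather than set by hand. The Python sketch below enumerates all 2^N on/off combinations for a small set of loop switches; the switch names are taken from the command-script example earlier in this appendix, and the SETVAL/RUNNAME syntax follows that example. Treat it as a starting point, not a documented Vensim utility.

from itertools import product

# Loop switches to vary; names follow the command-script example above.
SWITCHES = [
    "SW Eff Pressure on Error",
    "SW Eff Pressure on Productivity",
    "SW Eff Bug Fix on Scope",
]

def write_full_grid_script(path="full_grid.cmd"):
    """Write a Vensim command script that simulates every on/off
    combination of the listed switches (2**N runs)."""
    with open(path, "w") as f:
        for settings in product([0, 1], repeat=len(SWITCHES)):
            run_name = "grid_" + "".join(str(s) for s in settings)
            for switch, value in zip(SWITCHES, settings):
                f.write(f"SIMULATE>SETVAL|{switch} = {value}\n")
            f.write(f"SIMULATE>RUNNAME|{run_name}.vdf\n")
            f.write("MENU>RUN|O\n")

if __name__ == "__main__":
    write_full_grid_script()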
Model distillation

Case studies and qualitative hypothesis generation often push the modeler to build relatively complex models that include hundreds, if not thousands, of variables and are hard to maintain, understand, and communicate. Above, I discussed a few tools I used to facilitate the maintenance, testing, and understanding of one such model. A critical next step in this process is to distill the critical dynamics of the model into a simple model that can be used to communicate the results, write papers, and do more comprehensive simulation analysis or analytical work. The model documented in the essay ("Dynamics of Multiple-release Product Development") is in fact a distillation of the detailed model discussed above, which reduced the number of variables more than twenty-fold while keeping most of the central dynamics. Contrary to intuition, in my experience it would probably have been harder to formulate the latter type of model (the small insight model) directly from the detailed data provided by the case study. Taking the intermediate step of constructing a larger and more detailed model allowed me to bridge the gap between the nuanced qualitative data and the sweeping aggregations and abstractions required for the insight model. Besides, the detailed model can play other roles in calibration, management flight simulator development, and getting buy-in from the client. The practices discussed above allowed me to speed up the process and made it feasible to build both types of models in a relatively short time span (about 4-5 months of full-time work for one round of this process); therefore I hope they prove helpful for other researchers.

References:

Eberlein, R. L. 1989. Simplification and understanding of models. System Dynamics Review 5(1).
Eisenhardt, K. M. 1989. Building theories from case study research. Academy of Management Review 14(4): 532-550.
Ford, D. N. 1999. A behavioral approach to feedback loop dominance analysis. System Dynamics Review 15(1): 3-36.
Forrester, N. 1982. A dynamic synthesis of basic macroeconomic theory: Implications for stabilization policy analysis. Sloan School of Management, MIT, Cambridge, MA.
Glaser, B. G. and A. L. Strauss. 1967. The discovery of grounded theory: Strategies for qualitative research. Chicago: Aldine.
Mojtahedzadeh, M., D. Andersen and G. P. Richardson. 2004. Using Digest to implement the pathway participation method for detecting influential system structure. System Dynamics Review 20(1): 1-20.
Oliva, R. 2004. Model structure analysis through graph theory: Partition heuristics and feedback structure decomposition. System Dynamics Review 20(4): 313-336.
Sterman, J. 2000. Business dynamics: Systems thinking and modeling for a complex world. Boston: Irwin/McGraw-Hill.
Strauss, A. L. and J. M. Corbin. 1998. Basics of qualitative research: Techniques and procedures for developing grounded theory. Thousand Oaks, CA: Sage Publications.

Appendix 3-2- Feedback loops for multiple-release product development

The following set of feedback loops was developed as part of the research leading to the essay "Dynamics of Multiple-release Product Development." The role of these loops in the research process is briefly discussed in Appendix 3-1. The loops include several different hypotheses that came out of discussions with members of the product organization.
Only a subset of these loops is captured in the detailed simulation model (Appendix 3-3), and a smaller fraction of the most important ones appears in the original essay. On each page, a set of loops relating to a distinct area of product development and sales is depicted, along with short descriptions of the loops.

[Causal loop diagrams for Appendix 3-2 omitted: the diagram pages did not survive conversion to text.]

Appendix 3-3- Documentation for detailed multiple-release PD model

In this appendix the model formulations for the detailed simulation model (discussed in Appendix 3-1) are documented.
This model was produced as part of the research project leading to the essay "Dynamics of Multiple-release Product Development." The model, however, was too complex to be discussed in the essay. Appendix 3-1 elaborates on the role of this model in the research process and discusses some of the testing and analysis tools developed for working with it.

The model is broken into different views, each focusing on a different aspect of the development process and related dynamics. Each view is followed by the equations of the variables introduced in that view, sorted alphabetically. At the end of the document, all the variables of the model are listed alphabetically along with their group/view. You can therefore find the equation of any variable by looking up its group name at the end of the document and going to the equations for the relevant group. This should prove most useful for tracking the shadow variables in each view. The color codes used in these views are explained in Appendix 3-1.

The model includes design, development, testing, defects in the field, bug fixing, underlying code-base quality, scheduling, and sales and financial metrics for 6 consecutive releases of a software product. Several major feedback structures relevant to quality and delays in software projects are captured in the model. The main ones include: resource sharing between different releases and current engineering; the effects of pressure on productivity and error rate; the effects of code-base and architecture quality on error rate and design quality; the effect of architecture complexity on productivity; the effect of bug fixing on scope; the effect of re-planning on the re-plan threshold; the effects of quality, delay, and features on sales; the effect of profit on resources; and the endogenous start of different releases and endogenous sales.

At least three pieces of formulation in this model offer relatively new and potentially useful options for modeling some common problems and therefore may merit closer attention. These pieces are briefly described below, with references to the relevant model sectors.

Coflow of tasks and errors - Historically, project dynamics formulations of tasks and errors either divide tasks into correct/flawed categories, or use a coflow of "task quality" (varying between 0 and 1) or of "changes" (required changes accompanying the flow of tasks). In the current model I have used an alternative formulation that tracks the number of flaws in the tasks developed, and uses the density of flaws in a consistent formulation to determine the chances of discovering problems in the quality assurance phases. This formulation is most useful for following the logic of many projects in which problems are conceptually different from tasks. For example, in the software development process a module of code (a task) may have several flaws (bugs) that may or may not be discovered (become an MR: Modification Request), and the density of these problems determines the chances of their discovery in the different testing phases. Details of this formulation can be seen in the "Development" and "Test" views of the model.

Tracking the problems in the field - As products are sold to customers, the bugs inside the products also flow into the different customer sites. These bugs can be independently found by different customers, reported to the company, fixed, and patched for all, or part, of the existing customer set. The coflow structure that captures these possibilities is non-trivial because each error flows into all of the sites but can be found in any of them.
Therefore the enclosed formulation may prove useful for other researchers who tackle similar modeling problems. "Defects in the Field" view of the model elaborates on this formulation. Resource anticipation between different releases with different priorities - In allocating resources between different overlapping releases of a software, the project managers for each release should perceive what the priority of their release is (depending on where in the queue of projects it is located) and plan their schedules and expected resources accordingly. While in logic this is simple, the implementation of the formulation in SD software is not without challenges because as different releases start and finish, their priority changes, and their perceived priority should adjust according to their current position in project priority queue. The formulations on the "Scheduling" view of the model offer one solution to this modeling challenge. This model includes 448 symbols, 17 lookups, 220 levels, 1151 auxiliaries, and 147 constants (total of 1519 variables). 170 C . E < , e ,t= I .. w 6 z o ·/ D , I ,L D _ a **** ************************** Concept and Design ******************************** Ave New Problem per Cancellation[Releasel = ZIDZ ( Problems Per Feature Interaction * Features Designed[Release] , Current Scope of Release[Release] ) Units: Problem/feature This equation assumes that cancellations have detrimental effects to the extent that they interact with other features in the code. Therefore the problems they create also correlates with fraction of features inside the code designed. However, for the part of code not designed, the cancellations create no new issues, since those pieces can be designed with knowledge of cancellation. Ave Prblm in Designed Feature[Releasel = ZIDZ ( Problems in Designed Features[ Release], Features Designed[Release] ) Units: Problem/feature Average number of design problems in designed features. Ave Problem in Features to Design[Release] = ZIDZ ( Problems in Approved Features[ Release] , Features to Design[Release] ) Units: Problem/feature Average number of (coherence) problems in features to be designed. Chng Start Date[R1] = 0 Chng Start Date[Later Releases] = if then else ( Endogenous Release Start Date[ Later Releases] = FINAL TIME, if then else ( VECTOR ELM MAP ( Fraction code Perceived Developed[ RI], Later Releases - 2) > Threshold for New Project Start, 1, 0) * ( Time + I - Endogenous Release Start Date[Later Releases ] ) / TIME STEP, 0) Units: Dmnl This equation puts the release start date for later projects (not first release) at the TIME+1, as soon as the pro ect before them is completed to a Threshold level in development phase. Current Scope of Release[Releasel = Features Designed[Release] + Features to Design[ Release] Units: feature Defect Intro in Coherence of Feature Set[Releasel = Error Rate in Original Feature Selection[ Release] * Features Approved[Release] Units: Problem/Month Des Time Change[Release] = if then else ( "Release Started?"[Release] = I :AND: Sanctioned Design Complete[Release] = 0, Allocated Design Finish Date[ Release] / TIME STEP, "Release Started?"[Release] * (Allocated Design Finish Date[ Release] - Sanctioned Design Complete[Release] ) / Time to Replan ) Units: Dmnl We set new desired times for finishing the design based on what we get allocated. 
172 Design[ReleaseJ = Features to Design[Release] / Design Time Left[Release] Units: feature/Month The development of requirements is enforced to finish by the time that designs are scheduled. Design Time Left[Release] = Max ( Min Des Time, Sanctioned Design Complete[ Release] - Time ) Units: Month Design time left depends on the scheduled time for design and a minimum time needed to make any design changes. Eff Bug Fixing on Scope of New Release[Releasel = T1 Eff Bug Fix on Scope ( Relative Size of Expected Bug Fixing[Release] ) * SW Eff Bug Fix on Scope + I - SW Eff Bug Fix on Scope Units: Dmnl The effect of bug fixing on scope of new release depends on how much bug fixing is there compared to what we are planning to accomplish in the next release, if there was no bugs. Eff Old Archtctr on Design Error = Tl Eff Old Arch on Design Problems ( Density of Problems in Architecture ) * SW Eff Old Arch on Design Problems + 1 - SW Eff Old Arch on Design Problems Units: Dmnl The effect of quality of old design and architecture on how good a job we do in design of new pieces. Endogenous Release Start Date[R1] = INTEG( Chng Start Date[R1] , 0) Endogenous Release Start Date[Later Releases] = INTEG( Chng Start Date[Later Releases ], FINAL TIME) Units: Month This release start date is set to start a release as soon as the former one passes a threshold in % of code developed. Error Rate in Feature Design[Release] = Normal Error Rate in Feature Design[ Release] * Eff Old Archtctr on Design Error Units: Problem/feature The error rate in designing features. Includes low quality requirements and unclear specifications etc. Depends on the quality of old code and architecture.. Error Rate in Late Features[Releasel = Error Rate in Original Feature Selection[ Release] * Late Feature Error Coefficient Units: Problem/feature The error rate in the design of late features. Error Rate in Original Feature Selection[Release] = 0.1 Units: Problem/feature Rate of problems in coherence of combinations chosen for new features. 173 Exogenous Release Start Date[Release] = 0, 6, 12, 18, 24, 30 Units: Month Release Start Dates, as an exogenous parameter Feature Cancellation[Releasel = Code Cancellation[Release] / Modules for a feature Units: feature/Month The speed of cancellation of features already designed, under time pressure. Feature Requests = SUM ( Features Approved[Release!] ) Units: feature/Month The rate of request for new features, mainly from customer needs and market pressure. For the equilibrium, it can be put at the features approved level so that there is no net change in the total number of features under consideration (assuming no cancellation) Features Approved[Release] = Features Under Consideration * Scope of New Release[ Release] / TIME STEP * "Ok to Plan?"[Release] Units: feature/Month At the time of OK to Plan, a fraction of total features under consideration are finalized to be designed. Features Designed[Release] = INTEG( Design[Release] - Feature Cancellation[ Release], 0) Units: feature Number of features designed and ready to be coded Features to Design[Release] = INTEG( - Design[Release] + Features Approved[ Release] + Late Feature Introduction[Release], 0) Units: feature Number of features to be designed before being coded. Features Under Consideration = INTEG( Feature Requests - SUM ( Features Approved[ Release!] - Feature Cancellation[Release!] ), Initial Features Under Consideration ) Units: feature The number of features under consideration for future releases. 
Fraction code Perceived Developed[Release] = 1 - XIDZ ( Perceived Estimated Code to Write[ Release] , Perceived Total Code[Release], 1) Units: Dmnl The fraction of code for each release that is perceived to be left for development. Initial Features Under Consideration = 2000 Units: feature [0,12000] The constant for the initial number of features under consideration for each release. Late Feature Error Coefficient = 3 Units: Dmnl 174 How many times of the normal feature additions do we encounter problems with the design of late features. Late Feature Introduction[Release] = if then else ( Late Intro Time[Release ] = Time, Late Features[Release] / TIME STEP, 0) Units: feature/Month The rate of introduction of late features. Controlled as an exogenous shock here. Late Feature Problems[Release] = Error Rate in Late Features[Release] * Late Feature Introduction[ Release] Units: Problem/Month Rate of introduction of problems into approved features through introduction of late features. Late Features[Release] = 0 Units: feature The number of new features introduced at some point in the process Late Intro Time[Releasel = 15 Units: Month Time at which the late features are introduced Min Des Time = 0.25 Units: Month Minimum time needed to finish the designs, even if after schedule. Normal Error Rate in Feature Design[ReleaseJ = 0.1 Units: Problem/feature Normal Error Rate in Feature Design Normal Scope of New Release[Release] = 0.2 Units: Dmnl The normal scope of the projects, as a fraction of all the features under consideration, if there was no bug fixing. "Ok to Plan?" [Release] = if then else ( Time = Release Start Date[Release], , 0) Units: Dmnl A switch which equals 1 at the time step when we decide to start a new release. Perceived Total Code[Release] = DELAY I (Total Code[Release] , Time to Perceive Current Status ) Units: module The perceived total code for the release. Problem Cancellation[Release] = Feature Cancellation[Release] * Ave Prblm in Designed Feature[ Release] Units: Problem/Month 175 The problems are removed along with cancelled features. Problems from Cancellation[Release] = Feature Cancellation[Release] * Ave New Problem per Cancellation[ Release] Units: Problem/Month The feature cancellation can result in addition of design problems for the rest of the code, since pieces are removed from otherwise coherent architecture. Problems in Approved Features[Releasel = INTEG( Defect Intro in Coherence of Feature Set[ Release] - Problems Moved to Design[Release] + Late Feature Problems[ Release] , Features to Design[Release] * Error Rate in Original Feature Selection[ Release] ) Units: Problem Problems in Designed Features[Releasel = INTEG( Problems Moved to Design[Release ] + Problems from Cancellation[Release] - Problem Cancellation[Release ] + Problems Introduced in Design[Release] , Features Designed[ Release] * Error Rate in Original Feature Selection[Release]) Units: Problem Problems Introduced in Design[Release] = Design[Release] * Error Rate in Feature Design[ Release] Units: Problem/Month The rate of introduction of new problems depends on the design speed as well as the quality of design. Problems Moved to Design[Releasel = Design[Release] * Ave Problem in Features to Design[ Release] Units: Problem/Month Coherence problems carry forward into designed features. Problems Per Feature Interaction = 0 Units: Problem/feature Number of problems arising as a result of each feature interaction. 
Relative Size of Expected Bug Fixing[Release] = XIDZ ( Size of Next Bug Fix, Features Under Consideration * Normal Scope of New Release[Release ] * Modules for a feature, 3) Units: Dmnl This tells how big will the bug fixes be in the upcoming release, compared to the desired number of features. Release Start Date[Releasel = SW Endogenous Start * Endogenous Release Start Date[ Release] + ( I - SW Endogenous Start) * Exogenous Release Start Date[ Release] Units: Month Release start date can be exogenous or endogenous. Sanctioned Design Complete[Releasel = INTEG( Des Time Change[Release] , 0) 176 Units: Month This is the variable which drives time pressure to design stage. Scope of New Release[Releasel = Normal Scope of New Release[Release] * Eff Bug Fixing on Scope of New Release[ Release] Units: Dmnl The fraction of total features under consideration that we decide to include in this release. SW Eff Bug Fix on Scope = 1 Units: Dmnl Put 1 to adjust the scope of next release according to the level of bug fixing planned for it. Put 0 not to have this effect. SW Eff Old Arch on Design Problems = I Units: Dmnl Put I to have the effect of old architectural and design problems on quality of current design active. SW Endogenous Start = 1 Units: Dmnl Put 1 to have the starts of release endogenously determined, put 0 for exogenous starts. Threshold for New Project Start = 0.6 Units: Dmnl Threshod for starting a new release. As soon as the current release passes this threshold, the new release will be started. Time to Replan = 1 Units: Month This is the time constant for adjustment of expectations about release date, code complete etc, to the data showing that it is going up or down. TI Eff Bug Fix on Scope ( [(0,0)-(4,1)],(O,1),(0.452599,0.692982),(1,0.4),(1.3945,0.232456) ,(2,0.1),(2.77676,0.0263158),(4,0)) Units: Dmnl Effect of bug fixing scope on the scope of the next release Tl Eff Old Arch on Design Problems ( [(0,0)-(1,2)],(0,1),(0.269113,1.16667) ,(0.40367,1.32456),(0.562691,1.60526),(0.691131,1.80702),(0.82263,1.9386) ,(1,2)) Units: Dmnl Table for effect of architecture and code base quality on the quality of design 177 00 cr *0 E CL 0 a' a' O 10 Development Allocated Design Time Left[Releasel = Estmtd Design Time Left[Release] * Prcnt Time Allocated[ Release] Units: Month allocated design time left. Ave Dfct Dnsty in Accptd Code[Release = ZIDZ ( Defects in Accepted Code[Release ], Code Accepted[Release] ) Units: Defect/module Defect density in the accepted code is an important quality measure for us. Ave Dfct in Rwrk Code[Releasel = ZIDZ ( Defects in MRs[Release] , Code Waiting Rework[ Release] ) Units: Defect/module The number of defects that can be fixed by fixing the MR. Ave Dfct in Tstd Code[Releasel = Code Covered by Test * Dfct Dnsty in Untstd Code[ Release] Units: Defect/Test The average number of defects that one can expect to find in a test, given the average density of defects. Bug Fixes Designed[Releasel = Code to Fix Bugs[Release] Units: module/Month The bug fixes included for the upcoming release. Code Accepted[Releasel = INTEG( Test Acceptance[Release] , 0) Units: module Code Cancellation[Release] = Code to Develop[Release] * Frctn Code Cancellation[ Release] Units: module/Month Under pressure we can cancel some of the codes that are designed to be written. This can be both intentionally, or unintentional. 
Code Covered by Test = 1 Units: module/Test This should be a function of test coverage Code Developed[Release] = INTEG( Development[Release] + Rework[Release] - Test Acceptance[ Release] - Test Rejection[Release] , 0) Units: module The amount of code developed, but yet not successfully tested. 179 Code Passing Test[Releasel = Totl Code to Tst[Release] * Frctn Tst Running[ Release] Units: module/Month The rate of code testing is assumed to be proportional to the amount of test that is performed out of total tests to be performed. This assumption is OK for products which run only the related fraction of the tests. Code to Develop[Releasel = INTEG( Design Rate[Release] - Development[Release ] - Code Cancellation[Release] + Bug Fixes Designed[Release ], 0) Units: module We start from no new code to develop, since no requirement is written at the beginning. Code Waiting Rework[Releasel = INTEG( Test Rejection[Release] - Rework[Release ], 0) Units: module The total amount of code waiting rework. Current Dev Schedule Pressure[Releasel = ZIDZ ( Estmtd Dev Time Left[Release ,Current], Development Time Left[Release] ) Units: Dmnl The schedule pressure depends on how long we have left for the development phase vs. how long we estimate this phase to take. Note that this schedule pressure is based on observed productivity, rather than the normal/base productivity. Therefore it is anchored on what is culturally considered right productivity in the organization not an absolute level of productivity. Therefore it should not be used for determining productivity levels. Defects Accepted[Releasel = ( Tstd Code Accptd[Release] * Dfct Dnsty in Accepted Code[ Release] + ( Test Acceptance[Release] - Tstd Code Accptd[Release ] ) * Dfct Dnsty in Untstd Code[Release]) Units: Defect/Month The defects can be either in the part tested, or in the part not tested. The density of defect in the untested part is same as average untested code, while the density in the tested part is calculated based on Poisson distribution of defects. Defects Fixed[Releasel = Rework[Release] * Ave Dfct in Rwrk Code[Release] * Frac Errors Found in Rwrk Units: Defect/Month It is assumed that a fraction of the defects in the reworked code are removed, and a fraction remain undiscovered. Defects found[Release] = Test Rejection[Release] * Dfct Dnsty in Rejctd Code[ Release] Units: Defect/Month Defects are carried with the code, the higher the rejection rate and the density of defects in the code rejected, the higher will be the defects that are found. Defects from Dev[Releasel = Development[Release] * Error Rate New Dev[Release ] Units: Defect/Month 180 The defects that come from development depend on the error rate and the rate of developement we do. Defects Generated[Releasel = Defects from Dev[Release] + Dfct from Rwrk[Release ] Units: Defect/Month The total defect generated is a function of both defects in the new code as well as in the old code. Defects in Accepted Code[Releasel = INTEG( Defects Accepted[Release] , 0) Units: Defect The number of defects in the accepted code. Defects in Code Developed[Releasel = INTEG( Defects Generated[Release] - Defects Accepted[ Release] - Defects found[Release] + Dfcts Left in RWrk[Release ], 0) Units: Defect The defects from the code of last release are carried over to the current release. 
Defects in MRs[Release] = INTEG( Defects found[Release] - Defects Fixed[Release ] - Dfcts Left in RWrk[Release] , 0) Units: Defect We find defects through testing and deplete them by either submitting reworked code that has defects or by fixing them. Design Rate[Release] = Design[Release] * Modules for a feature Units: module/Month The total number of modules to be developed depend on the modules in one feature, and the total number of features to be developed. Desired Development Rate[Release] = Code to Develop[Release] / Development Time Left[ Release] Units: module/Month The development rate needed to finish the coding on time for code complete Dev Reschedule[Releasel = if then else (Allocated Dev Finish Date[Release] > Time :AND: Sanctioned Code Complete[Release] = 0, Allocated Dev Finish Date[ Release] / TIME STEP, if then else (Allocated Dev Finish Date[ Release] > Time, 1, 0) * ( Allocated Dev Finish Date[Release ] - Sanctioned Code Complete[Release] ) / Time to Replan) Units: Dmnl The (re)scheduling of the code complete date. Dev Resources to New Dev[Releasel = Min ( ZIDZ ( Dsrd New Dev Resources[Release ], Dsrd Dev Resources[Release] ) * Development Resources to Current Release[ Release] , Dsrd New Dev Resources[Release] ) Units: Person The resources between rework and new development are distributed based on how much need there is for each, proportionately. 181 Dev Resources to Rework[Release] = Min ( ZIDZ ( Dsrd Rwrk Resources[Release ] , Dsrd Dev Resources[Release] ) * Development Resources to Current Release[ Release] , Dsrd Rwrk Resources[Release] ) Units: Person The resources between rework and new development are distributed based on how much need there is for each, proportionately. Development[Releasel = Feasible New Dev Rate[Release] Units: module/Month The development rate is based on resources allocated for development. Development Time Left[Releasel = Max ( Min Dev Time, Sanctioned Code Complete[ Release] - Allocated Design Time Left[Release] * ( 1 - "Frctn Des-Dev Overlap" ) - Time) Units: Month If we run over the code scheduled date, we can't push for faster than a minimum finish time for development. Dfct Dnsty in Accepted Code[ReleaseJ = EXP ( - Tst Effectiveness * Ave Dfct in Tstd Code[ Release] ) * Ave Dfct in Tstd Code[Release] * ( 1 - Tst Effectiveness) Units: I)efect/module The defect density in accepted code is based on Poisson distribution of errors across code, leading to a chance of not discovering any defects for each test that has n errors, therefore we can calculate the expected error per accepted test. Dfct Dnsty in Rejctd Code[Releasel = Max ( 0, ZIDZ ( Totl Code Tstd[Release] * ( Dfct Dnsty in Untstd Code[Release] - Dfct Dnsty in Accepted Code[ Release] ) + Test Rejection[Release] * Dfct Dnsty in Accepted Code[ Release] , Test Rejection[Release] ) ) Units: Defect/module We can't change the density of error in the untested code based by simply testing it. Therefore the density of defect in the exiting code (accepted or rejected) should be the same as that in the stock of code. This equation is based on the free mixing of errors in the stocks of code. Dfct Dnsty in Untstd Code[Releasel = ZIDZ ( Defects in Code Developed[Release ], Code Developed[Release]) Units: Defect/module The density of Defects in the code developed but not tested. Dfct from Rwrk[Releasel = Rework[Release] * New Error Rate in Rwrk[Release] Units: Defect/Month the rate of errors created by rework depend on the error rate and the amount of rework we do. 
Dfcts Left in RWrk[Releasel = Rework[Release] * Ave Dfct in Rwrk Code[Release ] * ( I Frac Errors Found in Rwrk ) Units: Defect/Month A fraction of errors are not corrected in each iteration of rework, since not all defects are caught by testing at the first place. 182 Dsrd Dev ResourceslReleasel = Dsrd New Dev Resources[Release] + Dsrd Rwrk Resources[ Release] Units: Person Desired development resources for new code and rework. Dsrd New Dev Resources[Releasel = Max ( Desired Development Rate[Release] / Normal Productivity[Release] , 0) Units: Person Desired new development rate depends on the productivity in normal situations and the rate of development desired. Dsrd Rwrk Rate[Release] = Code Waiting Rework[Release] / Dsrd Time to Do Rwrk Units: module/Month The desired rework rate depends on the amount of rework in backlog and the how fast we want to do that rework. Dsrd Rwrk Resources[Releasel = Max ( 0, Dsrd Rwrk Rate[Release] / Normal Productivity[ Release] ) Units: Person The desired resources for rework depend on the normal productivity of people and the desired rework rate. Dsrd Time to Do Rwrk = 0.25 Units: Month We want to do our rework in about a week. Eff Archtcr Complexity on Productivity = Max ( 1, Accmltd Code Accepted / Minimum Code Size With an Effect ) A ( Strength of Complexity on Productivity Effect * Density of Problems in Architecture ) Units: Dmnl The effect of code complexity on productivity not only depends on size of the code base, but also on the quality of the architecture. A power law is assumed for the shape of function. "Eff Old Code & Archtctr on Error" [Release] = "TI Eff Old Code & Archtctr on Error" ( Weighted Error Density in Architecture and Code Base ) * SW Eff Old Code and Archtctr on Error + 1 - SW Eff Old Code and Archtctr on Error Units: Dmnl Effect of old code and architecture on the quality of development in the current release depends on the overall quality of old code base and architecture. Eff Pressure on Error Rate[Releasel = Tl Eff Pressure on Error Rate ( Natural Dev Schedule Pressure[ Release] ) * SW Eff Pressure on Error + I - SW Eff Pressure on Error Units: Dmnl Effect of work pressure on error rate is created because under work pressure people take short cuts, skip unit testing, don't study the requirements well enough, don't do code reviews, don't 183 develop as good a design based on given requirements etc. This effect is not only on quality of coding, but also exacerbates the problems with quality that result from poor design and architectural issues. Eff Pressure on Productivity[Releasel = TI Eff Pressure on Productivity ( Natural Dev Schedule Pressure[Release] ) * SW Eff Pressure on Productivity + I - SW Eff Pressure on Productivity Units: Dmnl The effect of schedule pressure on productivity is captured by a table function. Error per Design Problem in Features = 10 Units: Defect/Problem How many coding defects each design problem can create. Error Rate New Dev[Release] = Errors from Coding[Release] + Errors from Design and Architecture[ Release] Units: I)efect/module The number of defects found in a module, in average. Errors from Coding[Releasel = Normal Errors from Coding * Eff Pressure on Error Rate[ Release] * "Eff Old Code & Archtctr on Error"[Release] Units: Defect/module The error rate depends on the inherent quality of the workforce (Normal errors) as well as the work pressure they are working under and the quality of the code and architecture from old releases. 
Errors from Design and Architecture[Release] = Ave Prblm in Designed Feature[ Release] * Error per Design Problem in Features / Modules for a feature * Eff Pressure on Error Rate[Release] Units: Defect/module Errors from design and architecture have a base rate which depend on quality of architecture, but can get worse depending on the work pressure. Feasible New Dev Rate[Releasel = Dev Resources to New Dev[Release] * Productivity[ Release] Units: module/Month The development rate that is feasible based on availability of resources for development of new code. Feasible Rwrk Rate[Release] = Dev Resources to Rework[Release] * Productivity[ Release] Units: module/Month Feasible rework rate from resources depends on the availability of resources as well as their productivity. Frac Errors Found in Rwrk = 0.7 Units: Dmnl The defects per module for reworked code. 184 Frctn Code Cancellation[Release] = TI Eff Schdul Prssr on Cancellation ( Current Dev Schedule Pressure[ Release]) * Maximum Cancellation Fraction Units: /Month The cancellation fraction is a funciton of schedule pressure and can not exceed some maximum level. "Frctn Des-Dev Overlap" = 0.5 Units: Dmnl The fraction of overlap between design and development phases is assumed constant, while it really is not. The impact is negligible due to short design phase. Maximum Cancellation Fraction = 0.5 Units: 1/Month There is a limit on the cancellation of features feasible. Min Dev Time = 0.25 Units: Month The minimum time for development of the remaining code, used when we go beyond schedule. Minimum Code Size With an Effect = 10000 Units: module The minimum size of code where effect of complexity is observable. Modules for a feature = 10 Units: module/feature Number of modules needed for a new feature. Natural Dev Schedule Pressure[Release] = ZIDZ ( Estmtd Dev Time Left[Release ,Normal], Development Time Left[Release] ) Units: Dmnl The natural schedule pressure is felt based on what one sees to be the pressure relative to natural productivity. New Error Rate in Rwrk[Release] = Normal Error Rate in Rwrk * Eff Pressure on Error Rate[ Release] Units: Defect/module The error rate for creating new errors in the reworked code. It depends on different factors including the work pressure. Normal Error Rate in Rwrk = 0.5 Units: Defect/module The normal error rate in rework phase. Normal Errors from Coding = 0.5 Units: Defect/module 185 Average number of defects in coding that are created in each module, if the schedule pressure is very low. Normal Productivity[Release] = 20 Units: module/(Month*Person) [10,30] Normal productivity represents the work rate of an average worker, if s/he follows good processes and does not take any shortcuts. Productivity[Release] = Normal Productivity[Release] * Eff Pressure on Productivity[ Release] * ( I / EffArchtcr Complexity on Productivity * SW EffArchtcr Complexity on Productivity + 1 - SW Eff Archtcr Complexity on Productivity) Units: module/(Month*Person) The productivity depends on the work pressure and code complexity effects. Rework[Releasel = Feasible Rwrk Rate[Release] Units: module/Month Rework rate depends on feasible rate given by the current resource availability. Sanctioned Code Complete[Releasel = INTEG( Dev Reschedule[Release], 0) Units: Month This is the time people think is OK to have the code complete, based on the changing pressures in the field. It is not clear if the code complete is so well set. 
Strength of Complexity on Productivity Effect = 1 Units: feature/Problem The exponent for the power law governing the effect of complexity on productivity SW Eff Archtcr Complexity on Productivity = 0 Units: Dmnl Put I to get the effect of code complexity and architecture quality on productivity, put 0 to avoid that. SW Eff Old Code and Archtctr on Error = 1 Units: Dmnl Put 1 to have the effect of old code and architecture on error rate on, put zero to remove that effect. SW Eff Pressure on Error = 1 Units: Dmnl Put I to get the feedback from schedule pressure on error rate. Put 0 to get no such feedback. SW Eff Pressure on Productivity = I Units: Dmnl Put I to have the effect of work pressure on productivity on, put 0 to switch that effect off. Test Acceptance[Releasel = Code Passing Test[Release] - Test Rejection[Release ] Units: module/Month 186 The code accepted in testing is what is not rejected, which in turn depends on the coverage as well as total test done. Test Rejection[Releasel = Totl Code Tstd[Release] * Tst Rjctn Frac[Release] Units: module/Month Testing rejects in a fraction of cases, and therefore creates proportional amount of rework. "TI Eff Old Code & Archtctr on Error" ( [(0,0)-(1,3)],(0,1),(0.16208,1.17105) ,(0.333333,1.68421),(0.480122,2.31579),(0.663609,2.75),(0.844037,2.92105) ,(1,3)) Units: Dmnl The shape of this table funciton might be not S-shape since the worse the old architecture is, the more problematic new coding will be, yet there should be some saturation level, hence the current formulation.\!\! Ti Eff Pressure on Error Rate ( [(0,0)-(4,4)],(0, I),(1,1),(1.27217,1.22807) ,(1.48012,1.5614),(1.71254,2),(2.01835,2.36842),(2.42202,2.72368) ,(3,3),(10,4)) Units: Dmnl Table for effect of time pressure on error rate. Ti Eff Pressure on Productivity ( [(0,0)-(3,2)],(0,0.8),(0.623853,0.885965) ,(1,1),(1.27523,1.16667),(1.54128,1.38596),(1.85321,1.54386),(2.37615,1.65789) ,(3,1.7),(10,2) ) Units: Dmnl Table for the effect of time pressure on productivity. Ti Eff Schdul Prssr on Cancellation ( [(0,0)-(10, 1)],(0,0),(1,0),(2,0),(10,0) ) Units: Dmnl This table represents the reaction of the development group by cancelling features to the schedule pressure they perceive. Total Development[Releasel = Development[Release] + Rework[Release] Units: module/Month Total development consists of new development and the rework for the current release. Totl Code to Tst[Release] = Code Developed[Release] + Code to Develop[Release ] + Code Waiting Rework[Release] Units: module The total code that needs testing include those pieces developed, those under rework, and the parts that need to be developed. Totl Code Tstd[Releasel = Code Covered by Test * Test Rate[Release] Units: module/Month the total amount of code that is really tested, depends on the number of tests and the coverage of each test. Tst Effectiveness = 0.9 187 Units: Dmnl The effectivness of tests in finding bugs, if they are in the area that is tested. Tst RjectnFrac[ReleaseJ = I - EXP ( - Tst Effectiveness * Ave Dfct in Tstd Code[ Release] ) Units: Dmnl The fraction of tests that reject are calculated based on the assumption that defects are randomly distributed in the whole code base. Therefore the number of defects per test is a Poisson distribution with mean of'Ave Dfct in Tstd Code'. Assuming an independent chance of'Tst Effectiveness' for finding each of those errors, we can find the chance of being rejected by finding the summation of that chance over all number of errors (O to infinity). 
Tstd Code Accptd[Release] = Test Rate[Release] * Code Covered by Test * ( I - Tst Rjctn Frac[Release]) Units: module/Month The portion of tested code that gets accepted, depends on test coverage and the fraction of acceptance. 188 Cm a) ACU a) FQo H 1= a) , m -oa)u LL C C 00 C, I -I 4) I A AO I a) a, 00 \ -cA -C O ° I\ C v\ < T o A\ a)a ) 0 -a V 'I O E .o_ _ L) Test ******************************** Demand for Test Resources[Releasel = Tst Rate Feasible[Release] / Test Resource Productivity Units: Person The total demand for test resources across ongoing releases. Frac Total Tst Feasible[Releasel = TI Tst Dependency on Dev[Release] ( Fraction Code Written[ Release] ) Units: Dmnl This is the fraction of total tests that are ready to be executed. We can't run tests without having them ready, because of interdependencies etc. Blocking issues are included in this table function. Fraction Code Accepted[Release] = ZIDZ ( Code Accepted[Release] , Total Code[ Release] ) Units: Dmnl The fraction of the total code that is done and tested. Fraction Code Written[Releasel = ZIDZ (Code Accepted[Release] + Code Developed[ Release] , Total Code[Release]) Units: I)mnl Fraction of code that is written and either tested or ready to test. This determines the fraction of tests one can do. Note that the inputs to this do not include any delays, which assumes that the delays are relatively small for test to know where the code is. Frctn Tst Running[Releasel = ZIDZ ( Test Rate[Release] , Tests to Run[Release ] ) Units: /Month The fraction of total tests that is being run at this time Min Tst Time = 0.1 Units: Month The minimum time needed to run a test. Retest Required[Releasel = Test Rate[Release] * Tst Rjctn Frac[Release] Units: Test/Month The tests that are rejected need to be redone, after the code is revisited. Test Coverage = 0.5 Units: Dmnl How much of the code the test set to be designed covers Test Design[Release] = ( Design[Release] * Modules for a feature + Bug Fixes Designed[ Release] ) / Code Covered by Test * Test Coverage Units: Test/Month 190 The tests are assumed to be designed as we design the features or bug fixes. So test design depends on how fast we develop the requirements, what is our test coverage, how big is each requirement, and how much code each test covers. Test Feasible[Release] = Max ( 0, Frac Total Tst Feasible[Release] * Total Tests to Cover[ Release] - ( Total Tests to Cover[Release] - Tests to Run[Release ] ) ) Units: Test The feasible tests are those that are not already done and are feasible based on the precedence relationships. Test Ran[Releasel = INTEG( Test Rate[Release] - Retest Required[Release] , 0) Units: Test The number of tests that are already done and have passed. In coflow with code, it parallels the code accpeted stock. Test Rate[Releasel = Tst Rate from Resources[Release] Units: Test/Month The test rate depends on both availability of resources that are requested based on feasibility of tests considering the current development leve Test Resource Productivity = 50 Units: Test/(Month*Person) The productivity of test personnel for doing tests. Test Resources Allocated[Releasel = Min ( Demand for Test Resources[Release ], ZIDZ ( Demand for Test Resources[Release] * Testing Resources, SUM ( Demand for Test Resources[Release!] ) )) Units: Person The test resources are given to the release based on availability of those resources and proportional to requests. Testing Resources = 20 Units: Person The total number of testing resources available. 
Assumed to be fixed here. Tests to Run[Release] = INTEG( Retest Required[Release] + Test Design[Release ] - Test Rate[Release], 0) Units: Test Note that in a coflow of tests with code, this stock represents all the code that is not yet administered, i.e. code to develop, developed, or waiting rework. TI Tst Dependency on Dev[Release] ( [(0,0)-(1,1)],(0,0),(0.9,0),(0.95,0.1), (1,1),(1.1,1)) Units: Dmnl The shape of the curve suggests what fraction of our tests we can do based on the current level of development. 191 Total Code[Releasel = Code Accepted[Release] + Totl Code to Tst[Release] Units: module The total code Total Tests to Cover[Release] = Test Ran[Release] + Tests to Run[Release] Units: Test Total number of tests that are designed and can be run. Tst Rate Feasible[Release] = Test Feasible[Release] / Min Tst Time Units: Test/Month The maximum feasible testing rate depend on what amount of tests we can potentially administer and how long each test takes an minimum. Tst Rate from Resources[Release] = Test Resources Allocated[Release] * Test Resource Productivity Units: Test/Month The resources for testing can put an upper bound on the amount of testing we can do. 192 3 o r a. n . a El el) LY cr l ******************************** Defects in the Field ******************************** Ave Def with Patch[Releasel = ZIDZ ( Defects with Patch[Release] , Sites Installed[ Release] ) Units: Defect The average number of defects in sites for which we have already developed a patch Ave Known Def in Site[Release] = ZIDZ ( Known Defects in Field[Release] , Sites Installed[ Release] ) Units: Defect The number of defects known in a typical site. Ave Unknown Def in Site[Releasel = ZIDZ ( Unknown Defects in Field[Release] , Sites Installed[Release]) Units: Defect The average number of unknown defects that is in use in a typical site Average Defects Patched[Release] = ZIDZ ( Defects Patched[Release] , Sites Installed[ Release] ) Units: Defect The average number of defects patched in a typical site. Average Outstanding Defects = Total Ave Defect - SUM ( Average Defects Patched[ Release!] ) Units: I)efect This is the number of defects outstanding in the sites, an overall measure of quality Average Time in the Field = 30 Units: Month The product remains in the field for about 50 month Calls from Defects With Patch[Releasel = Defects with Patch[Release] / Time to Show up for Def with Patch Units: Defect*Site/Month Number of calls coming from defects for which a patch exist Calls from Known Defects[Release] = Known Defects in Field[Release] / Time to Show Up for Defects Units: Defect*Site/Month The number of defects that show up again as customer calls, even though we know about them. Calls from Unknown Defects[Release] = Unknown Defects in Field[Release] / Time to Show Up for Defects Units: Defect*Site/Month 194 The number of MRs created by real, unknown problems in the customer site. Defect Covering by Patch[Release] = Patch Development[Release] * Defect per Patch Units: Defect/Month The total number of defects that are covered by these patches Defect not Patching[Release] = Defect Covering by Patch[Release] * Sites Installed[ Release] * ( 1 - Fraction Sites Installing ) Units: Defect*Site/Month The number of defects for which we have developed patches, yet people have not installed those patches. 
Defect Patching[Release] = Defect Covering by Patch[Release] * Sites Installed[ Release] * Fraction Sites Installing Units: Defect*Site/Month The number of defects that are patched in different installation of the software in the field. Defect Show up[Releasel = Calls from Unknown Defects[Release] * Sites Installed[ Release] / Sites Reporting One Defect[Release] Units: Defect*Site/Month Every defect that is found becomes known and therefore is known for all the sites installed. Defects Patched[Release] = INTEG( Defect Patching[Release] + Installing Known Patches[ Release] - Patches Retired[Release] + Patched Defects Sold[ Release] , 0) Units: Defect*Site I assume there is no patches at the beginning of simulation, therefore there is no defect that is covered by patches Defects Retired[Release] = Products Retired[Release] * Ave Unknown Def in Site[ Release] Units: Defect*Site/Month As products are retired, some of the defects that have remained unknown also retire with the solution. Defects with Patch[Releasel = INTEG( Defect not Patching[Release] - Defects with Patch Retired[ Release] - Installing Known Patches[Release] + Unpatched Defects Sold with Patch[ Release], 0) Units: Defect*Site I assume there is no patches at the beginning of simulation, therefore there is no defect that is covered by patches Defects with Patch Retired[Release] = Products Retired[Release] * Ave Def with Patch[ Release] Units: Defect*Site/Month Defects with developed patches are retired as installations are retired 195 Fraction Defect Unknown[ReleaseJ = XIDZ ( Unknown Defects in Field[Release] , Total Defects in Field[Release], 1) Units: Dmnl Fraction of defects which are unknown. Fraction of Defects Important = 0.15 Units: Dmnl The fraction of total defects in the code shipped that can lead to customer calls and quality issues. Fraction of sites Installing All patches for a Call = 0.1 Units: /Defect Each defect that escalates results in the site installing the available patches or not. This number indicates what fraction of sites (or patches) are installed following each call. Fraction Patches Installed At Sales[Release] = 0.8 Units: Dmnl What fraction of already known problems patched, are fixed at the installation time. Fraction Sites Installing = 0.5 Units: Dmnl What fraction of sites we can persuade to install the patchs at the time they are out. Installing Known Patches[Release] = Total Customer Calls from Defects[Release ] * Fraction of' sites Installing All patches for a Call * Ave Def with Patch[ Release] Units: Defect*Site/Month Any customer call that escalates results in installing the available patches on the system, therefore removing the possible issues created by those patches Known Defects in Field[Release] = INTEG( Defect Show up[Release] - Defect not Patching[ Release] - Defect Patching[Release] - Known Defects Retired[ Release] + Known Defects Sold[Release] , Defect Show up[Release ] * Time to Show Up for Defects) Units: Defect*Site The total number of known defects, coming from unknown defects discovered or known defects sold. Known Defects Retired[Release] = Products Retired[Release] * Ave Known Def in Site[ Release] Units: Defect*Site/Month Known defects are retired as we retire the systems installed. Known Defects Sold[Release] = Total Defects Introduced[Release] * XIDZ ( Known Defects in Field[ Release], Total Defects in Field[Release], 0) Units: Defect*Site/Month The total number of known defects introduced since we known some defects exist when we are introducing the product. 
Patched Defects Sold[Release] = Total Defects Introduced[Release] * ZIDZ ( Defects Patched[Release] + Defects with Patch[Release], Total Defects in Field[Release] ) * Fraction Patches Installed At Sales[Release] Units: Defect*Site/Month Of the total defects for which we have already made a patch, a fraction is sold with the patches installed. Patches Retired[Release] = Products Retired[Release] * Average Defects Patched[Release] Units: Defect*Site/Month Defects we have fixed through patches also retire with retired installations. Products Retired[Release] = Sites Installed[Release] / Average Time in the Field Units: Site/Month The rate of retiring systems depends on the number of systems we have installed and how long they last in the field. Sites Installed[Release] = INTEG( Products Sold[Release] - Products Retired[Release], Products Sold[Release] * Average Time in the Field) Units: Site The total number of sites with a version of the product installed. Sites Reporting One Defect[Release] = 1 Units: Site/Month The number of sites that report a single bug. Time to Show up for Def with Patch = 10 Units: Month Defects for which patches are developed but not installed usually take longer to trigger any problem. Time to Show Up for Defects = 50 Units: Month The time it takes for an average defect to show up. Total Ave Defect = SUM ( Ave Def with Patch[Release!] + Ave Known Def in Site[Release!] + Ave Unknown Def in Site[Release!] + Average Defects Patched[Release!] ) Units: Defect Total average defects per site, summed across the different stocks. Total Defects in Field[Release] = Defects Patched[Release] + Defects with Patch[Release] + Known Defects in Field[Release] + Unknown Defects in Field[Release] Units: Defect*Site The total number of patched or unpatched defects in the field. Total Defects Introduced[Release] = Products Sold[Release] * ( Defects in Accepted Code[Release] + Defects in Code Developed[Release] + Defects in MRs[Release] ) * Fraction of Defects Important Units: Defect*Site/Month The observable defects introduced depend on the rate of sales, the average number of defects in an installation, and the fraction of those defects that can create observable issues for the customer. Unknown Defects in Field[Release] = INTEG( Unknown Defects Introduced[Release] - Defect Show up[Release] - Defects Retired[Release], 0) Units: Defect*Site The total number of important defects over all the different sites. Unknown Defects Introduced[Release] = Total Defects Introduced[Release] * Fraction Defect Unknown[Release] Units: Defect*Site/Month Unknown defects are introduced with new sales, since a fraction of the defects in the code are not yet known. Unpatched Defects Sold with Patch[Release] = Total Defects Introduced[Release] * ZIDZ ( ( Defects with Patch[Release] + Defects Patched[Release] ) * ( 1 - Fraction Patches Installed At Sales[Release] ) , Total Defects in Field[Release]) Units: Defect*Site/Month Defects with an available patch that are sold to new sites without the patch installed (the complement of the fraction installed at sales). ******************************** Patches and Current Engineering ******************************** Accumulated Customer Calls = INTEG( Inc Customer Calls, 0) Units: Defect*Site The accumulated number of customer calls from all the releases up to this point in time.
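Sites Installed and Products Retired above form a standard first-order aging structure: the installed base drains at Sites Installed / Average Time in the Field, so under constant sales it settles at the sales rate times the average lifetime. A small sketch of that structure under an assumed constant sales rate (the sales figure is illustrative only):

```python
AVERAGE_TIME_IN_FIELD = 30.0   # Month, as set in the model
DT = 0.03125                   # Month, the model's TIME STEP

def installed_base(products_sold_per_month, months):
    """First-order outflow: Products Retired = Sites Installed / Average Time in the Field."""
    # The model initializes the stock at its steady state: sales * average lifetime.
    sites_installed = products_sold_per_month * AVERAGE_TIME_IN_FIELD
    t = 0.0
    while t < months:
        products_retired = sites_installed / AVERAGE_TIME_IN_FIELD
        sites_installed += (products_sold_per_month - products_retired) * DT
        t += DT
    return sites_installed

print(installed_base(products_sold_per_month=10.0, months=60.0))   # stays near 300 sites
```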
Accumulated Total Work = INTEG( Development and CE Rate, 0) Units: module The total accumulated work done on the product, both on development and patching. Allctd Ttl Dev Resource = Min ( Ttl Des Resource to CR, XIDZ ( Ttl Des Resource to CR * Total Dev Resources, Total Desired Resources, Total Dev Resources ) ) Units: Person We allocate the resources between current engineering and new release proportionately Alloctd Ttl CE Resource = Min ( Ttl Des Resource to CE , ZIDZ ( Ttl Des Resource to CE * Total Dev Resources, Total Desired Resources ) ) Units: Person Resources are allocated proportional to request, unless we have more resource than needed. CR Dev Resources Left After Allocation[R1] = Max ( Allctd Ttl Dev Resource - Dsrd Dev Resources[RI], 0) CR Dev Resources Left After Allocation[Later Releases] = Max ( 0, VECTOR ELM MAP ( CR Dev Resources Left After Allocation[RI] , Later Releases - 2) - Dsrd Dev Resources[Later Releases] ) Units: Person The amount of resources left for new development resources given to different future releases, after allocating the resources to all the releases before that. This is based on the allocation scheme where we give all the resources requested by the most mature release, before going to the next. Desired Patch Rate[Release] = Patch to Develop[Release] / Time to Patch Units: Patch/Month Desired speed of patching. Desired Resources to Current Engineering[Release] = Desired Patch Rate[Release ] / Productivity in Writing Patch[Release] Units: Person Desired resources for current engineering. Desired Total Dev Resources = 50 Units: Person 200 This is the total maximum resources desired to be given to this product for development and current engineering. In fact it should be tied to the amount of work expected for the product, not a stand-alone concept. Development and CE Rate = SUM (Total Development[Release!] + Patch Development[ Release!] * Modules Equivalent to Patch ) Units: module/Month The rate of development work done on the product in total (over all releases) Development Resource to Current Engineering[Release] = Desired Resources to Current Engineering[ Release] * Frac CE Allctd Units: Person The development resources allocated to developing patches across different releases are proportional to their requests. Development Resources to Current Release[Later Releases] = Dsrd Dev Resources[ Later Releases] * Frac CR Allctd * ( I - SW Higher Priority to Current) + SW Higher Priority to Current * if then else ( CR Dev Resources Left After Allocation[ Later Releases] = 0, VECTOR ELM MAP ( CR Dev Resources Left After Allocation[ RI] , Later Releases - 2), Dsrd Dev Resources[Later Releases ] ) Development Resources to Current Release[R1] = Dsrd Dev Resources[R1] * Frac CR Allctd * ( 1 - SW Higher Priority to Current ) + SW Higher Priority to Current * if then else ( CR Dev Resources Left After Allocation[R1] = 0, Allctd Ttl Dev Resource, Dsrd Dev Resources[R1] ) Units: Person We allocate the resources to each release proportional to their requests for resources. Alternatively one can allocate resources to the sooner release in the expense of the older one. Both allocations are possible here and one can choose between them using the switch. 
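The two resource-allocation equations above support two policies, selected by SW Higher Priority to Current: either every release's request is scaled by the same fraction, or the most mature release receives its full request and only the remainder cascades to later releases. The sketch below illustrates both policies in isolation; it is a simplification of the vector formulation, with illustrative request values.

```python
def allocate_proportional(requests, available):
    """Scale every request by the same fraction when resources are scarce."""
    total = sum(requests)
    fraction = min(1.0, available / total) if total > 0 else 0.0
    return [r * fraction for r in requests]

def allocate_by_priority(requests, available):
    """Serve the most mature release fully first, then pass the remainder down."""
    allocation, remaining = [], available
    for r in requests:                # ordered from most to least mature release
        given = min(r, remaining)
        allocation.append(given)
        remaining -= given
    return allocation

requests = [30.0, 20.0, 15.0]         # Dsrd Dev Resources per release (Person, illustrative)
print(allocate_proportional(requests, available=40.0))   # [18.5, 12.3, 9.2]
print(allocate_by_priority(requests, available=40.0))    # [30.0, 10.0, 0.0]
```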
Eff Profitability on Resources Allocated = ( Tl Eff Profit on Resources ( Perceived Profitability) * Min (Time Since Release Ready[R1] / "Honey-Moon Period", 1 ) + 1 - Min ( Time Since Release Ready[RI] / "Honey-Moon Period", 1)) * SW EffProft on Resources + 1 SW Eff Proft on Resources Units: Dmnl We gradually strengthen the tie between profitability and resources allocated, as the project gets more mature (passes through honeymoon period). Frac CE Allctd = ZIDZ ( Alloctd Ttl CE Resource, Ttl Des Resource to CE) * ( 1 - SW Exogenous CE Resources ) + SW Exogenous CE Resources Units: Dmnl Fraction of CE resource request that is allocated Frac CR Allctd = XIDZ ( Allctd Ttl Dev Resource, Ttl Des Resource to CR, 1) Units: Dmnl The fraction of total resources for current development that is allocated. 201 Fraction of Calls Indicating New Defects[Releasel = ZIDZ ( Calls from Unknown Defects[ Release] , Total Customer Calls from Defects[Release] ) Units: Dmnl Fraction of calls that indicate a new defect. Fraction of Desired CR Allocated[Releasel = XIDZ ( Development Resources to Current Release[ Release], Dsrd Dev Resources[Release], I) Units: Dmnl What fraction of desired resources for current development of each release is allocated to them. "Honey-Moon Period" = 20 Units: Month The period during which the effect of profitability on resources is not strong Inc Customer Calls = SUM ( Total Customer Calls from Defects[Release!] ) Units: Defect*Site/Month Increase in the number of customer calls. Patch Developed[Releasel = ( Ave Def with Patch[Release] + Average Defects Patched[ Release] ) / Defect per Patch Units: Patch The total number of patches developed so far for different releases of the product Patch Development[Releasel = Development Resource to Current Engineering[Release ] * Productivity in Writing Patch[Release] Units: Patch/Month The rate of developing patches depends both on the productivity of developers in writing patches and on quantity of developers devoted to this task. Patch to Develop[Release] = Ave Known Def in Site[Release] / Defect per Patch Units: Patch The stock of patches pending development Perceived Profitability = INTEG( ( Profitability - Perceived Profitability ) / Time to Perceive Profitability, 0) Units: Dmnl Perceived profitibility fraction. Productivity in Writing Patch[Releasel = Normal Productivity[Release] / Modules Equivalent to Patch Units: Patch/(Month*Person) The number of patches an individual can write in a month. The productivity should be connected to what the productivity effects are for development, however, effect of work pressure is not so high since patches are written in a fairly high pressure environment anyway (customers are waiting) and effect of code complexity is not so big since people try to do the patches with the least interaction. It is red to make sure that other possible feedbacks are included in it. 202 SW Eff Proft on Resources = 0 Units: Dmnl Put 1 to have the effect of profitability on resources allocated, put 0 to cut this feedback. SW Exogenous CE Resources = 0 Units: Dmnl Put 0 to get normal allocation to CE, put I to get exogenous allocation to CE, where all the resources needed are given to CE, without interfering with the resources for current development' SW Higher Priority to Current = I Units: Dmnl Put I so that allocation of resources for development of different releases gives categorically higher priority to more eminent ones, put 0 for proportional distribution of resources between different releases. 
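Eff Profitability on Resources Allocated, defined above, blends a table lookup on perceived profitability with a neutral multiplier of 1, phasing the lookup in linearly over the honeymoon period. The sketch below shows that blend with a generic piecewise-linear lookup; because the printed table points appear partially garbled, the points used here are illustrative placeholders that only preserve the stated shape (more profitable products attract somewhat more resources).

```python
from bisect import bisect_left

def lookup(points, x):
    """Piecewise-linear table lookup, clamped at both ends (Vensim-style)."""
    xs, ys = zip(*points)
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + w * (ys[i] - ys[i - 1])

TL_EFF_PROFIT_ON_RESOURCES = [(-1.0, 0.5), (0.0, 0.8), (0.1, 1.0), (1.0, 1.2)]  # illustrative
HONEYMOON_PERIOD = 20.0   # Month, as in the model

def eff_profitability_on_resources(perceived_profitability, time_since_release_ready,
                                    switch_on=True):
    if not switch_on:                                  # SW Eff Proft on Resources = 0
        return 1.0
    ramp = min(time_since_release_ready / HONEYMOON_PERIOD, 1.0)
    effect = lookup(TL_EFF_PROFIT_ON_RESOURCES, perceived_profitability)
    return effect * ramp + (1.0 - ramp)                # neutral early, full effect later

print(eff_profitability_on_resources(0.5, time_since_release_ready=10.0))   # ~1.04
```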
Time Since Release Ready[Release] = INTEG( "Release Ready?"[Release] , 0) Units: Month The number of month since each release is ready. Time to Adjust Resources = 10 Units: Month Time to adjust the resources to what is dictated by profitability concerns. Time to Patch = 5 Units: Month The desired time to write the patches for the outstanding defects Time to Perceive Profitability = 10 Units: Month Time constant for perceiving profitability. T1 Eff Profit on Resources ( [(-1,0)-(1,1.3)],(-1,0.5),(-0.425076,0.587281) ,(0.125382,0.695614),(0,0.8),(0.1, 1),(0.174312,1.06623),(0.29052,1.10614) ,(0.535168,1.15746),(1,1.2)) Units: Dmnl Table for effect of profitability on resources given to the project. Total Allctd Resources = Allctd Ttl Dev Resource + Alloctd Ttl CE Resource Units: Person Total resources allocated for development and current engineering. Total Customer Calls from Defects[Releasel = Calls from Defects With Patch[ Release] + Calls from Known Defects[Release] + Calls from Unknown Defects[ Release] Units: Defect*Site/Month Total customer calls coming from defects in the product 203 Total Desired Resources = Ttl Des Resource to CR + Ttl Des Resource to CE Units: Person The total development and CE resources requested Total Dev Resources = INTEG( ( Desired Total Dev Resources * Eff Profitability on Resources Allocated - Total Dev Resources ) / Time to Adjust Resources, Desired Total Dev Resources) Units: Person The resources allocated depend to the profitability of the product to some extent. Ttl Des Resource to CR = SUM ( Dsrd Dev Resources[Release!] ) Units: Person Total people requested for the development tasks at hand for the currently under development releases Ttl Des Resource to CE = SUM ( Desired Resources to Current Engineering[Release! ] ) * ( I SW Exogenous CE Resources) Units: Person Total people requested for the development tasks at hand for the current engineering of different releases. 204 O -_ 0 m a) a~ x Q X a x ZL p3 .°_ m en tn U r-4 m ._ LL LLM m co C 1") LL i U) Fixing Bugs ******************************** Bug Fix Approval[Release] = SUM ( Desired Bug Fixes for Next Release[Release! ] ) * "Ok to Plan?"[Release] / TIME STEP Units: Defect/Month The bug fixes planned are assumed to be decided upon at the OK to plan. We assume that we take bug fixes proportionately from all the different releases. Bug Fixes Planned[Releasel = INTEG( Bug Fix Approval[Release] , 0) Units: Defect Number of bug fixes planned in each release. Bugs to Fix in Future Releases[Releasel = INTEG( Increase in Known Bugs[Release ] - ZIDZ ( Bugs to Fix in Future Releases[Release] , SUM ( Bugs to Fix in Future Releases[ Release!] ) * SUM ( Bug Fix Approval[Release!] ) - Cancel Bug Fixes[Release], 0) Units: Defect This is parallel to stock of open MRs from all different releases. The current formulation does not reflect how quality of future releases for customers is increased by doing bug fixes in this release. That is usually one of the important motivations for having bug fixes in a release. Cancel Bug Fixes[Releasel = Bugs to Fix in Future Releases[Release] / Time to Cancel Bug Fixes *( 1 - ZIDZ ( Products Sold[Release] , SUM ( Products Sold[Release! ] ) ) ) Units: Defect/Month We cancel bug fixes based on the importance of the product in the market. If a product is selling well, we will put more emphasis on fixing its bugs, than when a release is not selling so well (in which case we start draining the bug fixes for that release). 
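Bug Fix Approval, defined near the start of the Fixing Bugs group above, divides by TIME STEP so that the full planned quantity is transferred into Bug Fixes Planned during the single step in which "Ok to Plan?" equals 1. A small sketch of this discrete-transfer idiom; the quantity and trigger timing are illustrative.

```python
DT = 0.03125   # Month, the model's TIME STEP

def pulse_transfer(amount, trigger):
    """Return a flow that moves `amount` into a stock within exactly one time step."""
    return (amount / DT) if trigger else 0.0

bug_fixes_planned = 0.0
desired_bug_fixes = 45.0             # Defect (illustrative)
for step in range(3):
    ok_to_plan = (step == 1)         # the planning gate opens for one step only
    flow = pulse_transfer(desired_bug_fixes, ok_to_plan)   # Defect/Month
    bug_fixes_planned += flow * DT
print(bug_fixes_planned)             # 45.0: the whole batch lands in a single step
```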
Code to Fix Bugs[Releasel = Bug Fix Approval[Release] / Defect per Patch * Modules Equivalent to Patch * "Multiplier Fixing work vs. Patch" Units: module/Month The number of modules of code needed to be written as a result of bug fixes planned is determined by looking at the number of patches and the adjusting the scope to account for the fact that bug fixes are more comprehensive and time consuming. Defect per Patch = 4 Units: Defect/Patch The number of defects that are taken care of in one typical patch Desired Bug Fixes for Next Release[Releasel = Bugs to Fix in Future Releases[ Release] * Fraction Bugs Fixed in Next Release Units: Defect The number of MRs planned to be fixed in the upcoming release. 206 Fraction Bugs Fixed in Next Release = 0.3 Units: Dmnl Fraction of bugs normally planned to be fixed in the next release. Increase in Known Bugs[Releasel = ZIDZ ( Defect Show up[Release] , Sites Installed[ Release] ) * SW Fixing Bugs Units: Defect/Month The increase in the number of bugs we know about in the field depends on the rate of bug reports from the field across all releases. Modules Equivalent to Patch = 20 Units: module/Patch A measure of how much work a patch entails, when compared to a typical module. "Multiplier Fixing work vs. Patch" = 2 Units: Dmnl Fixing the errors when done as part of a new release and included in the whole code is this much more resource consuming than fixing them as patches. Size of Next Bug Fix = SUM ( Desired Bug Fixes for Next Release[Release!] ) / Defect per Patch * Modules Equivalent to Patch * "Multiplier Fixing work vs. Patch" Units: module Size of the next bug fix in the upcoming planning of a release is determined by desired number of bug fixes. SW Fixing Bugs = I Units: Dmnl Put I to have bugs from old release become part of new release planning. Put 0 so that such effect does not exist, i.e. the old bugs are completely ignored. Time to Cancel Bug Fixes = 10 Units: Month The time frame for draining the stock of bugs to fix. 207 '1) in ( ) GD v ; ©a C) JI UC A= v V .( m a a 0 C3 A U,(D . L ' 2 C5 11 Li- 7) n , LL C (C) , 00 E 9a m 0 U .Z a. - - O 0 l a) im IC6 CY --t U, F (C C 2 ,X -1' ., iC C) , 11 CC) ,/) / (( m (1)I n -rC C- , ll f, L,.- + 7. C L) ;i, L F , o" 'p Architecture and Code Base Quality Accmltd Code Accepted = INTEG( Inc Underlying Code, 0) Units: module Accumulated code accepted. Accmltd Features Designed = INTEG( Inc Old Design, 0) Units: feature Accumulated number of features designed. Accumulated Defects in Code = INTEG( Inc Defect in Underlying Code - Code Fixes, 0) Units: Defect The total number of defected codes accumulated across all releases Accumulated Problems in Architecture and Design = INTEG( Inc Prblm in Old Design Architecture Redesign, 0) Units: Problem The accumulated problems in the architecture and design that influence the quality of future releases. Architecture Redesign = 0 Units: Problem/Month The refactoring of architecture can reduce the errors in it. Code Fixes = SUM ( Bug Fixes Planned[Release!] / TIME STEP * "Release Ready this Step?"[ Release!] ) / Fraction of Defects Important Units: Defect/Month The rate of fixing the code base as a result of incorporating MR fixes in the new release. Note that that no matter what fraction of defects we call important, we will transfer same amount from this stock. Density of Defect in Code = ZIDZ ( Accumulated Defects in Code, Accmltd Code Accepted ) Units: Defect/module The density of defects in the underlying legacy code. 
Density of Problems in Architecture = ZIDZ ( Accumulated Problems in Architecture and Design, Accmltd Features Designed) Units: Problem/feature The density of problems in the features already designed and used as building block of future code. Fixed Defects in Code = INTEG( Code Fixes, 0) Units: Defect accumulated number of fixed (former) defects in the code. 209 Inc Defect in Underlying Code = SUM ( ( Defects in Accepted Code[Release!] + Defects in Code Developed[Release!] + Defects in MRs[Release! ] ) * "Release Ready this Step?"[Release!] ) / TIME STEP Units: Defect/Month Defects in the code that is released become influential in the quality of upcoming releases, to the extent that they hinder good development. Inc Old Design = SUM ( Features Designed[Release!] / TIME STEP * "Release Ready this Step?"[Release!] ) Units: feature/Month The addition of legacy design Inc Prblm in Old Design = SUM ( Problems in Designed Features[Release!] / TIME STEP * "Release Ready this Step?"[Release!] ) Units: Problem/Month Rate of increase in the problems of the underlying design. Inc Underlying Code = SUM ( ( Code Accepted[Release!] + Code Developed[Release! ] + Code Waiting Rework[Release!] ) * "Release Ready this Step?"[ Release!] ) / TIME STEP Units: module/Month Rate of increase in the underlying code Relative Importance of Architecture Problems = 2 Units: Dmnl This parameter factors in the difference between one problem in a feature architecture, vs. one defect in a module of code written. The higher the parameter, the more important the weight of architecture is in this balance. Total Accumulated Defects = Accumulated Defects in Code Units: Defect Total number of accumulated defects in the underlying code. Weighted Error Density in Architecture and Code Base = ( Density of Defect in Code + Density of Problems in Architecture * Error per Design Problem in Features / Modules for a feature * Relative Importance of Architecture Problems ) / ( I + Relative Importance of Architecture Problems ) Units: Defect/module The weighted error density in architecture and code base shows the density of errors in old architecture and code base as it influences the quality of development in new releases. 210 a U w ¥ -1 I I II 11 tlI 11 ******************************** Scheduling ******************************* Accumulated Change in Release Date[ReleaseJ = INTEG( Chang in Release Date[ Release], 0) Units: Month The amount of change in the release date accumulated through time for different releases. Allocated Design Finish Date[Release] = Time + Allocated Design Time Left[Release ] Units: Month Allocated design finish date, used to determine the development finish date. Allocated Dev Finish Date[Releasel = Time + ( Allocated Design Finish Date[ Release] - Time ) * ( 1 - "Frctn Des-Dev Overlap" ) + Allocated Dev Time Left[ Release] Units: Month The development finish date allocated. Allocated Dev Time Left[Release = Estmtd Dev Time Left[Release,Current] * Prcnt Time Allocated[Release] Units: Month Allocated development time left. Allocated Release Date[Releasel = Allocated Dev Finish Date[Release] + Allocated Test Time Left[ Release] * ( - "Frctn Test-Dev Overlap") Units: Month Allocated release date, based on development, design, and testing phase conditions. Allocated Test Time Left]Releasel = Estmtd Test Time Left[Release] * Prcnt Time Allocated[ Release] Units: Month Allocated testing time left. 
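Restating Weighted Error Density in Architecture and Code Base, from the Architecture and Code Base Quality group above, in conventional notation (the symbols below are introduced here only for readability):

\[
d_w = \frac{d_c + d_a \, \dfrac{e_f}{m_f} \, w}{1 + w}
\]

where \(d_c\) is Density of Defect in Code (Defect/module), \(d_a\) is Density of Problems in Architecture (Problem/feature), \(e_f\) is Error per Design Problem in Features, \(m_f\) is Modules for a feature, and \(w\) is Relative Importance of Architecture Problems. The ratio \(e_f/m_f\) converts architectural problem density into Defect/module, so \(d_w\) is a weighted average of the two densities with weight \(w\) on the architecture term.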
Chang in Release Date[Releasej = ( Allocated Release Date[Release] - Sanctioned Release Date[ Release] ) / Time to Replan * if then else ( Sanctioned Release Date[ Release] = 0, 0, 1) * ( I - "Release Ready?"[Release] ) Units: Dmnl is the change rate in the release date, which accumulates to show the delay in one release. This Chng Prcvd Estmtd Code[Releasel = if then else ( "Release Started?"[Release ] = I :AND: Perceived Estimated Code to Write[Release] = 0, Estmtd Code to Write[ Release] / TIME STEP , ( Estmtd Code to Write[Release] - Perceived Estimated Code to Write[ Release] ) / Time to Perceive Current Status ) Units: module/Month 212 This equation ensures that the perceived estimated release date starts at estimated level at the beginning but then adjusts to it with some delay. Chng Status[Releasel =( 1 - "Release Ready?"[Release] ) * "Release Ready this Step?"[ Release] / TIME STEP Units: /Month The rate to change the status of a release into ready. Estimated Test Left[Releasel = Estmtd Tests to Be Designed[Release] + Tests to Run[ Release] Units: Test The number of tests to run. Estmtd Code to Write[Release] = Code to Develop[Release] + Features to Design[ Release] * Modules for a feature Units: module A rough estimate of the code waiting to be written comes from currently available code and the part to be designed based on the current specifications. Estmtd Design Time Left[Release] = ZIDZ ( Features to Design[Release] , Features Designed[ Release] + Features to Design[Release] ) * Normal Design Time Units: Month Estimated design time left based on how far in the design process we are. Estmtd Dev Time Left[Release,Productivity Base] = ZIDZ ( Expctd Code to Write After Being First[ Release,Productivity Base] , Expctd Resource Available to First Release New Dev * Prcvd Productivity[Release,Productivity Base]) Units: Month The development time left depends on the work left, the expected resources available and productivity of those resources. Estmtd Test Time Left[Releasel = Estimated Test Left[Release] / Testing Resources / Test Resource Productivity Units: Month Estimated testing time left. Estmtd Tests to Be Designed[Release = Features to Design[Release] * Test Coverage / Code Covered by Test * Modules for a feature Units: Test The number of tests to be designed. Expctd Code to Write After Being FirstlRelease,Productivity Base] = Max ( 0 , Perceived Estimated Code to Write[Release] - Expctd Code Written Before Being First[ Release,Productivity Base] ) Units: module The code that is expected to be written after becoming the main release depends on how much code is left to be written and how long we expect to stay in the shadow of other projects. 213 Expctd Code Written Before Being First[Release,Productivity Base] = Expctd Resource Available to New Dev[ Release] * Prcvd Productivity[Release,Productivity Base] * Time in Shadow[ Release] Units: module The expected progress during the time the release is in the shadow of more eminent ones. Expctd Resource Available to First Release New Dev = Total Dev Resources * Prcvd Frac Resource to New Dev * Prcvd Frac to Current Release * SUM ( "If Ri Nth Release?"[Release!,Rl] * Expected Fraction to This Release[ Release!] ) Units: Person The resources available for the first (highest priority/most eminent) release is calculated from looking at the current release with the highest priority. 
Expctd Resource Available to New Dev[Release] = Total Dev Resources * Prcvd Frac Resource to New Dev * Prcvd Frac to Current Release * Expected Fraction to This Release[ Release] Units: Person From total resources, only the fraction that is not going to CE, Rework, or other releases, can be expected to be assigned to this release. Expected Fraction to This Release[Release] = SUM ( Perceived Fraction to Nth Release[ NthRelease!] * "If Ri Nth Release?"[Release,NthRelease!] ) Units: Dmnl The expected fraction of resources going to each release depends on where in the ranking that release is perceived to be. Frac Resources to New Dev = XIDZ ( SUM ( Development[Release!] ), SUM ( Development[ Release!] + Rework[Release!] ), 0.9) Units: Dmnl Fraction of resources going to development Frac Resources to Nth Release[NthReleasel = SUM ( Fraction of Desired CR Allocated[ Release!] * "If Ri Nth Release?"[Release!,NthRelease] ) Units: Dmnl The fraction of desired resources going to the Nth active release. Alternatively we could look at the fraction of resources actually going, but that will come up with a too conservative estimate for what will be left for each project. Frac Time Cut = I Units: Dmnl [0,1.5] Fraction of time requested that is normally given to the team. Fraction of Resources to Current Release = XIDZ ( Allctd Ttl Dev Resource, Allctd Ttl Dev Resource + Alloctd Ttl CE Resource, Initial Prcvd Fraction to Current release) Units: Dmnl The fraction of resources allocated to current releases. 214 Fraction Total Tests Passed[Release] = ZIDZ ( Test Ran[Release] , Total Tests for This Release[ Release] ) Units: Dmnl Fraction Total Tests Passed. "Frctn Test-Dev Overlap" = 0.1 Units: Dmnl The fraction of test overlap with development is indeed not constant, but changing depending on where in the process we are. So when all the development is done, there is no overlap, while there is much more overlap at the beginning. The small fraction makes the constancy assumption OK. "If Ri Nth Release?" [Release,NthRelease] = if then else ( Sorted Vector of Releases[ NthRelease] + 1 = Release, 1, 0) * ( 1 - "Release Ready?"[Release ] ) * "Release Started?"[Release] Units: Dmnl This equation finds out if the each release is the Nthe active release. For example, if first release is GA then the second release will be 1st, and so on. Note that releases which have not started or have finished are not counted. Initial Prcvd Fraction to Current release = 0.9 Units: Dmnl Initially people come with the perception that only 10% of time need be spent on current engineering Negotiated Release Date[Release] = ( Reqstd Release Date[Release] - Time) * Frac Time Cut + Time Units: Month The release date given to the team, after negotiations. Normal Design Time = 2 Units: Month normal length of the design phase. Normal Threshold for Replan = 2 Units: Month Number Replans So Far[Releasel = ( SQRT ( ( 2 + Strength of Threshold Feedback) ^ 2 + 8 * Accumulated Change in Release Date[Release] / Normal Threshold for Replan ) - ( 2 + Strength of Threshold Feedback ) ) / ( 2 * Strength of Threshold Feedback ) - Modulo ( ( SQRT ( ( 2 + Strength of Threshold Feedback) A 2 + 8 * Accumulated Change in Release Date[Release ] / Normal Threshold for Replan) - ( 2 + Strength of Threshold Feedback ) / ( 2 * Strength of Threshold Feedback), 1) Units: Dmnl 215 Number of replans can be tracked based on the accumulated change in the release date and how the management changes the formal schedule based on that accumulated informal change. 
The formula comes from the assumption that threshold for changing release date is B*(l+a*n), where B is threshold for the first replan, a is the strength parameter and n is the replan numbers so far. By summing this over n=1..N we get to the total delay corresponding to N replans, and by finding N in terms of that summation, we can find the Time equivalent. This variable, however, is not changing anything at the current formulation, in the other words, it is assumed that official replans are not so significant in determining the pressure, rather, the perception of the work status and what will become the official replan in future are important. This makes the behavior smoother (since the discrete replans are removed) but the overall dynamics don't change. Perceived Estimated Code to Write[Release] = INTEG( Chng Prcvd Estmtd Code[ Release], Estmtd Code to Write[Release] ) Units: module Perceived code to be written based on current estimates. Perceived Fraction to Nth Release[NthReleasel = INTEG( ( Frac Resources to Nth Release[ NthRelease] - Perceived Fraction to Nth Release[NthRelease] ) / Time to Perceive Resource Availability, XIDZ ( 1, SUM ( I - "Release Ready?"[NthRelease!] ) * "Release Started?"[ NthRelease!]), 1)) Units: Dmnl The fraction of resources given to the Nth release is perceived as we go forward. The initial value is set to assume equal allocation between all active releases. Prcnt Time Allocated[Release] = XIDZ ( Negotiated Release Date[Release] - Time, Reqstd Release Date[Release] - Time, 1) Units: Dmnl Fraction of time requested, given to the development team. Prcvd Frac Resource to New Dev = INTEG( ( Frac Resources to New Dev - Prcvd Frac Resource to New Dev ) / Time to Perceive Resource Availability, 0.9) Units: D)mnl It takes time for people to learn what percentage of resources are really available for new development, vs. rework and current engineering. The perception is assumed to aggregate across multiple releases since people interact and see what happens across releases. At the beginning this perception is assumed to equal 0.9 Prcvd Frac to Current Release = INTEG( ( Fraction of Resources to Current Release - Prcvd Frac to Current Release ) / Time to Perceive Resource Availability, Initial Prcvd Fraction to Current release ) Units: Dmnl People adjust their perception about what fraction of work is given to current release Prcvd Productivity[Release,Currentl = INTEG( ( Productivity[Release] - Prcvd Productivity[ Release,Current] ) / Time to Perceive Productivity, Normal Productivity[ Release] ) 216 Prcvd Productivity[Release,Normal] = INTEG( 0 * ( Productivity[Release] - Prcvd Productivity[ Release,Current] ) / Time to Perceive Productivity, Normal Productivity[ Release] ) Units: module/(Month*Person) Perceived productivity slowly adjust to real productivity observed on the field. Project Readiness Threshold = 0.95 Units: Dmnl The pass rate necessary for release Rank Release[Releasel = ( Total Number of Releases - Release + 1) * ( I - "Release Ready?"[ Release] ) * "Release Started?"[Release] Units: Dmnl Gives the rank of each release: highest number for the first release, 0 for inactive ones. 
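The derivation described for Number Replans So Far can be written out explicitly. Assuming, as stated above, that the threshold for the n-th replan is \(B(1 + a\,n)\), with \(B\) = Normal Threshold for Replan and \(a\) = Strength of Threshold Feedback, and that the accumulated change in the release date \(D\) equals the sum of the first \(N\) thresholds, then

\[
D = \sum_{n=1}^{N} B\,(1 + a\,n) = B\left(N + a\,\frac{N(N+1)}{2}\right),
\]

which gives the quadratic \(a N^2 + (a + 2) N - 2D/B = 0\), whose positive root is

\[
N = \frac{\sqrt{(a+2)^2 + 8\,a\,D/B} \; - \; (a+2)}{2\,a}.
\]

The Modulo term in the listed equation strips the fractional part, so the reported value is \(\lfloor N \rfloor\). This is only a readability restatement of the assumption described above; the transcribed equation is hard to read, so the exact coefficient inside the radical should be checked against the model file.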
Release Date Planning[Release] = if then else ( Estmtd Dev Time Left[Release ,Current] > 0 :AND: Sanctioned Release Date[Release] = 0, Allocated Release Date[ Release] / TIME STEP, if then else ( Estmtd Dev Time Left[Release ,Current] > 0, 1, 0) * ( Allocated Release Date[Release] - Sanctioned Release Date[Release] ) / Time to Replan ) Units: Dmnl The formulation makes sure that we start with initial scheduled release date and then adjust it to the allocated one slowly. "Release Ready this Step?" [Release] = if then else ( Fraction Total Tests Passed[ Release] >= Project Readiness Threshold, 1, 0) * ( 1 - "Release Ready?"[ Release] ) Units: Dmnl The flag that becomes I at the time the release gets ready and then stays at 0. "Release Ready?"[Release] = INTEG( Chng Status[Release] , 0) Units: Dmnl The counter variable showing if the release is ready or not "Release Started?" [Release] = if then else ( Release Start Date[Release] > Time, 0, 1) Units: Dmnl Gives 1 if the release has started, otherwise 0. Reqstd Release Date[Release] = Time + Time Reqstd for Des Dev Test Left[Release ] Units: Month Requested release date by development organization. Sanctioned Release Date[Release] = INTEG( Release Date Planning[Release] , 0) Units: Month Sorted Vector of Releases[Release] = VECTOR SORT ORDER ( Rank Release[Release ], -1) Units: Dmnl This variable sorts and finds out the rank of each release among the active releases. So the first active release gets the rank 0 and all the finished releases get the rank 5 (the number of releases). 217 Strength of Threshold Feedback = 0.5 Units: Dmnl This is the power to which the number of replans are raised, in order to determine how larger the Threshold gets for replanning. A value of 1 means for the 2nd replan we have 2 times of normal Threshold and for the 3ed one 3 times, so on. A value of 0.5 suggests that for 2nd replan we have sqrt(2)=1.41 times longer Threshold etc. SW Effect Replan on Replan Threshold = I Units: Dmnl Put I to have the effect of number of replans on Threshold for a new replan. Put 0 for not having this feedback on. Time in Shadow[Later Releases] = Max ( 0, if then else ( "If Ri Nth Release?"[ Later Releases,RI] = 1, 0, VECTOR ELM MAP ( Sanctioned Release Date[ RI], Later Releases - 2) Time ) ) Time in Shadow[RI] = 0 Units: Month The time that a release is in the shadow of more important releases is as long as there is a more urgent release out there. Time Reqstd for Des Dev Test Left[Release] = Estmtd Design Time Left[Release ] * ( I - "Frctn Des-Dev Overlap" ) + Estmtd Dev Time Left[Release,Current ] + Estmtd Test Time Left[Release] * ( 1 - "Frctn Test-Dev Overlap") Units: Month The time required for the rest of the project depends on how much time testing, development, and design need all together. Time to Perceive Current Status = 2 Units: Month There is a delay in assessing and perceiving the current status of project. This delay is important in how fast we plan and react to the state of project. Time to Perceive Productivity = 10 Units: Month Time to Perceive Productivity Time to Perceive Resource Availability = 50 Units: Month Time to Perceive Resource Availability Total Number of Releases = ELMCOUNT(Release) Units: Dmnl The total number of releases modeled Total Tests for This Release[Releasej = Estmtd Tests to Be Designed[Release 218 ] + Total Tests to Cover[Release] Units: Test Total tests for the current release. Control: Simulation Control Parameters FINAL TIME = 100 Units: Month The final time for the simulation. 
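Returning to the release-readiness logic above: "Release Ready this Step?" fires when the test pass rate first reaches the Project Readiness Threshold, and Chng Status (defined earlier in this group) converts that one-step flag into a pulse that latches "Release Ready?" at 1. A compact sketch of the latch, with an illustrative test-pass trajectory standing in for the model's testing sector:

```python
DT = 0.03125                         # Month, the model's TIME STEP
PROJECT_READINESS_THRESHOLD = 0.95   # pass rate required for release, as in the model

release_ready = 0.0
for step in range(int(40 / DT)):                       # simulate 40 months
    t = step * DT
    fraction_tests_passed = min(1.0, t / 30.0)         # illustrative ramp, not model output
    ready_this_step = 1.0 if (fraction_tests_passed >= PROJECT_READINESS_THRESHOLD
                              and release_ready == 0.0) else 0.0
    chng_status = (1.0 - release_ready) * ready_this_step / DT   # pulse flow (1/Month)
    release_ready += chng_status * DT                            # latches at 1 permanently
print(release_ready)   # 1.0 once the pass rate first crosses 95%
```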
INITIAL TIME = 0 Units: Month The initial time for the simulation. SAVEPER = 1 Units: Month The frequency with which output is stored. TIME STEP = 0.03125 Units: Month The time step for the simulation. ******************************** Sales and Financial ******************************** Acceptable Delay Level = 4 Units: Month The normal level of delay is set at 4 months, meaning that delays of that length still yield normal sales. Acceptable Features in Market = RAMP ( Feature Expectation Growth Rate, 20, FINAL TIME ) Units: feature The number of new features that need to be in a new release to make that release reasonable in the eyes of the customer. Accumulated Features Designed[R1] = Features Designed[R1] Accumulated Features Designed[Later Releases] = VECTOR ELM MAP ( Accumulated Features Designed[R1] , Later Releases - 2) + Features Designed[Later Releases] Units: feature The total number of features from the first release through this one. Average Delay from All[Release] = ( Accumulated Change in Release Date[Release] + Relative Weight of Old Releases * Average Delay from Ready Releases ) / ( 1 + Relative Weight of Old Releases ) Units: Month The average delay is a weighted average of the delay so far in the current release and the delay observed in past releases. Average Delay from Ready Releases = ZIDZ ( SUM ( Accumulated Change in Release Date[Release!] * "Release Ready?"[Release!] ), SUM ( "Release Ready?"[Release!] ) ) Units: Month The average delay across all the ready releases. Cost of Development and Testing Resources = Salary of developer and testers * ( SUM ( Test Resources Allocated[Release!] ) + Total Allctd Resources ) Units: $/Month Total cost of development and testing resources allocated to this project. It assumes that unallocated resources are used elsewhere and therefore not billed to this product. Cost of Services and Maintenance = Cost per Customer Call * SUM ( Total Customer Calls from Defects[Release!] ) Units: $/Month Total cost of services and maintenance, assumed to be a function of the number of customer calls. Cost per Customer Call = 200 Units: $/(Defect*Site) The cost of each customer call arising from a defect, in terms of service labor etc. Eff Features on Sales[Release] = Tl Eff Features on Sales ( XIDZ ( Perceived Features Designed[Release] , Acceptable Features in Market, 1) ) * SW Eff Features on Sales + 1 - SW Eff Features on Sales Units: Dmnl The effect of new features on sales depends on how many features we have relative to the acceptable level of features in the market. Eff Market Saturation on Sales = Tl Eff Saturation on Sales ( XIDZ ( Market Size, SUM ( Indicated Sales[Release!] ), 10) ) Units: Dmnl The effect of market saturation depends on how much we could potentially sell relative to what the market needs. Eff Quality and Services on Sales = Tl Eff Quality and Services on Sales ( Normalized Quality ) * SW Eff Qual and Services on Sales + 1 - SW Eff Qual and Services on Sales Units: Dmnl Effect of quality and services on sales of the product. Eff Timeliness on Sales[Release] = Tl Eff Delay on Sales ( Normalized Delay[Release] ) * SW Eff Delay on Sales + 1 - SW Eff Delay on Sales Units: Dmnl Effect of time-to-market on sales.
Exogenous Products Sold = 10 Units: Site/Month Exogenous sales is used for model testing Feature Expectation Growth Rate = 30 Units: feature/Month The rate of growth in the number of features needed to remain competitive in the market place. Fraction Overhead Cost = 0.5 Units: Dmnl Overhead cost, as a fraction of R&D and Service labor costs. Indicated Sales[Release] = Product Sales Personnel * Normal Sales Productivity * Eff Features on Sales[Release] * Eff Quality and Services on Sales * EffTimeliness on Sales[Release] * This Release Selling[Release ] Units: Site/Month The indicated sales depends on number of sales person, their productivity and how that productivity depends on quality, timeliness, and features of the product. 222 Initial Perceived Quality = 1 Units: Dmnl The initial quality level (0 for very good quality, 1 for medium, and higher for poor) is what customers have in mind before observing the products real quality. Maintenance Fee = 1000 Units: $/(Month*Site) The maintenance fee charged. Market Size = 50 Units: Site/Month This size is defined in terms of number of sites per month in a steady state, not including any of the diffusion dynamics etc. Therefore is fairly simplistic. Net Revenue = Revenue - Total Cost Units: $/Month Net revenue of the product. Normal Bug Acceptable = 300 Units: Defect This is the avearge number of bugs in a medium release. Normal Sales Productivity = 2 Units: Site/(Month*Person) The productivity of sales personnel in selling this product. Normalized Delay[Release] = XIDZ (Acceptable Delay Level, Perceived Delay[ Release], 5) Units: Dmnl Delay level normalized by some acceptable delay. Normalized Quality = XIDZ ( Normal Bug Acceptable, Perceived Outstanding Defects, 5) Units: Dmnl Normalized quality of the product. Perceived Delay[Releasel = DELAY 1 (Average Delay from All[Release] , Time to Perceive Delays ) Units: Month Perceived delivery delay. Perceived Features Designed[Release] = INTEG( (Accumulated Features Designed[ Release] Perceived Features Designed[Release] ) / Time to Perceive Functionality, Accumulated Features Designed[Release]) Units: feature Perceived number of the features included in each release. 223 Perceived Outstanding Defects = INTEG( ( Average Outstanding Defects - Perceived Outstanding Defects ) / Time to Perceive Quality * "Release Ready?"[R ] , Initial Perceived Quality * Normal Bug Acceptable ) Units: Defect The initial perceived quality sets the expectations from the company before the release is out. Later on it is adjusted to observed quality, as soon as first release is out. Price = 250000 Units: $/Site The price of each installation of the software. Product Sales Personnel = 5 Units: Person The number of dedicated sales personnel for this product. Products Sold[Release] = Exogenous Products Sold * This Release Selling[Release ] * ( I - SW Endogenous Sales ) + SW Endogenous Sales * Sales Materialized[ Release] Units: Site/Month The number of products sold and sites installed in a month. Profitability = ZIDZ (Net Revenue, Revenue ) Units: I)mnl Profitability of the product line. Relative Weight of Old Releases = 0.5 Units: Dmnl The weight of old releases in comparison with the focal release, when determining the effect of delay on sales. Revenue = SUM ( Products Sold[Release!] * Price ) + Maintenance Fee * SUM ( Sites Installed[Release!] ) Units: $/Month The revenue earned from this product line. 
Salary of developer and testers = 6000 Units: $/(Month*Person) Normal salary of developer and testers Sales Bonus Fraction = 0.01 Units: Dmnl The fraction of sales that goes as sales bonus Sales Cost = Product Sales Personnel * Sales Personnel Cost + SUM ( Sales Materialized[ Release!] ) * Sales Bonus Fraction * Price Units: $/Month The cost of salesforce. Sales Materialized[Releasel = Indicated Sales[Release] * Eff Market Saturation on Sales 224 Units: Site/Month The real sales materialized not only depends on the potential level, but on market availability. Sales Personnel Cost = 50000 Units: $/(Month*Person) The fixed salary of sales personnel SW Eff Delay on Sales = I Units: Dmnl Put I to have effect of schedule delay on attractiveness, put 0 not to include that effect. SW Eff Features on Sales = I Units: Dmnl Put I to have the effect of features on sales, put 0 to eliminate that effect. SW Eff Qual and Services on Sales = I Units: Dmnl Put I to have effect of quality on sales, put 0 not to include that effect. SW Endogenous Sales = I Units: Dmnl Put 1 to have endogenous sales rate. Put 0 to get the fixed, exogenous rate. This Release Selling[Early Releases] = "Release Ready?"[Early Releases] * ( 1 - VECTOR ELM MAP ( "Release Ready?"[RI] , Early Releases)) This Release Selling[R6] = "Release Ready?"[R6] Units: Dmnl Only the latest release is selling at any time. Time to Perceive Delays = 5 Units: Month Time to perceive the delays by customers. Time to Perceive Functionality = 6 Units: Month Time to perceive functionality of the software. Time to Perceive Quality = 15 Units: Month Time to perceive the quality of the product. TI Eff Delay on Sales ( [(0,0)-(1,1.5)],(0,0),(0.195719,0.203947),(0.321101,0.447368) ,(0.455657,0.690789),(0.666667,0.888158),(1, 1),(2, 1.15),(4, 1.3)) Units: Dmnl Table for effect of delays on sales. 225 TI Eff Features on Sales ( [(0,0)-(2, 1.5)],(0,0),(0.311927,0.026315 8),(0.489297,0. 118421) ,(0.66055,0.355263),(0.795107,0.598684),(0.880734,0.815789),(1, 1) ,(1.24159,1.18421),(1.5,1.3),(2,1.4)) Units: Dmnl Table for effect of feature availability on sales. TI Eff Quality and Services on Sales ( [(0,0)-(4,2)],(0,0),(0.464832,0.587719) ,(0.685015,0.789474),(1,1),(2,1.3),(4,1.5)) Units: Dmnl Table for effect of quality on sales of the product. TI Eff Saturation on Sales ( [(0,0)-(3,1)],(0,0),(0.4,0.4),(0.706422,0.583333) ,(1,0.7),(1.45872,0.837719),(2.07339,0.93421 1),(3, 1)) Units: Dmnl There is a market saturation effect, i.e. sales can not grow more than a limit in the market. Total Cost = ( Cost of Development and Testing Resources + Cost of Services and Maintenance ) * ( 1 + FractionOverheadCost) + Sales Cost Units: $/Month Total R&D, Service, and salesforce costs of the project. 226 A QLL U) /E en A a) V cU) o a) T2 v U) V) C' < E m- +z 's 1 L La)U ctci4 5 U) .r r 2 ) 0 X D {)cLA AW 3 V-Xs Oa) U) 4) 2 v a) -a A 00 o E a) G) :3Q qj A A a) (n - U) _u a) -_ _ V fO v ******************************** Metrics ** ***************************** Accumulated Calls at Final Time = if then else ( Time <= FINAL TIME - TIME STEP, 0, Accumulated Customer Calls at Sampling Time / TIME STEP) Units: Defect*Site/Month Accumulated customer calls at final time for payoff function in optimization. Accumulated Customer Calls at Sampling Time = SAMPLE IF TRUE( Time <= Sampling Time :AND: Time + TIME STEP > Sampling Time, Accumulated Customer Calls, 0) Units: Defect*Site The accumulated number of customer calls at the time of the sampling. 
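The sampling variables above use SAMPLE IF TRUE as a sample-and-hold: the output copies its input in the steps when the condition holds and keeps its previous value otherwise. A minimal sketch of the idiom behind Accumulated Customer Calls at Sampling Time; the call rate and sampling time are illustrative values, not model output.

```python
DT = 0.03125                 # Month, the model's TIME STEP
SAMPLING_TIME = 80.0         # Month (illustrative)

def sample_if_true(previous, condition, value):
    """Sample-and-hold: take `value` while `condition` holds, else keep the old sample."""
    return value if condition else previous

sampled = 0.0                # initial value, as in the model
accumulated_calls = 0.0
t = 0.0
while t < 100.0:
    accumulated_calls += 12.0 * DT                       # illustrative constant call rate
    at_sampling_step = (t <= SAMPLING_TIME < t + DT)     # true for exactly one step
    sampled = sample_if_true(sampled, at_sampling_step, accumulated_calls)
    t += DT
print(sampled)               # roughly 960: the accumulation frozen at the sampling time
```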
Accumulated Resources at Final Time = if then else (Time <= FINAL TIME - TIME STEP, 0, Accumulated Resources at Sampling Time / TIME STEP) Units: Person accumulation resources at final time for optimization purposes. Accumulated Resources at Sampling Time = SAMPLE IF TRUE( Time <= Sampling Time :AND: Time + TIME STEP > Sampling Time, Accumulated Resources So far, 0) Units: Month*Person Picks up the value of accumulated resources at the sampling time. Accumulated Resources So far = INTEG( Resource Accumulation, 0) Units: Month*Person Total number of people involved in development and testing of the product. Accumulated Revenue = INTEG( Revenue Inc, 0) Units: $ The accumulated revenue through time. Average Max Natural Sch Pressure = ZIDZ ( Max Natural Schedule Pressure, Time ) Units: Dmnl Average of the maximum schedule pressure among different releases, through time. Finish Time at Final Time = if then else (Time <= FINAL TIME - TIME STEP, 0, Finish Time of Last Release / TIME STEP ) Units: Dmnl Finish time of 6 releases at the end, divided by time step to make sure the payoff is equal to final time, in optimization control. Finish Time of Last Release = SAMPLE IF TRUE( "Release Ready?"[R6] > 0 :AND: Finish Time of Last Release = 0, Time, 0) Units: Month This picks up the finish time for the last release. 228 Max Natural Schedule Pressure = INTEG( Min ( 3, VMAX ( Natural Dev Schedule Pressure[ Release!] )) , 0) Units: Month The maximum of schedule pressure across different releases. Resource Accumulation = SUM ( Test Resources Allocated[Release!] ) + Total Allctd Resources Units: Person Sum of resources actively involved in the project at this time. Revenue Inc = Revenue Units: $/Month The increase rate in revenue. Sampling Time = if then else ( Finish Time of Last Release > 0, Min ( Finish Time of Last Release + Time to Wait Before Sampling, FINAL TIME ), FINAL TIME ) Units: Month Sampling time for observing value of important metrics. Time to Wait Before Sampling = 10 Units: Month There is no sampling in the first few months to avoid picking up things before project are under way. 229 +i N, em o 0 f O' a) a)C) 0 '-e; E ri, v v ~, C)c) a) O. 0 v aht , )+ I.-, -3 8 E E , C),W _- ' + < r 'T o a .> LL 0 0C14 0 0 en 4.. .. I- Testing Model Discrepancy Defect in Field Vs Accumulated = Total Accumulated Defects - Total Ave Defect / Fraction of Defects Important Units: Defect This test item tells if the total number of defects in the average field site equals the number we have on the accumulated error case. It is not necessarily 0 since when we sell the product, not 100% of code is passed, and therefore initial customers get fewer errors (just a few) while the later ones have more. Maximum of All Physical Stocks = Max ( VMAX ( Code Developed[Release!] ), Max ( VMAX ( Code to Develop[Release!] ), Max ( VMAX (Code Waiting Rework[ Release!]), Max ( VMAX ( Defects in Code Developed[ Release!] ), Max ( VMAX ( Defects in MRs[Release! ] ) , Max ( VMAX ( Defects Patched[Release! ] ) , Max ( VMAX ( Defects with Patch[ Release!] ) , Max ( VMAX ( Features Designed[ Release!] ) , Max ( VMAX ( Features to Design[ Release!] ) , Max ( Features Under Consideration, Max ( VMAX ( Known Defects in Field[ Release!] ), Max ( VMAX ( Sites Installed[ Release! ] ), Max ( VMAX ( Test Ran[ Release! ] ), Max ( VMAX ( Tests to Run[ Release! ] ), VMAX ( Unknown Defects in Field[ Release!])))))))))) ))))) Units Not Specified Maximum of all physical stocks, used only for robustness testing, and therefore does not have a unit. 
Minimum of all Physical Stocks = Min ( VMIN ( Code Developed[Release!] ), Min ( VMIN ( Code to Develop[Release!] ), Min ( VMIN ( Code Waiting Rework[ Release!] ), Min ( VMIN ( Defects in Code Developed[ Release!] ), Min ( VMIN ( Defects in MRs[Release! ] ), Min ( VMIN ( Defects Patched[Release! ] ) , Min ( VMIN ( Defects with Patch[ Release!] ) , Min ( VMIN ( Features Designed[ Release!] ) , Min ( VMIN ( Features to Design[ Release!] ) , Min ( Features Under Consideration, Min ( VMIN ( Known Defects in Field[ Release!] ) , Min ( VMIN ( Sites Installed[ Release! ] ), Min ( VMIN ( Test Ran[ Release! ] ), Min (VMIN ( Tests to Run[ Release! ] ), VMIN ( Unknown Defects in Field[ Release! ] ) )) Units Not Specified This test item checks if all the physical stocks remain above 0 or not. *************************** Subscripts Early Releases: (R1-R5) Later Releases: (R2-R6) NthRelease <-> Release Productivity Base: Current,Normal 231 ))) )) ) ) ) ) ) This subscript is to simplify equations written in parallel, but based on two different productivity levels, one is the current level of the productivity, and another is based on the normal level of productivity. Current productivity is what is really used in most of the planning of the work, while the normal productivity is what is driving how fast people work, how is their quality and so on. Release: RI,Later Releases Different releases for the same product SRelease <-> Release 232 Alphabetical list of model variables and their groupings: Variable Acceptable Delay Level Acceptable Features in Market Accmltd Code Accepted Accmltd Features Designed Accumulated Calls at Final Time Accumulated Change in Release Date Accumulated Customer Calls Accumulated Customer Calls at Sampling Time Accumulated Defects in Code Accumulated Features Designed Accumulated Problems in Architecture and Design Accumulated Resources at Final Time Accumulated Resources at Sampling Time Accumulated Resources So far Accumulated Revenue Accumulated Total Work Allctd Ttl Dev Resource Allocated Design Finish Date Allocated Design Time Left Allocated Dev Finish Date Allocated Dev Time Left Allocated Release Date Allocated Test Time Left Alloctd Ttl CE Resource Architecture Redesign Ave Def with Patch Ave Dfctsy in Accptd Code Ave Dfct in Rwrk Code Ave Dfct in Tstd Code Ave Known Def in Site Ave New Problem per Cancellation Ave Prblm in Designed Feature Ave Problem in Features to Design Ave Unknown Def in Site Average Defects Patched Average Delay from All Average Delay from Ready Releases Average Max Natural Sch Pressure Average Outstanding Defects Average Time in the Field Bug Fix Approval Bug Fixes Designed Bug Fixes Planned Bugs to Fix in Future Releases Calls from Defects With Patch Calls from Known Defects 233 Group/View Name Sales and Financial Sales and Financial Architecture and Code Base Quality Architecture and Code Base Quality Metrics Scheduling Patches and Current Engineering Metrics Architecture and Code Base Quality Sales and Financial Architecture and Code Base Quality Metrics Metrics Metrics Metrics Patches and Current Engineering Patches and Current Engineering Scheduling Development Scheduling Scheduling Scheduling Scheduling Patches and Current Engineering Architecture and Code Base Quality Defects in the Field Development Development Development Defects in the Field Concept and Design Concept and Design Concept and Design Defects in the Field Defects in the Field Sales and Financial Sales and Financial Metrics Defects in the Field Defects in the Field Fixing Bug 
List of model variables (continued). Each variable belongs to one of the model's sectors (views): Architecture and Code Base Quality; Concept and Design; Control; Defects in the Field; Development; Fixing Bugs; Metrics; Patches and Current Engineering; Sales and Financial; Scheduling; Subscript; Test; and Testing Model.

Calls from Unknown Defects; Cancel Bug Fixes; Chang in Release Date; Chng Prcvd Estmtd Code; Chng Start Date; Chng Status; Code Accepted; Code Cancellation; Code Covered by Test; Code Developed; Code Fixes; Code Passing Test; Code to Develop; Code to Fix Bugs; Code Waiting Rework; Cost of Development and Testing Resources; Cost of Services and Maintenance; Cost per Customer Call; CR Dev Resources Left After Allocation; Current Dev Schedule Pressure; Current Scope of Release; Defect Covering by Patch; Defect Intro in Coherence of Feature Set; Defect not Patching; Defect Patching; Defect per Patch; Defect Show up; Defects Accepted; Defects Fixed; Defects found; Defects from Dev; Defects Generated; Defects in Accepted Code; Defects in Code Developed; Defects in MRs; Defects Patched; Defects Retired; Defects with Patch; Defects with Patch Retired; Demand for Test Resources; Density of Defect in Code; Density of Problems in Architecture; Des Time Change; Discrepancy Defect in Field Vs Accumulated Design; Design Rate; Design Time Left; Desired Bug Fixes for Next Release; Desired Development Rate.

Desired Patch Rate; Desired Resources to Current Engineering; Desired Total Dev Resources; Dev Reschedule; Dev Resources to New Dev; Dev Resources to Rework; Development; Development and CE Rate; Development Resource to Current Engineering; Development Resources to Current Release; Development Time Left; Dfct Dnsty in Accepted Code; Dfct Dnsty in Rejctd Code; Dfct Dnsty in Untstd Code; Dfct from Rwrk; Dfcts Left in RWrk; Dsrd Dev Resources; Dsrd New Dev Resources; Dsrd Rwrk Rate; Dsrd Rwrk Resources; Dsrd Time to Do Rwrk; Early Releases; Eff Archtcr Complexity on Productivity; Eff Bug Fixing on Scope of New Release; Eff Features on Sales; Eff Market Saturation on Sales; Eff Old Archtctr on Design Error; Eff Old Code & Archtctr on Error; Eff Pressure on Error Rate; Eff Pressure on Productivity; Eff Profitability on Resources Allocated; Eff Quality and Services on Sales; Eff Timeliness on Sales; Endogenous Release Start Date; Error per Design Problem in Features; Error Rate in Feature Design; Error Rate in Late Features; Error Rate in Original Feature Selection; Error Rate New Dev; Errors from Coding; Errors from Design and Architecture; Estimated Test Left; Estmtd Code to Write; Estmtd Design Time Left; Estmtd Dev Time Left; Estmtd Test Time Left; Estmtd Tests to Be Designed; Exogenous Products Sold; Exogenous Release Start Date.

Expctd Code to Write After Being First; Expctd Code Written Before Being First; Expctd Resource Available to First Release New Dev; Expctd Resource Available to New Dev; Expected Fraction to This Release; Feasible New Dev Rate; Feasible Rwrk Rate; Feature Cancellation; Feature Expectation Growth Rate; Feature Requests; Features Approved; Features Designed; Features to Design; Features Under Consideration; FINAL TIME; Finish Time at Final Time; Finish Time of Last Release; Fixed Defects in Code; Frac CE Allctd; Frac CR Allctd; Frac Errors Found in Rwrk; Frac Resources to New Dev; Frac Resources to Nth Release; Frac Time Cut; Frac Total Tst Feasible; Fraction Bugs Fixed in Next Release; Fraction Code Accepted; Fraction code Perceived Developed; Fraction Code Written; Fraction Defect Unknown; Fraction of Calls Indicating New Defects; Fraction of Defects Important; Fraction of Desired CR Allocated; Fraction of Resources to Current Release; Fraction of sites Installing All patches for a Call; Fraction Overhead Cost; Fraction Patches Installed At Sales; Fraction Sites Installing; Fraction Total Tests Passed; Frctn Code Cancellation; Frctn Des-Dev Overlap; Frctn Test-Dev Overlap; Frctn Tst Running; Honey-Moon Period; If Ri Nth Release?; Inc Customer Calls; Inc Defect in Underlying Code; Inc Old Design; Inc Prblm in Old Design.

Inc Underlying Code; Increase in Known Bugs; Indicated Sales; Initial Features Under Consideration; Initial Perceived Quality; Initial Prcvd Fraction to Current release; INITIAL TIME; Installing Known Patches; Known Defects in Field; Known Defects Retired; Known Defects Sold; Late Feature Error Coefficient; Late Feature Introduction; Late Feature Problems; Late Features; Late Intro Time; Later Releases; Maintenance Fee; Market Size; Max Natural Schedule Pressure; Maximum Cancellation Fraction; Maximum of All Physical Stocks; Min Des Time; Min Dev Time; Min Tst Time; Minimum Code Size With an Effect; Minimum of all Physical Stocks; Modules Equivalent to Patch; Modules for a feature; Multiplier Fixing work vs. Patch; Natural Dev Schedule Pressure; Negotiated Release Date; Net Revenue; New Error Rate in Rwrk; Normal Bug Acceptable; Normal Design Time; Normal Error Rate in Feature Design; Normal Error Rate in Rwrk; Normal Errors from Coding; Normal Productivity; Normal Sales Productivity; Normal Scope of New Release; Normal Threshold for Replan; Normalized Delay; Normalized Quality; NthRelease; Number Replans So Far; Ok to Plan?; Patch Developed.

Patch Development; Patch to Develop; Patched Defects Sold; Patches Retired; Perceived Delay; Perceived Estimated Code to Write; Perceived Features Designed; Perceived Fraction to Nth Release; Perceived Outstanding Defects; Perceived Profitability; Perceived Total Code; Prcnt Time Allocated; Prcvd Frac Resource to New Dev; Prcvd Frac to Current Release; Prcvd Productivity; Price; Problem Cancellation; Problems from Cancellation; Problems in Approved Features; Problems in Designed Features; Problems Introduced in Design; Problems Moved to Design; Problems Per Feature Interaction; Product Sales Personnel; Productivity; Productivity Base; Productivity in Writing Patch; Products Retired; Products Sold; Profitability; Project Readiness Threshold; Rank Release; Relative Importance of Architecture Problems; Relative Size of Expected Bug Fixing; Relative Weight of Old Releases; Release; Release Date Planning; Release Ready this Step?; Release Ready?; Release Start Date; Release Started?; Reqstd Release Date; Resource Accumulation; Retest Required; Revenue; Revenue Inc; Rework; Salary of developer and testers; Sales Bonus Fraction.

Sales Cost; Sales Materialized; Sales Personnel Cost; Sampling Time; Sanctioned Code Complete; Sanctioned Design Complete; Sanctioned Release Date; SAVEPER; Scope of New Release; Sites Installed; Sites Reporting One Defect; Size of Next Bug Fix; Sorted Vector of Releases; SRelease; Strength of Complexity on Productivity Effect; Strength of Threshold Feedback; SW Eff Archtcr Complexity on Productivity; SW Eff Bug Fix on Scope; SW Eff Delay on Sales; SW Eff Features on Sales; SW Eff Old Arch on Design Problems; SW Eff Old Code and Archtctr on Error; SW Eff Pressure on Error; SW Eff Pressure on Productivity; SW Eff Proft on Resources; SW Eff Qual and Services on Sales; SW Effect Replan on Replan Threshold; SW Endogenous Start; SW Endogenous Sales; SW Exogenous CE Resources; SW Fixing Bugs; SW Higher Priority to Current; Test Acceptance; Test Coverage; Test Design; Test Feasible; Test Ran; Test Rate; Test Rejection; Test Resource Productivity; Test Resources Allocated; Testing Resources; Tests to Run; This Release Selling; Threshold for New Project Start; Time in Shadow; Time Reqstd for Des Dev Test Left; Time Since Release Ready; TIME STEP.

Time to Adjust Resources; Time to Cancel Bug Fixes; Time to Patch; Time to Perceive Current Status; Time to Perceive Delays; Time to Perceive Functionality; Time to Perceive Productivity; Time to Perceive Profitability; Time to Perceive Quality; Time to Perceive Resource Availability; Time to Replan; Time to Show up for Def with Patch; Time to Show Up for Defects; Time to Wait Before Sampling; Tl Eff Bug Fix on Scope; Tl Eff Delay on Sales; Tl Eff Features on Sales; Tl Eff Old Arch on Design Problems; Tl Eff Old Code & Archtctr on Error; Tl Eff Pressure on Error Rate; Tl Eff Pressure on Productivity; Tl Eff Profit on Resources; Tl Eff Quality and Services on Sales; Tl Eff Saturation on Sales; Tl Eff Schdul Prssr on Cancellation; Tl Tst Dependency on Dev; Total Accumulated Defects; Total Allctd Resources; Total Ave Defect; Total Code; Total Cost; Total Customer Calls from Defects; Total Defects in Field; Total Defects Introduced; Total Desired Resources; Total Dev Resources; Total Development; Total Number of Releases; Total Tests for This Release; Total Tests to Cover; Totl Code to Tst; Totl Code Tstd; Tst Effectiveness; Tst Rate Feasible; Tst Rate from Resources; Tst Rjctn Frac; Tstd Code Accptd; Ttl Des Resource to CR; Ttl Des Resource to CE.

Unknown Defects in Field; Unknown Defects Introduced; Unpatched Defects Sold with Patch; Weighted Error Density in Architecture and Code Base.
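The sector index above can also be handled programmatically, for instance to look up the sector under which a given variable is documented. The short Python sketch below illustrates one such representation; the sector names are taken from the list above, while the few variable placements shown and the helper name sector_of are illustrative assumptions, not the model's documented assignments.

    # Minimal sketch of a variable-to-sector lookup for the model documentation.
    # The sector names match the sectors listed above; the variable placements
    # below are illustrative assumptions, not the model's authoritative index.
    VARIABLES_BY_SECTOR = {
        "Defects in the Field": [
            "Unknown Defects in Field",   # assumed placement
            "Known Defects in Field",     # assumed placement
        ],
        "Architecture and Code Base Quality": [
            "Weighted Error Density in Architecture and Code Base",  # assumed placement
        ],
        "Test": [
            "Test Coverage",              # assumed placement
            "Tests to Run",               # assumed placement
        ],
    }

    def sector_of(variable_name):
        """Return the sector under which a variable is indexed, or None if unlisted."""
        for sector, variables in VARIABLES_BY_SECTOR.items():
            if variable_name in variables:
                return sector
        return None

    if __name__ == "__main__":
        print(sector_of("Test Coverage"))  # prints: Test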