A FORMAL MODEL FOR ASSESSING SOFTWARE ARCHITECTURE AND PREDICTING COORDINATION REQUIREMENTS
Yuanfang Cai, Sunny Wong, Kanwarpreet Sethi, Yuan Duan

ARCHITECTURE, PROCESS, COORDINATION
Team Organization
Task Assignment
Change Management
…
Organization, Architecture, Process

FUNDAMENTAL DESIGN THEORIES
Information Hiding [Parnas 1972]
Design Rule Theory [Baldwin and Clark 2000]
Can we make theories operable?

DO WE HAVE TO DIG INTO SOURCE CODE?
Empirical Studies

APACHE TOMCAT VS. A PROPRIETARY SOFTWARE [WICSA 2008]

DO WE HAVE TO DIG INTO SOURCE CODE?
Making decisions
  Which language paradigm is better: OO or AO?
  What else is going to change if one part changes?
  How to assign tasks to maximize concurrency in large-scale, globally distributed projects?
Can we predict before coding?

FUNDAMENTAL QUESTIONS
What is a “Module”?
What does “Dependence” mean?
  Syntactic dependency?
  Logical dependency?
  Are these dependencies sufficient for prediction?
What is the basic unit of dependency?

OUR GOAL
A Formal Model and Theory
Description
Prediction
  What’s going to happen if the requirement changes?
Prescription
  Why are some architectures more adaptive than others?
  What’s the best way to accommodate a change?
  Shall we refactor?
Bridges Architecture with Process, Organizational Structure, and Economic Analysis

RESEARCH IN MY LAB
A formal model: Augmented Constraint Network
  Model decisions as first-class members
  Model assumption relations as logical constraints
A formal definition of Pair-wise Dependency
The automatic generation of Design Structure Matrix

AUGMENTED CONSTRAINT NETWORK
1. Constraint Network
    DesignSpace matrix{
      client:{dense, sparse};
      ds:{list_ds, array_ds, other_ds};
      alg:{array_alg, list_alg, other_alg};
      ds = array_ds => client = dense;
      ds = list_ds => client = sparse;
      alg = array_alg => ds = array_ds;
      alg = list_alg => ds = list_ds;
    }
2. Dominance Relation
    {(ds, client), (alg, client)}
3. Clustering
    Environment Cluster: {client}
    Design Cluster: {ds, alg}

PRECISE DEFINITION OF PAIR-WISE DEPENDENCE AND DSM DERIVATION
[Figure: design automaton with six states, S1 to S6, each labeled with an assignment of client, ds, and alg, and the DSM derived from it over the variables 1. client, 2. ds, 3. alg]

CHALLENGES
How to make this formal model scalable?
  Divide and Conquer [ASE 2006]
  Binary ACN (BACN, Sunny Wong)
How to derive “Decisions”?
  Transform UML Class Diagram to ACNs (Sunny Wong)
  Transform UML Component Diagram to ACNs (KP Sethi)

DIVIDE AND CONQUER

TRANSFORMING UML TO ACN

ACN-BASED ASSUMPTION DEPENDENCIES
Many more than purely syntactic dependencies
  Apache Ant: Lattix dependencies: 829; ACN dependencies: 2929
  Maze Game: Lattix dependencies: 34; ACN dependencies: 71
Far fewer than the transitive closure.
Do these extra dependencies produce better predictions?

WHAT WE CAN DO SO FAR
Suggest Task Assignments to Maximize Parallelism
New Stability and Modularity Metrics
Design Rule Hierarchy
Decision Volatility Metrics
Concern Diffusion Metrics
Independence Level Metrics
Predict Change Impact

DESIGN RULE HIERARCHY: HOW TO ASSIGN TASKS TO MAXIMIZE CONCURRENCY

DESIGN RULE HIERARCHY

APACHE ANT CASE STUDY
The Architecture
  Version 1.6.5, 1000 variables and 4000 constraints, 13000 dependences in the derived DSM
  500 classes and interfaces (including inner classes)
  640 modules
  11 layers
The Coordination
  Same Layer, Different Module
  Same Layer, Same Module
  Different Layer, Dependent Module

METRICS: STABILITY AND MODULARITY

MODULARITY AND STABILITY METRICS
Which architecture will generate more options?
  Independence Level Metrics
Which system or part of a system is most unstable?
  Decision Volatility Metrics
  Design Volatility Metrics
How are concerns separated?
  Concern Diffusion Metrics

MODULARITY METRIC: INDEPENDENCE LEVEL

CASE STUDY: 8 VERSIONS OF A PRODUCT LINE

DOES ADDING ONE COMPONENT MEAN ADDING ONE MODULE?

CONCLUSION
We reached conclusions highly consistent with source-code analysis
Several source-code-level analysis results are less accurate
It is possible to assess stability and modularity at the architecture level.

PREDICTING CHANGE IMPACT

STATE OF THE ART
Prediction from History
What if:
  The project is relatively new
  The version history does not exist
  The system is refactored

ACN-BASED PREDICTION
Pure ACN Prediction
  The more subsystems involved, the more likely to be affected
  The higher the level in the hierarchy, the less likely to be affected
  The distance also matters.
Hybrid Prediction
  Combining ACN prediction and Version History

A CASE STUDY: HADOOP

DESIGN ISSUES DISCOVERED
Is it ok if design rules are constantly violated?
We found:
“Modification task #51, in version 0.1.0, describes changing the DistributedFileSystem class but not only is its parent class FileSystem impacted, another child of the FileSystem (LocalFileSystem) is also impacted. The FileSystem class is changed 47% of the time DistributedFileSystem is changed and the LocalFileSystem class is changed 37% of the time DistributedFileSystem is changed, yet there are no syntactic dependencies between DistributedFileSystem and LocalFileSystem.”
DistributedFileSystem deprecated in v.19.0

DESIGN ISSUES DISCOVERED
“Modification task #1127 in version 12 is titled “Speculative execution and output of Reduce tasks” and it describes a change to the ReduceTask class (and only this class). When we examine the solution for this modification task, it also includes changes to the Task class, which is the parent class of the ReduceTask class. In fact, the Task class is one of the classes most often changed with the ReduceTask class; by release 0.14.0, the Task class is changed in the same transaction as the ReduceTask class nearly 40% of the time.”
Task class is also refactored in v.19.0

CONCLUSION
The ACN/Hybrid Approach works better in early versions.
The ACN approach helps identify refactoring candidates.
Hybrid Approach generates reliable predictions

FUTURE WORK
Application to on-going projects
Linking decisions with decision-makers to predict coordination needs.
Extending change impact analysis to coordination change impact analysis.
Linking formal model with economic analysis.
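
APPENDIX SKETCH: DERIVING THE DSM FOR THE MATRIX EXAMPLE
The matrix example from the AUGMENTED CONSTRAINT NETWORK slide is small enough to work through by brute force. The Python sketch below is not the authors' tool. It assumes a simplified reading of pair-wise dependence: u depends on v if, starting from some consistent design, changing v forces u to change in a minimal restoration of consistency, where variables that v cannot influence (per the dominance relation) must stay fixed. The function names solutions, pairwise_dependence, and render_dsm are hypothetical.

```python
from itertools import product

# Design space copied from the "matrix" ACN on the slides.
DOMAINS = {
    "client": ["dense", "sparse"],
    "ds": ["list_ds", "array_ds", "other_ds"],
    "alg": ["array_alg", "list_alg", "other_alg"],
}
VARS = list(DOMAINS)

# Dominance relation from the slides: (x, y) means x cannot influence y.
DOMINANCE = {("ds", "client"), ("alg", "client")}


def implies(p, q):
    return (not p) or q


def consistent(s):
    """Check the four logical constraints of the matrix ACN."""
    return (implies(s["ds"] == "array_ds", s["client"] == "dense")
            and implies(s["ds"] == "list_ds", s["client"] == "sparse")
            and implies(s["alg"] == "array_alg", s["ds"] == "array_ds")
            and implies(s["alg"] == "list_alg", s["ds"] == "list_ds"))


def solutions():
    """Enumerate every consistent assignment of the design space."""
    candidates = (dict(zip(VARS, vals)) for vals in product(*DOMAINS.values()))
    return [s for s in candidates if consistent(s)]


def changed(s, t, v):
    """Variables other than v whose values differ between s and t."""
    return {u for u in VARS if u != v and s[u] != t[u]}


def pairwise_dependence():
    """Return pairs (u, v): u may have to change when v changes.

    Assumption: restorations are minimal in the number of changed
    variables and never touch a variable that v cannot influence.
    """
    sols = solutions()
    deps = set()
    for s in sols:
        for v in VARS:
            protected = {y for (x, y) in DOMINANCE if x == v}
            for new_val in DOMAINS[v]:
                if new_val == s[v]:
                    continue
                targets = [t for t in sols
                           if t[v] == new_val
                           and all(t[p] == s[p] for p in protected)]
                if not targets:
                    continue  # no admissible way to make this change
                best = min(len(changed(s, t, v)) for t in targets)
                for t in targets:
                    if len(changed(s, t, v)) == best:
                        deps.update((u, v) for u in changed(s, t, v))
    return deps


def render_dsm(deps):
    """One text row per variable: '.' on the diagonal, 'x' for a dependence."""
    rows = []
    for u in VARS:
        cells = ["." if u == v else ("x" if (u, v) in deps else " ") for v in VARS]
        rows.append(f"{u:<7}" + " ".join(cells))
    return rows


if __name__ == "__main__":
    print(len(solutions()), "consistent designs")
    print("\n".join(render_dsm(pairwise_dependence())))
```

Under these assumptions the enumeration finds six consistent designs, matching the six states S1 to S6 of the design automaton, and four dependences, matching the four x marks in the slide's DSM. Rows are read as "depends on column", so client has an empty row because the dominance relation prevents ds and alg from influencing it. Brute-force enumeration only works for toy models, which is why the deck lists scalability as a challenge.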