A FORMAL MODEL FOR
ASSESSING SOFTWARE ARCHITECTURE
AND PREDICTING COORDINATION
REQUIREMENTS
Yuanfang Cai,
Sunny Wong,
Kanwarpreet Sethi,
Yuan Duan
ARCHITECTURE, PROCESS, COORDINATION
Team Organization
 Task Assignment
 Change Management
…

Organization
Architecture
Process
FUNDAMENTAL DESIGN THEORIES

Information Hiding [Parnas 1972]

Design Rule Theory [Baldwin and Clark 2000]
Can we make theories operable?
DO WE HAVE TO DIG INTO SOURCE CODE?

Empirical Studies
APACHE TOMCAT VS. A PROPRIETARY SOFTWARE [WICSA 2008]
DO WE HAVE TO DIG INTO SOURCE CODE?
Making decisions
 Which language paradigm is better: OO or AO?
 What else is going to change if one part changes?
 How to assign tasks to maximize concurrency in large-scale, globally distributed projects?

Can we predict before coding?
FUNDAMENTAL QUESTIONS

What is a “Module”?

What do we mean by “Dependence”?
 Syntactic dependency?
 Logical dependency?
 Are these dependencies sufficient for prediction?
 What is the basic unit of dependency?
OUR GOAL
A Formal Model and Theory

Description
 Why are some architectures more adaptive than others?
Prediction
 What’s going to happen if the requirement changes?
Prescription
 What’s the best way to accommodate a change? Shall we refactor?
Bridges Architecture with Process, Organizational Structure, and Economic Analysis
RESEARCH IN MY LAB

A formal model: Augmented Constraint Network

Model decisions as first-class members

Model assumption relations as logical constraints

A formal definition of Pair-wise Dependency

The automatic generation of the Design Structure Matrix (DSM)
AUGMENTED CONSTRAINT NETWORK
1. Constraint Network
DesignSpace matrix{
client:{dense, sparse};
ds:{list_ds, array_ds, other_ds};
alg:{array_alg, list_alg, other_alg};
ds = array_ds => client = dense;
ds = list_ds => client = sparse;
alg = array_alg => ds = array_ds;
alg = list_alg => ds = list_ds;
}
2. Dominance Relation
{(ds, client), (alg, client)}
3. Clustering
Environment Cluster: {client}
Design Cluster: {ds, alg}
Precise Definition of Pair-wise Dependence
And DSM Derivation
[Figure: design automaton with six states, S1 to S6, one per solution of the constraint network; transitions are labeled with the change that triggers them, for example client = sparse or client = dense.]
Solutions (client, ds, alg):
 dense, array_ds, array_alg
 dense, array_ds, other_alg
 dense, other_ds, other_alg
 sparse, list_ds, list_alg
 sparse, list_ds, other_alg
 sparse, other_ds, other_alg
Derived DSM (x: the row variable depends on the column variable):
            1  2  3
 1.client   .
 2.ds       x  .  x
 3.alg      x  x  .
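The derivation can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the authors' tool: it enumerates the solutions of the small constraint network, and it assumes one reading of pair-wise dependence (y depends on x if some change to x in some solution forces y to change in a minimal repair, where variables that x cannot influence under the dominance relation stay fixed). The names `consistent`, `solutions`, and `derive_dsm` are hypothetical.

```python
from itertools import product

# Design variables and their domains, copied from the constraint network.
DOMAINS = {
    "client": ["dense", "sparse"],
    "ds": ["list_ds", "array_ds", "other_ds"],
    "alg": ["array_alg", "list_alg", "other_alg"],
}

# Dominance relation: (a, b) is read as "a cannot influence b".
DOMINANCE = {("ds", "client"), ("alg", "client")}


def consistent(s):
    """Check the four implications of the constraint network."""
    return (
        (s["ds"] != "array_ds" or s["client"] == "dense")
        and (s["ds"] != "list_ds" or s["client"] == "sparse")
        and (s["alg"] != "array_alg" or s["ds"] == "array_ds")
        and (s["alg"] != "list_alg" or s["ds"] == "list_ds")
    )


def solutions():
    """Enumerate every consistent assignment by brute force."""
    names = list(DOMAINS)
    candidates = (dict(zip(names, vals)) for vals in product(*DOMAINS.values()))
    return [s for s in candidates if consistent(s)]


def derive_dsm():
    """Return {y: set of x} where y depends on x (assumed definition)."""
    sols = solutions()
    depends_on = {v: set() for v in DOMAINS}
    for s in sols:
        for x, values in DOMAINS.items():
            # Variables that x cannot influence must keep their value.
            frozen = {b for (a, b) in DOMINANCE if a == x}
            for v in values:
                if v == s[x]:
                    continue
                repairs = [
                    t for t in sols
                    if t[x] == v and all(t[f] == s[f] for f in frozen)
                ]
                changed = [
                    frozenset(y for y in DOMAINS if y != x and t[y] != s[y])
                    for t in repairs
                ]
                # Keep only minimal repairs (no other repair changes a strict subset).
                for c in changed:
                    if not any(other < c for other in changed):
                        for y in c:
                            depends_on[y].add(x)
    return depends_on


if __name__ == "__main__":
    print(len(solutions()), "solutions")
    for var, deps in derive_dsm().items():
        print(var, "depends on", sorted(deps))
```

Under these assumptions the sketch finds six solutions; ds and alg depend on each other and on client, and client depends on nothing. Brute-force enumeration grows exponentially with the number of variables, which is the scalability problem the next slide raises.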
CHALLENGES


How to make this formal model scalable?

Divide and Conquer [ASE 2006]

Binary ACN (BACN, Sunny Wong)
How to derive “Decisions”?
Transform UML Class Diagram to ACNs (Sunny Wong)
 Transform UML Component Diagram to ACNs (KP Sethi)

DIVIDE AND CONQUER
TRANSFORMING UML TO ACN
ACN-BASED ASSUMPTION DEPENDENCIES

Many more than purely syntactic dependencies
 Apache Ant
  Lattix dependencies: 829
  ACN dependencies: 2929
 Maze Game
  Lattix dependencies: 34
  ACN dependencies: 71
Far fewer than the transitive closure
Do these extra dependencies produce better predictions?
WHAT WE CAN DO SO FAR

Suggest Task Assignments to Maximize Parallelism
 Design Rule Hierarchy
New Stability and Modularity Metrics
 Decision Volatility Metrics
 Concern Diffusion Metrics
 Independence Level Metrics
Predict Change Impact
DESIGN RULE HIERARCHY:
HOW TO ASSIGN TASKS TO MAXIMIZE CONCURRENCY
DESIGN RULE HIERARCHY
APACHE ANT CASE STUDY

The Architecture
 Version 1.6.5: 1000 variables, 4000 constraints, and 13000 dependences in the DSM
 Derived 500 classes and interfaces (including inner classes)
 640 modules
 11 layers
The Coordination
 Same Layer Different Module
 Same Layer Same Module
 Different Layer Dependent Module
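The layer and module vocabulary above can be illustrated with a simplified layering of a dependency graph. This sketch is not the authors' design rule hierarchy algorithm: it assumes that mutually dependent variables form one module and that a module sits one layer above the highest module it depends on, so that different modules in the same layer do not depend on each other and could be assigned in parallel. The function names and the `ui` variable are hypothetical.

```python
def reachable(depends_on, start):
    """All variables reachable from start by following dependencies."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in depends_on[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def layer_modules(depends_on):
    """Group mutually dependent variables into modules and assign layers.

    depends_on maps every variable to the set of variables it depends on;
    every variable must appear as a key.
    Returns {layer_number: sorted list of modules (sorted lists of names)}.
    """
    reach = {n: reachable(depends_on, n) for n in depends_on}
    # A module is a set of variables that all depend on each other.
    module_of = {
        n: frozenset({n} | {m for m in reach[n] if n in reach[m]})
        for n in depends_on
    }
    layers = {}

    def layer(mod):
        if mod not in layers:
            below = {module_of[d] for n in mod for d in depends_on[n]} - {mod}
            # A module with no dependencies lands in layer 0.
            layers[mod] = 1 + max((layer(b) for b in below), default=-1)
        return layers[mod]

    for mod in set(module_of.values()):
        layer(mod)

    result = {}
    for mod, level in layers.items():
        result.setdefault(level, []).append(sorted(mod))
    return {level: sorted(mods) for level, mods in sorted(result.items())}


if __name__ == "__main__":
    # client, ds, alg come from the small ACN example; ui is made up.
    example = {
        "client": set(),
        "ds": {"client", "alg"},
        "alg": {"client", "ds"},
        "ui": {"client"},
    }
    for level, modules in layer_modules(example).items():
        print("layer", level, modules)
```

In this toy graph, client sits alone in the bottom layer, while the {alg, ds} module and the ui module share the next layer. That corresponds to the "Same Layer Different Module" case, where the two modules could be assigned to different people without direct coordination.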

METRICS: STABILITY AND MODULARITY
MODULARITY AND STABILITY METRICS

Which architecture will generate more options?
 Independence Level Metrics
Which system, or part of a system, is most unstable?
 Decision Volatility Metrics
 Design Volatility Metrics
How are concerns separated?
 Concern Diffusion Metrics
MODULARITY METRIC: INDEPENDENCE LEVEL
CASE STUDY:
8 VERSIONS OF A PRODUCT LINE
DOES ADDING ONE COMPONENT MEAN ADDING ONE MODULE?
CONCLUSION



Our conclusions are highly consistent with those of source-code analysis.
Several source-code-level analysis results are less accurate.
It is possible to assess stability and modularity at the architecture level.
PREDICTING CHANGE IMPACT
STATE OF THE ART

Prediction from History

What if:

The project is relatively new

The version history does not exist

The system is refactored
ACN-BASED PREDICTION

Pure ACN Prediction
 The more subsystems an element is involved in, the more likely it is to be affected.
 The higher its level in the hierarchy, the less likely it is to be affected.
 The distance from the changed element also matters.
Hybrid Prediction
 Combining ACN prediction and version history
A CASE STUDY – HADOOP
DESIGN ISSUES DISCOVERED

Is it ok if design rules are constantly violated?

We found: “Modification task #51, in version 0.1.0, describes changing the DistributedFileSystem class but not only is its parent class FileSystem impacted, another child of the FileSystem (LocalFileSystem) is also impacted. The FileSystem class is changed 47% of the time DistributedFileSystem is changed and the LocalFileSystem class is changed 37% of the time DistributedFileSystem is changed, yet there are no syntactic dependencies between DistributedFileSystem and LocalFileSystem.”

DistributedFileSystem was deprecated in v0.19.0.
DESIGN ISSUES DISCOVERED
“Modification task #1127 in version 12 is titled “Speculative execution and output of Reduce tasks” and it describes a change to the ReduceTask class (and only this class). When we examine the solution for this modification task, it also includes changes to the Task class, which is the parent class of the ReduceTask class. In fact, the Task class is one of the classes most often changed with the ReduceTask class; by release 0.14.0, the Task class is changed in the same transaction as the ReduceTask class nearly 40% of the time.”

The Task class was also refactored in v0.19.0.
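Co-change percentages like those quoted above are conditional frequencies over version-history transactions: of all transactions that change class A, the fraction that also change class B. A minimal sketch of that computation follows. The function name `co_change_confidence` is hypothetical and the transactions are made-up toy data, not Hadoop's actual history.

```python
from collections import Counter
from itertools import permutations


def co_change_confidence(transactions):
    """Return {(a, b): fraction of transactions changing a that also change b}."""
    changed = Counter()   # how many transactions touch each class
    together = Counter()  # how many transactions touch both a and b
    for classes in transactions:
        classes = set(classes)
        changed.update(classes)
        together.update(permutations(sorted(classes), 2))
    return {(a, b): n / changed[a] for (a, b), n in together.items()}


if __name__ == "__main__":
    # Toy history: each set is the classes changed in one transaction.
    history = [
        {"ReduceTask", "Task"},
        {"ReduceTask"},
        {"ReduceTask", "Task", "MapTask"},
        {"Task"},
        {"ReduceTask"},
    ]
    confidence = co_change_confidence(history)
    print(confidence[("ReduceTask", "Task")])
    print(confidence[("Task", "ReduceTask")])
```

The measure is asymmetric: in the toy history, Task accompanies half of the ReduceTask changes, while ReduceTask accompanies two thirds of the Task changes. A history-based predictor of this kind has nothing to work with when no version history exists, which is the gap the ACN-based prediction targets.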
CONCLUSION



The ACN/hybrid approach works better in early versions.
The ACN approach helps identify refactoring candidates.
The hybrid approach generates reliable predictions.
FUTURE WORK




Application to ongoing projects
Linking decisions with decision-makers to predict coordination needs
Extending change impact analysis to coordination change impact analysis
Linking the formal model with economic analysis