Tradespace Exploration in the Cloud: Incorporating Cloud Technologies into
IVTea Suite
by
Aaron L. Prindle
S.B., Massachusetts Institute of Technology (2013)
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© 2015 Massachusetts Institute of Technology. All rights reserved.
Author: Signature redacted
Department of Electrical Engineering and Computer Science
May 22, 2015

Certified by: Signature redacted
Adam M. Ross
Research Scientist, Engineering Systems
Lead Research Scientist, Systems Engineering Advancement Research Initiative
Thesis Supervisor

Certified by: Signature redacted
Donna H. Rhodes
Principal Research Scientist and Senior Lecturer, Engineering Systems
Director, Systems Engineering Advancement Research Initiative
Thesis Co-Advisor

Accepted by: Signature redacted
Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
Tradespace Exploration in the Cloud: Incorporating Cloud Technologies into
IVTea Suite
by
Aaron L. Prindle
S.B., Massachusetts Institute of Technology (2013)
Submitted to the Department of Electrical Engineering and Computer Science
on May 22, 2015 in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
IVTea Suite is a tradespace exploration and analysis tool designed to help users gain insights into potential designs for large scale systems, and to enable the analysis of tradeoffs, both static and dynamic, inherent in selecting particular designs from among many possibilities. IVTea Suite's current architecture limits its ability to operate on large datasets and prevents it from calculating the computationally complex lifecycle metrics needed to select value-sustaining designs. This thesis analyzes the current state of cloud technologies and proposes how IVTea Suite can overcome its current architectural limitations. As a demonstration of potential new capabilities, the multi-era affordability with change paths problem, previously not solvable, is addressed using Markov decision processes and cloud technology. Additionally, this work describes a cloud framework that can be used in the future to solve the multi-arc change paths problem for datasets previously too large to evaluate.
Thesis Supervisor: Adam M. Ross
Title: Research Scientist, Engineering Systems, Systems Engineering Advancement Research
Initiative
Acknowledgements
This work would not have been possible without the help of many people. I would like to thank the
leadership of the Systems Engineering Advancement Research Initiative (SEAri) for giving me the
incredible opportunity to do this research project. Thank you so much Dr. Adam Ross and Dr. Donna
Rhodes, your guidance and feedback was instrumental in shaping the ideas in this thesis. A huge
thanks to Michael Curry for aiding in my early design decisions as well as the numerous meaningful
contributions during brainstorming sessions. Thank you to Matt Fitzgerald, who consistently provided insight into tradespace exploration and tolerated my numerous questions. Additional thanks to all SEAri students, past, present, and future. Only through learning from and building off of the work of
these students was I able to accomplish this thesis. Finally, I thank my parents, Diane Prindle and Don
Prindle, and brother, Tim Prindle. I am eternally grateful for all of the love and support you have given
me.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction and Background
    1.1 The Challenge
        1.1.1 Current Problems with Designing Large Scale Systems
        1.1.2 Considering Time, Change, and Ilities in Large Scale Systems
    1.2 Explanation of Tradespace Exploration
        1.2.1 Tradespace Exploration Parameterization of System
        1.2.2 How TSE Reasons about Dynamic Context/Ilities
    1.3 Explanation of Epoch-Era Analysis
    1.4 Questions-based Tradespace Approach
    1.5 Development of VisLab and IVTea Suite
        1.5.1 Goals of IVTea
        1.5.2 Brief overview of IVTea Suite
        1.5.3 Current Limitations of IVTea Suite
        1.5.4 Potential Solutions to Make Up For What IVTea Lacks
        1.5.5 Current Tradespace Exploration Tools and the Need for IVTea Suite
    1.6 Research Community and Industry Shows Need for Better TSE Tools
Chapter 2: IVTea Suite Overview
    2.1 Capabilities
    2.2 UI Overview
        2.2.1 Design-Centric Analysis
        2.2.2 Epoch-Centric Analysis
        2.2.3 Era-Centric Analysis
        2.2.4 Management
    2.3 IVTea Suite Walkthrough (Use Case 1: Whidbey-Island Class Ship)
    2.4 IVTea Suite Walkthrough (Use Case 2: SpaceTug)
    2.5 Summary of Current IVTea Suite
Chapter 3: Build up To Cloud IVTea
    3.1 Database Comparison: Which is Best for IVTea?
        3.1.1 SQL
        3.1.2 NoSQL
    3.2 Database Conclusion
    3.3 Distributed Computing Infrastructure: Which is Best For IVTea?
        3.3.1 Self hosting
        3.3.2 AWS
    3.4 Distributed Computing Infrastructure Conclusion
    3.5 Distributed Computing Frameworks: Which are the Best for IVTea?
        3.5.1 Hadoop/Hadoop MapReduce
        3.5.2 Spark
    3.6 Distributed Computing Framework Conclusion
    3.7 IVTea Suite Recommended Architecture Summary
Chapter 4: Detailed Overview of Cloud IVTea
    4.1 Architectural Diagram
        4.1.1 IVTea Suite
        4.1.2 Cloud IVTea
    4.2 Multi Era Affordability with Change Paths
        4.2.1 The Big Problem Approach
    4.3 Demonstration: Parallelizing Multi-Arc Cost Paths with MapReduce
        4.3.1 Introduction to Changeability
        4.3.2 Distributed Computing for Changeability Calculations
        4.3.3 Multi-Arc Cost Paths with MapReduce Results
Chapter 5: Discussion and Conclusions
    5.1 State of TSE Tools and IVTea Suite's Role
    5.2 Results and Recommendations
    5.3 Future Work
Bibliography
Appendix A: Markov Decision Process Example Code
Appendix B: Multi-Arc Change Path MapReduce Code
List of Figures
Figure 1. Frequency of ilities mentioned in journal articles and Google hits on the internet (de Weck, et al. 2012)
Figure 2. Tradespace Exploration Parameterization (Ross, 2009)
Figure 3. Relationship between attributes, design vectors, and tradespace data (Ross, et al. 2010)
Figure 4. Epoch-Era Analysis (Ross, et al. 2008)
Figure 5. Example Era: satellite system with serviceability (Ross, et al. 2008)
Figure 6. MySQL file size limit for respective operating systems ("Limits on Table Size", 2015)
Figure 7. VisualDOC's Main User Interface
Figure 8. ATSV's Primary Menu Bar Interface
Figure 9. ATSV's Glyph Plot
Figure 10. Rave's Main User Interface Window
Figure 11. The same tradespace plotted with different value functions, demonstrating how much the chosen value function can affect the Pareto front of tradespaces (Ross, et al. 2015a)
Figure 12. BATNA Space Viewer (Fitzgerald, et al. 2014)
Figure 13. IVTea Suite Model Flow (Ross, et al. 2015a)
Figure 14. IVTea Dashboard: primary window of IVTea Suite
Figure 15. Tradespace Viewer with selected designs
Figure 16. Comparison Tool comparing selected designs
Figure 17. Pareto Widget displaying the Pareto optimal designs for the selected epoch
Figure 18. Tradespace Viewer with Pareto front displayed as red triangles
Figure 19. Tradespace Viewer with MainEngineType as the color-axis
Figure 20. Preference Explorer where each pane corresponds to a preference of the military officer. The top graph shows the current utility function. The middle graph shows the tradespace evaluated with the selected single attribute utility function. The bottom graph shows the entire tradespace, which might not be valid due to the constraints imposed by the single attribute utility function, represented as bars on the graph.
Figure 21. Preference Explorer demonstrating how lowering the endurance requirement to 8000 km affects the tradespace
Figure 22. Tradespace Viewer showing the addition of lower cost solutions to the tradespace as a result of altered requirements
Figure 23. Carpet Plot showing correlations between design variables and attributes
Figure 24. Era Viewer: Pareto Front For Each Epoch
Figure 25. Pareto Tool calculating joint and compromise Pareto sets for multiple stakeholders
Figure 26. Single Design Tradespace Widget displaying how a selected design performs across all possible epochs
Figure 27. Fuzzy Pareto Number Widget for design 22
Figure 28. Fuzzy Pareto Number Widget for design 222
Figure 29. Era Viewer
Figure 30. Morph Widget showing 3 frames of animation in which the tradespace shifts due to an epoch shift
Figure 31. Filtered Outdegree Widget
Figure 32. Filtered Outdegree Function Widget
Figure 33. Design Variable Streaks Widget
Figure 34. Design Space Viewer
Figure 35. Example of the Row and Column Storage of a SQL Database ("SQL Syntax", 2015)
Figure 36. Graph Database Speed Comparisons (Marzi, 2014)
Figure 37. List of Amazon EC2 On-Demand Prices by Instance Type ("Amazon EC2 Pricing," 2015)
Figure 38. List of Amazon S3 Prices by Storage/Month ("Amazon S3 Pricing," 2015)
Figure 39. Example of Amount Spent on AWS for Research
Figure 40. Overview of MapReduce's Structure (Dean, et al. 2008)
Figure 41. Overview of Apache Spark's Structure (Zaharia, et al. 2012)
Figure 42. Demonstrates triangle counting speedup resulting from using in-memory architecture vs. disk-based architecture (Suri, et al. 2011)
Figure 43. IVTea Suite current architecture
Figure 44. Proposed Cloud IVTea architecture
Figure 45. Visualization showing a tradespace moving through epochs, forming a single era (Ross, et al. 2006)
Figure 46. Modified Bellman equation code, calculating optimal policy and expected cost
Figure 47. Output V matrix showing the expected aggregate cost for different epoch lengths
Figure 48. Output π matrix showing the best policy for different epoch lengths
Figure 49. Output Vn matrix showing the best policy for different epoch lengths
Figure 50. Output Vt matrix showing the best policy for different epoch lengths
Figure 51. Tradespace modeled as a tradespace network graph through the use of transition rules
Figure 52. Example tradespace network graph for SpaceTug tradespace
Figure 53. Example set of adjacency matrices for each rule
Figure 54. Diagram showing the user flow of the parallelized multi-arc change path calculation
Figure 55. Diagram showing information piped throughout the architecture used in the multi-arc calculation
Figure 56. Examples of the random graphs generated for testing the multi-arc change path calculation
Figure 57. Graph results showing the runtime of the launched MapReduce tasks which solve the multi-arc change path problem
List of Tables
Table 1. List of visualizations for IVTea Suite v1.0
Table 2. IVTea Suite: Designs rendered vs. time per frame for a tradespace plot
Table 3. Capabilities and features of design support tools
Table 4. Rave Visualizations (Daskilewicz, et al. 2012)
Table 5. Whidbey-Island Dataset Overview
Table 6. SpaceTug Dataset Overview
Table 7. SQL Example Table Schema
Table 8. Example of Data Stored in Key-Value Database
Table 9. Example of Commands Used in Key-Value Database
Table 10. Example of JSON Data Stored in MongoDB
Table 11. Example of Queries Done in MongoDB
Table 12. Shows Each AWS Region and the latency associated with pinging them
Table 13. Example set of transition rules for SpaceTug tradespace
Table 14. Multi-arc change paths for design 1
Table 15. 2D matrix format used to store the tradespace network graph
Table 16. Command line flags used for running the multi-arc change path calculation
Table 17. Input file into the MapReduce Mapper function where a single line is input to each mapper at a time
Table 18. Summary of advances made by Cloud IVTea Suite
Chapter 1: Introduction and Background
1.1 The Challenge
Designing large scale systems today is difficult and often leads to suboptimal designs being selected.
One reason for this is that the current approaches for creating large scale systems (such as military
systems) do not evaluate the full range of possible designs and their associated costs and utilities
throughout the life cycle of the design (Diller, 2002). These approaches often lead to long design times
and designs that are optimized in a local, static context but may not be optimized globally for the
lifetime of the design. Confounding this, time and budget pressures can result in corner cutting and
careless accounting (Ross, et al. 2004). These are issues which could be addressed from the initial
design phase and alleviated by adequately exploring design choices.
1.1.1 Current Problems with Designing Large Scale Systems
In the early phase design of complex systems, it is important to explore the design space before
settling on a solution (Diller, 2002). This is a consequence of the ambiguity and uncertainty present
during early stages of a system's lifecycle, and is especially true for complex systems where the
prediction of value delivered by such systems may not be readily achieved. The process for selecting a
design for large scale, complex systems should involve rigorously assessing a large group of designs,
determining the feasibility and utility of each of those designs, and ultimately selecting the best
design (Shishko, et al. 1995). However, this is often not done in practice due to limited time and
budget, as well as expertise, for conducting broad exploration of possible system designs. As Ross and
Hastings argued, considering such a large number of options requires a significant investment of
time and money which are often not available (Ross, et al. 2005). Instead, ad-hoc solutions are used
that are biased and do not yield the best possible design. These ad hoc approaches may consist of engineers setting a design baseline from previously developed concepts or favorite designs; from there, an Analysis of Alternatives is performed around this baseline, in which small changes to the design create a small set of designs near the baseline. Larger-scale concept trades are sometimes performed as well, but they fall short because they are done at much lower fidelity than the baseline set (Ross, et al. 2005). This leads to potentially good sets of designs being ruled out due to simple approximations. These larger-scale concept trades are also typically done in much smaller numbers than the designs generated from the baseline, leaving many potentially good designs pruned before any consideration (U.S. Government Accountability Office, 2009). Designers resort to these limited trade studies because it is too time intensive and expensive to properly evaluate the entire space of design solutions. Stakeholders for these systems may also restrict the space of designs by creating solution-dependent requirements that constrain the space, even though their self-imposed restrictions might remove design options they would benefit from (Keeney, 1994). This happens because stakeholders often do not adequately explore the design space before writing requirements, meaning the requirements might not align with their actual needs.
1.1.2 Considering Time, Change, and Ilities in Large Scale Systems
Another difficult issue faced when designing large scale, complex systems is how to select the
correct design given that the system will most likely be operating in an uncertain and dynamic
environment (Ricci, et al. 2014). Oftentimes during the design phase, only a static context is
considered and then optimized and iterated upon which leaves design solutions that are potentially
more flexible, robust, etc. prematurely ruled out when in fact they provide the most value
throughout the system's lifetime. When architecting a system, there should be an emphasis put on
enhancing lifecycle value sustainment from the early phases of design but this is often not the case
in practice. These system lifecycle properties that reflect different tradeoffs have been categorized
by research groups as 'ilities' (adaptability, changeability, flexibility, etc.). 'Ilities' are formally
"properties of engineering systems that often manifest and determine value after a system is put
into initial use. Rather than being primary functional requirements, these properties concern wider
impacts with respect to time and stakeholders." An example of some of the most commonly seen
ilities can be seen in Figure 1.

[Figure 1: bar chart comparing, for each ility, mentions in journal articles (thousands) and Google hits (millions) on a logarithmic scale.]
Figure 1. Frequency of ilities mentioned in journal articles and Google hits on the internet (de Weck, et al. 2012)
Change-related ilities in a design are driven by the introduction of 'options'. An 'option' is the ability
to execute or activate a design decision or feature at a later point in the lifecycle in order to respond
to variations in the operational context and/or in stakeholder preferences.
For example, let us consider the design of an imaging satellite. The satellite has a potential design
choice regarding radiation shielding. In a typical optimization of the design, the non-shielded version
will likely be cheaper and equally good at imaging, prevailing from a value perspective in a static
context. When considered in a dynamic context with the potential for solar radiation exposure and
potential solar flares, the shielding option might actually allow for maximum sustained value over
time. Utilizing its shielding 'option' during the period of a solar flare, the shielded design would be
able to continue operating and delivering value. By evaluating the survivability of these two designs
it would be apparent which one is actually better to select. This is a very simple example to
elucidate the benefits of evaluating designs with the inclusion of ilities. Firstly, analysis of these
ilities is not done currently because it is too time intensive and expensive to properly evaluate
each design for the range of contexts that the system might encounter. Secondly, it is not done
because the methods of formally evaluating/calculating ilities are not well known in the design
community, as many of these methods are active areas of research. Finally, due to the old paradigm
of premature optimization and writing solution-dependent requirements, designs that would have
valuable ilities aren't properly considered as they are weeded out before evaluation (Ross, et al.
2005). In fact, recent research suggests that designs which are overly optimized are actually very
fragile in the face of changing contexts (Carlson, et al. 2000).
1.2 Explanation of Tradespace Exploration
The tradespace is the space spanned by the completely enumerated design variables of a system. This
means that given a set of design variables, the tradespace is the space of possible design options
(Ross, et al. 2005). Exploring the tradespace entails looking at the space of designs and seeing how
each design performs across many dimensions of attributes, and how those attributes for each design
can change across various situations that the overall system might encounter. We cannot currently
explore the entire space in an unstructured way as it is too large and complex. This exploration of the
tradespace can be a daunting challenge and as such has led to the development of a specific
framework/paradigm designed to structure the tradespace exploration process. The tradespace
exploration paradigm consists of analyzing the performance of a large set of enumerated designs,
evaluating the performance with respect to the needs of multiple possible stakeholders, and
identifying how the performance and needs change in the context of many possible future scenarios
for the system (Ross, et al. 2010). Instead of identifying the "best" solution which might arise from an
optimization on a set of parameters, the tradespace exploration paradigm seeks to evaluate what
traditionally may have been considered "bad" designs in order to reveal the multi-dimensional
tradeoffs inherent in a complex design problem (Ross, et al. 2005). This is because in complex design
problems, designers often cannot agree upon a single unequivocal objective function, as argued by
Daskilewicz and German, 2012.
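To make the notion of a completely enumerated tradespace concrete, the short sketch below (in Python) enumerates every combination of a few design variables; the variable names and levels are hypothetical placeholders, not drawn from any particular SEAri dataset.

from itertools import product

# Hypothetical design variables and their discrete levels (illustrative only).
design_variables = {
    "propulsion_type": ["chemical", "electric", "nuclear"],
    "payload_mass_kg": [300, 1000, 3000],
    "fuel_mass_kg": [30, 100, 300, 1000],
}

# The tradespace spans the Cartesian product of all design-variable levels:
# every combination is one candidate design (one "design vector").
names = list(design_variables)
tradespace = [dict(zip(names, levels))
              for levels in product(*design_variables.values())]

print(len(tradespace))   # 3 * 3 * 4 = 36 enumerated designs
print(tradespace[0])     # first enumerated design vector

Each enumerated design vector would then be evaluated for cost and attributes as described in the next section.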
1.2.1 Tradespace Exploration Parameterization of System
Ross and Rhodes argued that in order to properly evaluate a tradespace of designs, quantification of
the system is necessary in order to develop a concrete specification of understanding across the
system lifespan (Ross, et al. 2008). Figure 2 depicts the process for parameterizing a system which is
required for tradespace development for the Multi-Attribute Tradespace Exploration (MATE) method
(Ross, 2003). The first step consists of the designer meeting with the key stakeholders to identify
system value and system concepts. Once the system concepts are well defined, they are
parameterized into design variables which are enumerated and then evaluated into cost via a cost
model (Figure 2). The design variables represent the aspects of the system that the designer has control over, for example the type of engine used. These variables are held as a set within the design vector, with a particular combination of design variable levels specifying a specific design. Once the definition of value is established, it is parameterized into a set of system attributes which, as shown in Figure 2, are aggregated into utility (via a value model). These system attributes are metrics derived from the design variables by evaluating performance models developed by subject matter experts and analysts.
[Figure 2: diagram mapping design variables (quantitatively parameterized and aggregated into lifecycle cost) and attributes (aggregated into utility) onto a cost-utility tradespace plot, where each point represents a feasible solution; Tradespace: {Design Variables; Attributes} <--> {Cost; Utility}.]
Figure 2. Tradespace Exploration Parameterization (Ross, 2009)
In tradespace exploration, the flow of this process for a static context is that once design variables and
attributes have been properly parameterized, a model is designed that takes design variables and
outputs attributes. For example, in the case of a space shuttle being designed, the design variables for
the shuttle might be propulsion-type, payload mass, and propulsion mass. Each design variable has
an associated cost and as such can be aggregated into a cost for the system. These design variables
would then be run through a physics model and produce attributes for the shuttle, for example the
shuttle's capability and deltav. These attributes would then be run through a multi-attribute utility
function which models how decision makers and stakeholders value each attribute. This would then
be aggregated into an overall utility ranking for the designs. Figure 3 depicts an alternate view of the
data flow in tradespace exploration where designs are evaluated in terms of attributes, utilities, costs,
and key intermediate variables via model(s). Tradespace data is then stored in a database for further
exploration and analysis.
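A minimal sketch of this static evaluation flow is shown below. The performance model, cost model, single-attribute utility curves, and stakeholder weights are all illustrative assumptions; an actual MATE study would substitute validated physics and cost models and elicited utility functions.

# Hypothetical static evaluation flow: design vector -> attributes -> cost and utility.
# All models, ranges, and weights below are illustrative placeholders.

def performance_model(design):
    """Map a design vector to attributes (stand-in for a physics model)."""
    delta_v = 9.8 * 300 * design["fuel_mass_kg"] / (design["payload_mass_kg"] + 1)
    capability = design["payload_mass_kg"] ** 0.5
    return {"delta_v": delta_v, "capability": capability}

def cost_model(design):
    """Aggregate per-variable costs into a system cost (illustrative)."""
    return 0.05 * design["payload_mass_kg"] + 0.01 * design["fuel_mass_kg"]

def single_attribute_utility(value, worst, best):
    """Linear single-attribute utility on [0, 1]; real studies elicit these curves."""
    return max(0.0, min(1.0, (value - worst) / (best - worst)))

def multi_attribute_utility(attrs):
    """Weighted (linear-additive) aggregation of single-attribute utilities."""
    weights = {"delta_v": 0.6, "capability": 0.4}           # assumed stakeholder weights
    ranges = {"delta_v": (0, 3000), "capability": (0, 60)}  # assumed worst/best levels
    return sum(w * single_attribute_utility(attrs[a], *ranges[a])
               for a, w in weights.items())

design = {"payload_mass_kg": 1000, "fuel_mass_kg": 300}
attributes = performance_model(design)
print(cost_model(design), multi_attribute_utility(attributes))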
[Figure 3: flow diagram in which a mission concept yields attribute definitions (with utility curves) and a design vector (with variables and constants); for many possible designs, the model(s) produce attributes, single-attribute utilities, multi-attribute utility(s), cost(s), and key intermediate variables, which populate a tradespace database to be explored.]
Figure 3. Relationship between attributes, design vectors, and tradespace data (Ross, et al. 2010)
1.2.2 How TSE Reasons about Dynamic Context/Ilities
Tradespace Exploration serves as a framework which provides insights into communicating and
quantifying the impact of changing requirements, uncertainty, and lifecycle system properties, or
'ilities,' such as flexibility and robustness (Ross, et al. 2005). As was demonstrated in Section 1.1.2,
ilities can be incredibly difficult to reason about, because they are properties that can only be evaluated over the lifetime of a design, across the large number of situations the design might encounter in the external system it inhabits. Ilities are nevertheless valuable to reason about: they enable a much better evaluation of how much value a design will deliver over its lifetime, which is often what decision makers actually want to consider when designing large, complex systems.
1.3 Explanation of Epoch-Era Analysis
Within the context of tradespace exploration, a method known as Epoch-Era Analysis (EEA) can be
used for reasoning about the dynamic system value environment. EEA characterizes the system
lifecycle that a design resides in by discretizing it into a set of possible epochs, which can be strung together to form eras.
[Figure 4: diagram of a system trajectory through a sequence of epochs (Epoch 1 through Epoch 5); each epoch is a short-run period of fixed context, and the long-run sequence of epochs forms an era.]
Figure 4. Epoch-Era Analysis (Ross, et al. 2008)
An epoch can be thought of as a specific context which could possibly occur in the external system a
design resides within. Formally, an epoch is a time period that bounds a specific change scenario,
during which utility functions, constraints, design concepts, available technologies, and articulated
attributes are defined. The purpose of the epoch is similar to short run analysis in Economics: to parse
a complex problem into a series of simpler ones (Ross, et al. 2008). The value of a design is evaluated
at each epoch, meaning that when an epoch shifts from one to another, the operational value of the
design can change.
Over time, as technology evolves and the needs and preferences of stakeholders change, the ideal
design will not remain fixed. Within each epoch, the perceived value and the short-run dynamics of a
system are fixed and known. This means that while the system itself can change (for example, due to
degradation, breakdown, servicing, and repairs) the needs and preferences of the designers will not
over an epoch. Changes in technology or stakeholder need do occur across epochs. This framework
simplifies the modelling of the complex and uncertain design lifetime.
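As a rough illustration of how this discretization can be represented in software, the sketch below models an epoch as a fixed context with a utility function and an era as an ordered sequence of epochs with durations; the field names are assumptions made for illustration, not the data model actually used by IVTea Suite.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Epoch:
    """A period of fixed context: constraints, available technology, preferences."""
    name: str
    context: Dict[str, object]                        # e.g. {"solar_flare": True}
    utility_fn: Callable[[Dict[str, float]], float]   # maps attributes -> utility

@dataclass
class Era:
    """An ordered sequence of epochs, each with a duration in years."""
    epochs: List[Epoch]
    durations: List[float]

    def evaluate(self, attributes_by_epoch):
        """Short-run utility of one design in each epoch of the era."""
        return [epoch.utility_fn(attrs)
                for epoch, attrs in zip(self.epochs, attributes_by_epoch)]

Stringing together different epoch sequences then yields different eras against which a design's lifecycle value delivery can be compared.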
An example era for a satellite system with serviceability, used originally in Ross and Rhodes' work, can be seen in Figure 5 (Ross, et al. 2008). The era begins with Epoch 1, the beginning of life for the system. As the system operates within the static context of Epoch 1, value degradation occurs, which lowers the value of the system. A major failure disturbance then occurs, which dramatically lowers the value. As the satellite is serviceable, servicing is done in order to restore the satellite to its prior level of value delivery for continued operation. Then a new context, Epoch 2, begins: due to the introduction and expectation of new satellite technology, the exact same system is perceived by the stakeholders as now having significantly decreased value. After a service to upgrade the satellite system with these new technologies, including a servicing-time outage during which the system is not operational and has no value, the system again reaches a high level of perceived value. The system continues its operation in Epoch 2 until context changes result in another epoch. This continues until the end of life for the system.
[Figure 5: timeline from system BOL to EOL across Epoch 1, Epoch 2, ... Epoch n, annotated with value degradation, major failures, "service to restore" and "service to upgrade" events, a value outage during servicing time, and a new-context value function under which the same system is perceived as having decreased value.]
Figure 5. Example Era: satellite system with serviceability (Ross, et al. 2008)
Through leveraging the concept of EEA, a system's entire lifecycle value is now capturable. Knowledge of which epochs and eras to evaluate for a system can be captured through a parameterization process similar to that outlined in Section 1.2.1. By leveraging the concepts of epochs and eras, tradespace exploration is able to analyze the 'ilities' that decision makers' choices depend on, analyses which were once infeasible. For example, in order to understand the changeability of a certain set of design options, there must be a framing of the larger system of systems that controls the situations a design might come into contact with. These types of calculations, which require an understanding of the system of systems a design resides in, can only be done on top of the framework provided by EEA.
1.4 Questions-based Tradespace Approach
A series of practical questions from decision makers and analysts can be used to drive the process of
tradespace exploration. These questions guide the design process, to ensure that the complexities of
ilities and dynamic value are addressed by the final design. The answers to the questions underlying
tradespace exploration come from evaluating stakeholder preferences and analyst data to capture
knowledge about their values and preferences, and it is these answers that ultimately result in the
selection of a design. Emerging through the course of ten years of tradespace exploration studies, this
question-driven approach has been shown to be useful in structuring the exploration process (Ross, et al.
2010). When considering high-level decision makers who will make critical decisions concerning large,
complex systems, the following questions provide a starting point for organizing the tradespace
exploration effort (the particular ordering of the questions is recommended, but not required):
1. Can we find good value designs?
2. What are the strengths and weaknesses of selected designs?
3. To which parameters is value most sensitive?
4. Are lower cost designs feasible? What compromises are needed to lower costs?
5. Is the problem dominated by underlying physical, technological, or budgetary effects or limits?
6. What about time and change?
7. What about uncertainty? What if my needs change?
8. How can detailed design development be initiated in a way that maximizes the chance of
program success?
9. What if I also need to satisfy another decision maker?
10. Do close looks at the design and more advanced visualizations support our conclusions?
11. How can we find more (good) designs to include in the tradespace?
1.5 Development of VisLab and IVTea Suite
Development of the Interactive Value-driven Tradespace Exploration and Analysis (IVTea) Suite began
in 2009 under the name VisLab (Visualization Laboratory). The original vision for VisLab was to create
a platform leveraging the research library of SEAri and allow for the effective reuse of data and
advanced tradespace visualizations without the need to 'reinvent the wheel' for every project (Ross,
2015b). Incorporating key dimensions that modern TSE methods seek such as multiple decision maker
perspectives, temporal representation for all data, and varying degrees of fidelity were also high
priorities for the tool (Ross, 2009). By providing real-time feedback, the interactive software tool
would be able to reduce the delay between imagining questions and finding answers. This would
ultimately allow users to accelerate their development of insight into systems of interest.
Additionally, the promise of a highly modular code base could enable graduate students to contribute
individual 'widgets', thus rapidly and easily expanding the software's capabilities over time as new
techniques were created at MIT SEAri.
The earliest goals of VisLab 1.0 were to rapidly facilitate the state of practice for TSE, allow for data
consistency across multiple user sessions through a database backend, and allow for a linked
representation of data that is consistent across all of the tool's views (Ross, 2009). During the
development of VisLab 1.0, the key vision captured by the software was one of supporting epoch-centric analysis: the visualization and analysis of the different tradespaces created by varying the context and preferences under which the system operates (Ross, 2015b). As SEAri research began to expand more heavily into multi-epoch and era analysis (across all uncertainty and across time-dependent sequences of uncertainty, respectively), it became apparent that VisLab would require
considerable architecture upgrades in order to handle these advanced analysis types. VisLab 2.0 and
subsequently IVTea 1.0 have gradually improved the architecture and user experience of the software,
now supporting all of these analyses and providing a comprehensive set of perspectives from which to
view the design problem.
1.5.1 Goals of IVTea
With recent advancements in computing, designers can now analyze millions of design alternatives
more cheaply and quickly than ever before using model simulation and visualization. These
advancements allow for the opportunity to revolutionize the design process for complex systems
(e.g., automobiles, aircraft, satellites, etc.) that consist of multiple interacting subsystems and
components designs by engineers of various disciples (Stump, et al. 2009). Software tools leveraging
these computational advancements allow for design tradeoffs in large sets of designs to be analyzed
cheaply and quickly, overcoming the previous barriers to entry. For these reasons we have built
IVTea Suite (Interactive Value-Driven Tradespace Exploration and Analysis Suite). IVTea Suite is a
software package intended to help engineering analysts, stakeholders, and decision makers uncover
insights about their systems and support value robust decision making in the face of large, uncertain
problems. The end goal of IVTea Suite is to allow users to vie'w, and real-time interact with, vast
amounts of design data in order to reason about their complex system, compare designs across
important lifecycle properties that are currently overlooked (changeability, survivability, etc.) and
iteratively evaluate their design throughout the entire design process incorporating newly obtained
information.
1.5.2 Brief overview of IVTea Suite
In its current state, IVTea Suite provides visualization and analysis capabilities for the interrogation of
performance and value models throughout a system's lifecycle. IVTea Suite's features include a host
of visualizations, including 2D/3D scatterplots, carpet plots, table comparisons, etc. (for a full list of
visualizations see Table 1).
Table 1. List of visualizations for IVTea Suite v1.0

Discrete Visualization: Line Plot, Scatter Plot, 3D Scatter Plot, Histogram, Scatter Plot Matrix
Continuous Visualization: Line Plot, Carpet Plot
Tables/Text: Data Table, Data Point Details, Comparison Table, Static Text
Other: Animated Scatter Plot, Notes/Annotations
Also, through the use of multi-dimensional graphs, multiple plots viewed on a single graph, color coding, and transparency, IVTea Suite allows users to view and compare multiple dimensions of information at once. IVTea Suite also fully supports the real-time creation and modification of
value models applied to the system. Multi-attribute Effectiveness models are all present in the IVTea
architecture and can be compared side-by-side, modified, or swapped on the fly in order to
understand the impact that different value models have on the tradespace. IVTea Suite also supports
analysis capabilities for viewing trade spaces in multiple contexts which allows for lifecycle and
scenario planning. By taking this dynamic context into consideration, IVTea is able to give users insight
into system lifecycle properties. IVTea currently includes some changeability-based analysis tools as an example of this, and more ility-based tools are being developed. These features allow users to be
able to interactively interrogate their complex system, gain insights into the tradeoffs between
designs, and consider important aspects of their designs that would have otherwise been overlooked.
A more detailed analysis and walkthrough of IVTea Suite's features can be seen in Section 2.
1.5.3 Current Limitations of IVTea Suite
While IVTea Suite has a lot of useful functionality, as it is currently architected there are some
desired features that cannot be implemented and some features which do not scale to large data due
to insufficient implementation. These include the ability to construct design attributes from design
variables on the fly via modeling/simulation code from within IVTea Suite, to query very large (~100
TB) data sets for tradespace exploration, and to compute computationally expensive system lifecycle
value metrics such as changeability on the fly from within IVTea Suite. One such architectural
limitation is that IVTea Suite is currently designed to display precomputed attributes and system
lifecycle properties. Due to this, while some important quantities such as utility can be updated in
real time, things such as changes to the performance model or changes in the cost of a changeability
pathway cannot be done in a real time exploratory way. The process for changing these currently is:
compute them for a value, explore the space, gain insight into what other values the analysts would
want to see, alter the values accordingly, then recompute; this is not as fast or intuitive a process as it should be.
In addition, IVTea Suite does not provide a way of efficiently querying its precomputed data for
specific sets or ranges of designs based on performance metrics. Such querying is necessary to meet the goals outlined above, because designs must be evaluated across different utility functions and different epoch-era spaces as required. For example, when calculating the changeability of designs, IVTea Suite currently retrieves the entire database's contents and processes them client side, instead of issuing a query that would immediately yield the desired values. Also, for the majority of the important ilities to be evaluated, an extensive graph search through the epoch-era space is required; this search would likely require more disk and memory than any single computer could provide, and would take an unreasonably long time to run serially.
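To make the querying distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical and do not reflect IVTea Suite's actual schema. The first pattern pulls every row to the client and filters there (the current behavior described above), while the second pushes the filter into the database so that only the matching rows are returned.

import sqlite3

conn = sqlite3.connect("tradespace.db")  # hypothetical tradespace database

# Pattern 1 (current behavior): fetch everything, then filter client side.
all_rows = conn.execute("SELECT design_id, cost, utility FROM designs").fetchall()
affordable = [row for row in all_rows if row[1] < 50.0 and row[2] > 0.8]

# Pattern 2 (preferred): let the database filter, returning only the needed rows.
affordable = conn.execute(
    "SELECT design_id, cost, utility FROM designs WHERE cost < ? AND utility > ?",
    (50.0, 0.8),
).fetchall()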
Operating System                      File-size Limit
Win32 w/ FAT/FAT32                    2GB/4GB
Win32 w/ NTFS                         2TB (possibly larger)
Linux 2.2-Intel 32-bit                2GB (LFS: 4GB)
Linux 2.4+ (using ext3 file system)   4TB
Solaris 9/10                          16TB
MacOS X w/ HFS+                       2TB
NetWare w/ NSS file system            8TB
Figure 6. MySQL file size limit for respective operating systems ("Limits on Table Size", 2015)
IVTea Suite is also limited by its data storage implementation: it currently reads data from a single
server MySQL database or a local SQLite database. As such, there is a practical limit of 2GB-16TB
(operating system dependent, see Figure 6) to the amount of data that can be stored or analyzed,
even with top of the line hardware. This is not enough storage for many problems, given the sheer
complexity that arises from analyzing design variables and attributes through all possible lifecycle
scenarios. In user interviews with Navy designers, we learned that they have 40TB of unstructured
data they would like to leverage for tradespace exploration; the current version of IVTea Suite would
be able to handle only a small fraction of that data.
Even if IVTea Suite could store enough data, reading or writing that much data from a single machine's disk would be exceptionally slow, since there is no easy way to perform reads or writes in parallel when all of the information is housed on one computer. Sharding the data across machines and parallelizing requests would dramatically increase overall application speed.
IVTea also has an inadequate implementation for processing and visualizing large sets of data; it can
take several seconds to render large datasets on a desktop computer. The basic advice regarding
response times has been about the same for many years (Miller, 1968; Card, et al. 1991):
* 0.1 seconds is about the time limit for having the user feel that the system is reacting instantaneously
* 1.0 second is about the time limit for the user's flow of thought to stay uninterrupted
* 10 seconds is about the time limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting on the computer to finish.
Also, the US Department of Defense Design Criteria Standard requires that a "Key Print" have a response time of 0.2 seconds from "key depression until appearance of character." In this case we assume a mouse movement done to alter the view of data to be a "key depression" (AMSC, 1999).
To illustrate this limitation, we created randomized sets of data and timed the process of interrogating
them through IVTea Suite's tradespace visualizer (i.e. scatterplot tool) on a typical desktop computer.
Results are presented in Table 2. IVTea Suite currently allows for responsive data interaction for up to 10^5 designs, after which the latency involved detracts heavily from the desired user workflow. These
results are, of course, hardware dependent; we could potentially increase this limitation by a few
orders of magnitude using hardware improvements alone. In order to interact with very large
datasets, however, we would need to overcome architectural limitations in IVTea Suite, to allow us to
use multiple machines for rendering, or to reduce the computational burden of data interaction.
Current research is investigating how to address this rendering limitation in IVTea Suite (e.g., through binned aggregation); it will not be discussed further, as this thesis focuses on the data storage and processing challenges for IVTea (Curry, et al. 2015).
Table 2. IVTea Suite: Designs rendered vs. time per frame for a tradespace plot

Number of Designs Rendered    Average Time per Frame
10^5                          0.084 s (reasonable)
10^6                          0.793 s (impractical)
10^7                          4.156 s (unusable)
1.5.4 Potential Solutions to Make Up For What IVTea Lacks
One way to overcome these limitations is to use cloud services. Cloud services are software systems
which leverage the power of distributed computing in order to allow data scalability, computational
parallelism, and numerous advantageous hardware abstractions. For example, leveraging Amazon's
SimpleDB will allow us to scale up to larger datasets by allowing the data stored to be distributed
across a number of different remote servers. Also SimpleDB effectively shards data across servers,
meaning that when a user wants to query a vast amount of information, that query will not be
served by a single server containing all of the tables but instead a variety of servers doing query
operations and disk reads/writes in parallel. As an added benefit, this data is fault tolerant due to
replication of the data across servers. Using AWS (Amazon Web Services) EC2 (Elastic Compute
Cloud) will allow us to quickly and cheaply scale the computational resources available for analysis
as necessary. Utilizing cloud computing paradigms such as MapReduce allows us to distribute our workload simply and effectively, by providing a programming abstraction that makes it easy to run operations in parallel. By integrating IVTea Suite with cloud services like Amazon SimpleDB and Amazon EC2, and with a cloud programming model such as MapReduce, we can overcome these limitations and deliver the promised utility of a real-time, scalable, interactive tradespace exploration tool.
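As a rough illustration of why the MapReduce abstraction helps, the sketch below expresses a per-design computation (here, counting a design's outgoing change arcs, a simple stand-in for a changeability metric) as a map step followed by a reduce step. In an actual deployment the same two functions would be handed to a framework such as Hadoop MapReduce running on EC2; the tradespace network and the local process pool used here are stand-ins for illustration only.

from multiprocessing import Pool

# Hypothetical tradespace network: design id -> designs reachable via change mechanisms.
TRANSITION_ARCS = {
    1: [2, 3, 5],
    2: [1],
    3: [2, 4],
    4: [],
    5: [1, 2, 3, 4],
}

def map_outdegree(design_id):
    """Map step: emit (design_id, number of outgoing change arcs)."""
    return design_id, len(TRANSITION_ARCS[design_id])

def reduce_max(pairs):
    """Reduce step: keep the design with the most outgoing change arcs."""
    return max(pairs, key=lambda pair: pair[1])

if __name__ == "__main__":
    with Pool() as pool:                    # local stand-in for a cluster of workers
        pairs = pool.map(map_outdegree, list(TRANSITION_ARCS))
    print(reduce_max(pairs))                # -> (5, 4)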
1.5.5 Current Tradespace Exploration Tools and the Need for IVTea Suite
The difficulties described in Section 1.1 have prompted the development of several software tools in
domains closely related to and within tradespace exploration. One set of tools, known as analysis
integration frameworks, focus primarily on taking designs and running/piping them through various
23
complex simulations in order to be able to generate parameters of how a design performs in various
models. These frameworks also typically include optimization functions so that running designs
through the simulation and optimizers leads to an output which are the theoretically best possible
designs for the given optimization parameters. In addition, there are decision support tools that do
not provide the ability to link and automate design, but instead accept as input data sets obtained
from separate design analysis and provide many visualization-enabled design techniques to analyze
the input data. There is also a growing class of hybrid tools that attempt to incorporate some set of
analysis/optimization and visualization-enabled design techniques into a single package. While there
are a number of tools in the current landscape that aid in tradespace exploration, we will focus on one tool from each category to allow for an in-depth analysis of the structure of programs in that category. A brief overview of several of these types of tools is now described; for a
comparison of IVTea's features to the discussed tools see Table 3.
Table 3. Capabilities and features of design support tools
Legend: ● = primary functionality, o = secondary functionality

[Table 3: matrix marking primary/secondary functionality of ATSV, VisualDOC, Rave, and IVTea Suite for: performance model linking, discrete visualization, continuous visualization, linked visualizations, interactive visualization, persistent arrangement of graphs, single-objective optimization, multi-objective optimization, constraints/feasibility assessment, MCDM/interactive preferences, design of experiments, scenario planning, lifecycle planning (ilities), multi-stakeholder, multi-platform, user/customer support services, user-extensible capabilities.]
VisualDOC
An example of an analysis integration framework is VisualDOC, developed by Vanderplaats Research
and Development. VisualDOC was originally designed to allow a person without an optimization
background to start applying optimization to their particular problem within a couple of hours after
using the software (Balabanov, et al. 2002). VisualDOC can be classified as a decision-support tool
designed primarily for design optimization, simulation and analysis function linking.
VisualDOC's primary user interface is a large flowchart that users can add processes to in order to
create paths of analysis steps which can depend on certain values (See Figure 7).
Figure 7. VisualDOC's Main User Interface
Users create a simulation and optimization flow through the various flowchart components, and the outputs of those simulations and optimizations can then be visualized within VisualDOC. VisualDOC is useful in situations where complex, multi-stage simulations are required in order to analyze performance attributes of designs. VisualDOC can also be useful in the context of optimizing small parts of designs (for example, the shape of a wing) in which the attributes to optimize on are well known and the models are simple or very well proven.
While VisualDOC does have a considerable number of optimization methods and software applications it can integrate easily with (e.g., Excel, MATLAB), it does not solve many of the leading issues that motivate tradespace exploration. Design optimization in the context of large, complex systems is oftentimes counterproductive, as it prematurely prunes design options if the parameters being optimized do not properly encode the decision maker's needs. What decision makers should do is use tools that allow for exploration of designs, allowing them to visually steer their own optimization and gain insight into the complex system.
ATSV
An example of a decision-support tool that interacts with data imported from separate analysis is
the ATSV (ARL Trade Space Visualizer), developed at Pennsylvania State University. The ATSV was
designed to be a decision-support tool with advanced visualization capabilities that allows for
visualizing high dimensional data, gaining insight into complex systems and visual steering for
supervised optimization (Stump, et al. 2004). The ATSV's primary user interface consists of a primary
menu bar which can launch various types of visualizations (See Figure 8).
Figure 8. ATSV's Primary Menu Bar Interface
For visualizing multidimensional data, ATSV provides data analysis tools such as brushing, linked
views, Pareto front displays, preference shading, and data reduction and zooming methods. ATSV's
primary visualization tool, the glyph plot, can be seen in Figure 9.
Figure 9. ATSV's Glyph Plot
ATSV is also capable of dynamically calculating Pareto fronts based on input preference functions, allowing users to explore the space of designs and alter their preferences based on what exists. ATSV also has the ability to link into and query simulation models, allowing it to point-sample simulations and generate points of interest in real time. ATSV is geared towards use in visual steering tasks in which a decision maker "shops" around the design space of a complex system, generating sample points in areas of interest, identifying relationships between different design variables, and dynamically applying constraints and preferences to the tradespace in real time (Stump, et al. 2009).
While ATSV has support for tradespace visualization and gaining insight into complex systems, it
focuses on looking at designs in a static context, which is not desirable for large scale systems that are expected to deliver sustainable value over their lifetime. ATSV does not have native support for viewing the uncertainty of information or any type of scenario planning interface, instead focusing on user-steered design optimization. ATSV also has issues with rendering datasets that are on the order of 10^4 designs. This is problematic when trying to explore tradespaces with a large number of designs, as ATSV becomes very slow and lags considerably in response to user inputs.
Rave
An example of a hybrid software solution, incorporating both analysis and visualization is Rave,
developed at the Georgia Institute of Technology. Rave is implemented in MATLAB and is open
source under the General Public License. The driving principles behind the creation of Rave were to
create a tool focused on the design decision support techniques of visualization, optimization, and surrogate modeling. Rave is built as a library and supports a plug-in style API to allow researchers to modify Rave's existing source code.
Figure 10. Rave's Main User Interface Window
Rave's interface with various visualizations can be seen in Figure 10 and is composed primarily of a sidebar of tabbed controls, a window navigator, and a workspace consisting of all the user's visualization and analysis tools and a data table view. Users can load data into Rave from flat text files, as well as generate data from analysis functions the user supplies. Rave
supports a variety of visualization options which can be seen in Table 4.
Table 4. Rave Visualizations (Daskilewicz, et al. 2012)
Rave's visualization types span four groups: discrete visualization, continuous visualization, tables/text, and other. The individual visualizations include: line plot, scatter plot, 3D scatter plot, histogram, bivariate (3D) histogram, cumulative distribution, stacked bar graph, stacked area graph, parallel coordinates plot, data overlayed on image, data density scatter plot, line plot (function based), prediction profiler (matrix of line plots), contour plot, contour plot matrix, carpet plot, surface plot, response profiler, data shift, derivative field, point-to-point, data table, data point details, ranked list, calculated value, static text, bitmap image, postscript image, geometric shapes/annotations, and house of plots.
When an appropriate objective function is available, Rave supports running optimizations which may
be used to seek the best design for a given parameterization of the problem. Rave also allows
generation of surrogate models through user supplied analysis functions (Daskilewicz, et al. 2012).
With this functionality, Rave is suited primarily as a toolbox for expert tradespace analysts. With the
Rave tool providing a plethora of advanced analysis and visualization tools as shown in Figure 10 and
Table 4, users are able to skip creating this functionality on their own and can instead focus on
advanced data interrogation. This allows users who know what tools to leverage to be able to work
faster in a single integrated tool.
While Rave has the general capabilities for doing a variety of analysis and visualization tasks, it lacks any kind of guided structure, which means that only expert tradespace analysts will be able to leverage the software capabilities of Rave. Rave also has no built-in support for analyzing systems through the changing contexts that make selecting designs for large scale, complex systems very difficult. As such, the insights that could be captured from the system and the discoverable tradeoffs in a tool like Rave would have to be things the user already knows to look for, not concepts that the user discovers through using the tool itself.
1.6 Research Community and Industry Shows Need for Better TSE Tools
Despite the availability of these design decision support tools, the scientific and engineering research
communities have expressed that existing tools do not currently support designers' needs to switch
between the design comparison, information seeking, and design selection tasks. Participants in a
2010 NSF workshop held regarding multidisciplinary design optimization expressed the need for tools
that support tradespace exploration, as well as for tools that can be used by experts and non-experts
alike (Simpson, et al. 2011). Similarly, a 2014 research agenda involving 40 researchers and practitioners in tradespace exploration cited "identification of tradespace features" and "incorporating the ilities" as important areas for future work for tradespace exploration tools (Spero, et al. 2014).
These challenges are so important to overcome that the Department of Defense has outlined an entire
program dedicated to addressing the issue of designing large scale systems and making informed
decisions regarding them. This program, titled Engineered Resilient Systems (ERS), is designed to be
an evolving framework and an integrated, trusted computational environment supporting all phases
of acquisition and operational analysis (Neches, 2011). The program's ultimate goal of allowing for the
repeated creation of resilient large scale systems necessitates data-driven, informed decisions
(Rhodes, et al. 2014). While the program is looking at various areas of research to address these
challenges, some important concepts ERS has already identified as necessary to meet its goals include
high performance computing and tradespace exploration (Goerger, et al. 2014).
With IVTea Suite we are attempting to shift the current Tradespace Exploration paradigm to be one in
which the entire lifecycle of a system is considered from the beginning, and tradespace exploration is
done not only in the early phases of design, but also throughout the entire lifecycle. In this thesis, we
review the state of the art in this enhanced tradespace exploration, and explore ways in which we can
use cloud software to expand the existing analysis capabilities. The ultimate objective is to allow
tradespace exploration to scale to much larger design problems, and to be informed by far larger
amounts of data, in order to facilitate the development of higher value, more resilient systems.
Chapter 2: IVTea Suite Overview
IVTea Suite is a software package designed with the tradespace exploration concepts from Section 1.2
at its core (e.g., MATE and EEA). IVTea Suite is designed to guide users, allowing them to effectively explore their design data, uncover design tradeoffs, reason about how policies and utility affect the tradespace, and reason about important ilities.
2.1 Capabilities
IVTea provides users with a host of visualization and analysis tools to allow them to explore their
design tradespace. As mentioned in Section 1.5.2, through the use of multi-dimensional graphs,
viewing multiple plots on a graph simultaneously, color coding, and transparency, IVTea Suite allows
users to view and compare multiple dimensions of information simultaneously. IVTea Suite supports
both discrete and continuous visualizations, and a full list of supported visualizations is shown in Table
1. These visualizations are both linked and interactive. They are linked in the sense that any data
selection or data brushing done in one visualization is immediately transferred to all other
visualizations and the selected designs are stored in local session state. With IVTea Suite's storing of "Favorites" mentioned in Section 2.2.4, users are able to store in their exploration session "good" designs they find throughout the exploration process, which are linked across all of IVTea Suite. This
allows for further interrogation of designs with the entirety of IVTea Suite's analysis tools. The
visualizations are interactive in that they allow dynamic altering of axis variables, readjustment of
value models, value inputs for analysis tools, and real-time updating of selected designs, epochs and
eras.
IVTea Suite also fully supports the real time creation and modification of value models applied to the
system, allowing for single objective and multi-objective steered optimizations. Multi-attribute
Effectiveness models are all present in the IVTea architecture and can be compared side-by-side,
modified, or swapped on the fly in order to understand the impact that different value models have
on the tradespace. This includes the MAU, AHP, CBA, and MOE value models and each has an
accompanying interface built in to IVTea. This swapping of value models allows users of IVTea to
easily see the consequences of model choices on their tradespaces. This process can be used as a
sensitivity analysis as demonstrated by Ross (Ross, et al. 2015a). By looking at the optimal values for
each value model and seeing which designs are robust to value model change, users can gain insight
into which designs perform better in multiple value contexts. Figure 11 shows an example of a
tradespace evaluated with different value models. This gives users insight into which solutions are robust to value model change and offers insight into which value model most appropriately captures the value of the system.
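As a rough sketch of this kind of value-model sensitivity check (the designs, weights, and the linear-additive and benefit/cost forms below are illustrative assumptions, not IVTea Suite's built-in value models), the same designs can be scored under two value models and checked for overlap among the top performers:

% Hypothetical sketch of a value-model swap: score the same designs under a
% linear-additive MAU and a simple benefit/cost ratio, then check which designs
% stay in the top decile under both (a crude robustness-to-value-model check).
rng(1);
nDesigns = 200;
attrU = rand(nDesigns, 3);                % single-attribute utilities in [0, 1]
cost  = 50 + 150 * rand(nDesigns, 1);     % lifecycle cost estimates (hypothetical)

k   = [0.5 0.3 0.2];                      % assumed attribute weights (sum to 1)
mau = attrU * k';                         % value model 1: weighted-sum MAU
cba = mau ./ cost;                        % value model 2: simple benefit/cost ratio

topN = ceil(0.1 * nDesigns);
[~, byMau] = sort(mau, 'descend');
[~, byCba] = sort(cba, 'descend');
robust = intersect(byMau(1:topN), byCba(1:topN));   % highly ranked under both models
fprintf('%d of %d designs are in the top decile under both value models\n', ...
    numel(robust), nDesigns);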
Figure 11. The same tradespace plotted with different value functions, demonstrating how much the chosen value function can affect the Pareto front of tradespaces (Ross, et al. 2015a)
IVTea Suite also contains analysis capabilities for viewing trade spaces in multiple external contexts
which allows for scenario and lifecycle planning. By utilizing the epoch model for characterizing
external environments, IVTea Suite is capable of providing scenario planning by allowing users to
create epochs and analyze how their designs would perform in them. Stringing these epochs together
into an era, IVTea Suite is able to perform analysis on the lifecycle of designs, allowing users to view how their tradespaces evolve as different scenarios occur. Taking this dynamic context into consideration, IVTea is able to give users insight into system lifecycle properties. IVTea currently has some changeability-based analysis tools (Filtered Outdegree, DV Streaks), and more ilities-based tools are being developed. Multi-stakeholder tools are also being developed which allow multiple stakeholders to improve their negotiations in the face of differing interests. These tools focus on reframing the design selection process as looking for the Best Alternative to a Negotiated Agreement (BATNA), and on finding where individuals' requirements overlap through IVTea Suite. This allows groups of stakeholders to gain a higher awareness of the overarching needs of all stakeholders, reduce positional bargaining, and reduce attachment to one-sided solutions (Fitzgerald, et al. 2014). An example of one
such visualization tool designed to force users to view their BATNA instead of their Pareto front can be
seen in Figure 12.
Figure 12. BATNA Space Viewer (Fitzgerald, et al. 2014)
IVTea is also user extensible through user-created Widgets (See Section 2.2). Researchers can easily create custom MATLAB Widgets which use the IVTea Suite API to seamlessly integrate their custom analysis/visualization tool into the linked, interactive framework IVTea Suite provides. Also, as IVTea Suite is written in MATLAB, it is inherently multi-platform.
To summarize IVTea Suite's current capabilities, users are able to interrogate the portions of their data which comprise the design space, epoch space, resource space, performance space, and value space. Users are also able to modify and execute alternative value models across their designs. Modification and re-execution of cost models and performance models, as well as the inclusion of multi-stakeholder analysis tools, are future (to be added) capabilities. The overall flow of using IVTea Suite, based on the parameterization from Section 1.2, can be seen in Figure 13.
Figure 13. IVTea Suite Model Flow (Ross, et al. 2015a)
2.2 UI Overview
The primary window of IVTea Suite is the Dashboard. The Dashboard with all of the important
components labeled can be seen in Figure 14. In the upper left of the Dashboard we have a Tab Bar
which has tabs for each primary analysis type that IVTea Suite supports (design, epoch, and era) as
well as a management tab. The main pane of IVTea's dashboard contains the complete set of widgets
for the designated tab. Widgets are tools or visualizations specifically designed to perform one task,
with the full set of widgets providing the complete functionality of IVTea Suite when used together.
Widgets are designed to be self-contained windows allowing for multiple widgets to be active at once
and for users to essentially create a preferred user-interface layout out of the widgets. All of IVTea
Suite's widgets are linked together by locally-stored session data. This allows for any changes or
interrogation done to the data being analyzed in one widget to be immediately propagated to all
active widgets. In the upper left corner of the Dashboard we have a 'file' menu which contains key features and is consistent across all of IVTea's widgets. The current set of widgets is by no means complete. The intended workflows of IVTea Suite are categorized in terms of design-centric, epoch-centric, or era-centric analyses, described below.
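Before turning to those workflows, the linkage between widgets can be pictured with a small MATLAB sketch (a hypothetical SessionState class, not IVTea Suite's actual implementation): widgets share one handle object holding the session data and register listeners, so that any selection change triggers a redraw everywhere.

% Hypothetical sketch of linked widgets via shared session state (save as
% SessionState.m); this is only the pattern, not IVTea Suite's actual class.
classdef SessionState < handle
    properties
        SelectedDesigns = [];       % design IDs currently brushed or favorited
    end
    events
        SelectionChanged            % fired whenever the selection is updated
    end
    methods
        function select(obj, ids)
            obj.SelectedDesigns = ids;
            notify(obj, 'SelectionChanged');   % every listening widget reacts
        end
    end
end

A widget would then call something like addlistener(session, 'SelectionChanged', @(src, ~) redraw(src.SelectedDesigns)) when it opens, so that a brush or favorite made in one plot immediately updates every other open widget.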
Figure 14. IVTea Dashboard: primary window of IVTea Suite
2.2.1 Design-Centric Analysis
Design-centric analysis focuses on examining the tradespace data with a focus on a specific set of designs. In this way users are able to focus on designs they find appealing or interesting through interactive tradespace exploration and examine them, for example, across all epochs of interest. This design emphasis helps answer questions such as "How do these designs deal with uncertainty in my system?" and "How can we find more (good) designs to include in the tradespace?" The widgets for design-centric analysis focus on selecting a subset of designs and allowing users to analyze that particular set. The widgets in design-centric analysis include:
* Design Filter - Find the set of designs that obey a user-specified group of logical statements about their design variables.
* Design Knobs - See the defining variables of a design, and easily modify them to find similar designs.
* Design Tradespace Viewer - A variation on the standard tradespace scatterplot that shows each epoch as a point on the graph for a single design (rather than each design as a point for a single epoch). Has the same features as the Tradespace Viewer, but is used to identify the effects of changing context on the performance of one design.
* Design Space Viewer - Create a grid of scatterplots and histograms showing the enumeration scheme and completeness of the design space.
* Comparison Tool - Place designs of interest into a table allowing side-by-side comparison of their variables and performance attributes. A baseline design can be set, coloring the other table entries based on their higher/lower relationship to the baseline.
* Fuzzy Pareto Number - Show a histogram of the specified design's Fuzzy Pareto Number (a measure of cost-benefit efficiency) across all epochs. This view gives an overview of the design's performance across the complete uncertainty space.
* Filtered Outdegree - Plot a tradespace and color it with live calculation of Filtered Outdegree for specified designs. This illustrates differences in available change options for each design.
2.2.2 Epoch-Centric Analysis
Epoch-centric analysis focuses on examining the tradespace data with a focus on a specific set of epochs. In doing so, users are able to focus on epochs they have specific interest in through interactive tradespace exploration and examine them. An example of this could be looking at how designs perform during a given epoch. This epoch emphasis helps answer questions such as "Can we find 'good value' designs?", "What attributes dominate decision making?", and "Are there lower cost solutions?" The widgets for epoch-centric analysis focus on selecting a subset of epochs and allowing users to analyze that particular set. The widgets in epoch-centric analysis include:
* Epoch Filter - Find the set of epochs that obey a user-specified group of logical statements about their context.
* Epoch Knobs - See the defining context and preference variables of an epoch, and easily modify them to find similar epochs.
* Tradespace Viewer - The standard tradespace view shows all valid design alternatives for a specified epoch as points on a scatterplot graph. The x, y, z, color, and size axes are all customizable to display different variables in the database. Hotkeys are available to snap axes to the benefit/cost view of different stakeholders. The scatterplot has pan, zoom, rotate, and group brush tools, as well as the ability to right-click design points to bring up a context menu with information about that design and/or save it as a favorite.
* Context Space Viewer - Create a grid of scatterplots and histograms showing the enumeration scheme and completeness of the context space.
* Carpet Plot - Create a grid of scatterplots showing the effect of each design variable (on the x-axis) against each performance attribute (on the y-axis). This view is useful for verifying intended variable interactions in the models or uncovering unexpected interactions.
* Preference Explorer - View the specified preferences of each active decision maker in a specified epoch. This includes capability for all of the different value models included in IVTea (MAU, AHP, CBA, and MOE), each with an accompanying interface. The preferences can be modified and stored locally, shared among the other widgets.
* Pareto - Specify objective sets and find the designs which are Pareto efficient across them for a given epoch. Objective sets can include any number of objectives greater than one. Pareto sets can be modified with allowances for fuzziness, can be compared to find joint- and compromise-efficient designs, and can be easily saved as favorites.
* DV Streaks - Shows a standard tradespace view but adds the ability to draw 'streaks' between designs that are the same except for a single, specified design variable. Streaks can be applied manually or to favorites, and are customizable. This view shows the sensitivity of designs to perturbations in their variables.
2.2.3 Era-Centric Analysis
Era-centric analysis focuses on examining the tradespace data with a focus on creating and viewing specific eras. In doing so, users are able to analyze eras they have deemed particularly likely or important through interactive tradespace exploration and examine them. An example of this could be how designs perform at each point in a particular era. This era emphasis helps answer questions such as "How does my system respond to time and change?" The widgets for era-centric analysis focus on selecting a subset of eras and allowing users to analyze that particular set. The widgets in era-centric analysis include:
* Era Constructor - Build an era (or eras) as a sequence of epochs for use in other widgets.
* Era Viewer - Plot the tradespaces of each epoch in an era side-by-side on consistent axes.
* Morph - Animate the trajectories of each design across the epochs in an era. Allows playback, looping, and frame-by-frame stepping in addition to the standard tradespace visualization options. The animation is particularly effective for helping users track both individual designs and overarching trends.
2.2.4 Management
Management widgets focus on organizing data across the different categories of analyses.
* Favorites Manager - This widget keeps track of all designs and epochs locally saved as favorites, and allows for manual entry of new favorites. Favorites can also be saved as batches. Favorites have plotting options that enable other widgets to display them with a consistent, customizable marker (size, shape, and color).
* Notes - A text entry field for keeping track of notes during the session. Notes are saved for the current session even if the widget is closed, and can be permanently saved as a text file.
* Summary Dash - Presents an overview of the status of the currently connected database. Includes totals for evaluated designs, epochs, preference sets, and others, and can be clicked to display more detail about individual database tables. A diagram view also offers a visual representation of the relationships between the tables.
* Responsive System Comparison - A workflow outline of MIT SEAri's RSC method, which can provide guidance for new users on the use of tradespace exploration and the relevant widgets for each step.
* DM Creator - Allows the insertion of new decision makers to the active database (if appropriate permission is available). New DMs are assigned to new epochs that replicate existing epochs, but with new preferences.
* Preference Creator - Allows the insertion of new preference sets to the active database (if appropriate permission is available). This supports all four value models and allows full customization of their parameters.
2.3 IVTea Suite Walkthrough (Use Case 1: Whidbey-Island Class Ship)
In order to get an understanding of how IVTea Suite guides users through exploring a tradespace,
some demonstration of the tool is necessary. In the first example we will be attempting to explore
the tradespace of a Whidbey-Island Class Ship from the perspective of a Military Officer (1) who
prefers ships with high endurance that can operate for extended periods. The design variables for
the possible ship designs, the performance attributes, stakeholders/decision-makers, and epochs, all of which were derived from the process described in Section 1.2.1, can be seen in Table 5.
Scenario:
You are a military officer, trying to select the design for the next generation of Whidbey-Island Class
Ship
Goal:
Utilize tradespace exploration to select the best next generation ship
Table 5. Whidbey-Island Dataset Overview

Dataset Size Summary
Designs (20 Vars): 1000
Contexts (1 Var): 1
Epochs: 3
Attributes: 5
Decision Makers: 2
Preference Sets: 3
Design x Epoch pairs: 3000

Epoch Space
Epoch 1 - Preference Set: Decision Maker -> Military Officer(1), who has a preference for ships with high endurance that are able to operate for extended periods. Context: The Whidbey Island class ship has primary missions of Amphibious Warfare, Mobility, Command and Control, and Anti-Air Warfare. It is also designed to support Special Warfare, Fleet Support Operations (refueling other ships), non-combatant operations, Ocean Surveillance, and Electronic Warfare.
Epoch 2 - Preference Set: Decision Maker -> Military Officer(2), who prefers ships with high weight allowance and speed over ships with high endurance.
Epoch 3 - Preference Set 1: Decision Maker -> Military Officer(1), who has a preference for ships with high endurance that are able to operate for extended periods. Preference Set 2: Decision Maker -> Government Legislator, who has a preference for very large ships in order to appease new legislation for the Air Force.

Design Variables (Min* - Max*)
CubicSpaceCargo: 26745 - 3.64E+04
ElectricSLMargin: 1.38E-04 - 0.2881
HullBeamMeters: 23.4801 - 37.2421
JP5CapacityGallons: 4.64E+04 - 333496
MainEngineType: 1 - 5
MilTailored: 3 - 4
NumberAirAssets: 1 - 5
NumberMainEngines: 2 - 4
NumberShafts: 2 - 2
NumberTroopDetachments: 14 - 54
NumberTroopOEMS: 330 - 744
NumberTroopOfficers: 23 - 87
PropulsionType: 1 - 1
VehicleUsableSurface: 6.63E+03 - 19497
VerticalCenterMargin: 0.0559 - 0.1526
WeightFactor323: 0.0365 - 0.2103
WeightFactor324: 0.7875 - 1.246
WeightFactor332: 0.0031 - 0.6397
WeightFactor347: -0.0715 - 0.0791
WeightMargin: 0.0741 - 0.1905
* Units were not provided by the client

Attributes
Ship length
Endurance (Range)
Speed
Weight Allowance
Lifecycle/Acquisition Cost
To begin we will start IVTea Suite with the Whidbey-Island SQL dataset opened for our session which
contains a set of designs created by enumerating the design variables listed in Table 5. We will select
to evaluate our designs from the perspective of Epoch 1 which contains Military Officer(1). After we
have loaded in our database, the primary user interface, the Dashboard, will be displayed. The
Dashboard contains selectable icons that launch widgets, as mentioned in Section 2.2. To begin, we will ask ourselves the first question of the questions-based approach:
Can we find 'good value' designs?
To evaluate this, we will first bring up a 2D scatterplot of our tradespace with the x axis being an estimated lifecycle cost ($) and the y axis being our Military Officer's Multi-Attribute Utility function (described in Section 1.2.1). We can launch exactly this plot by clicking the Tradespace Viewer widget from the Dashboard and selecting 'Benefit/Cost'->'Decision Maker'->'Military Officer' (Figure 15, 1). The tradespace visualization this generates can be seen in Figure 15. At first glance at the tradespace we can see that there appear to be limitations imposed by the utility functions on the feasible designs, which result in us only being able to view 19% of the tradespace (the feasible designs) (Figure 15, 2). In viewing our tradespace of feasible designs we can see that for our Military Officer's preferences, there appears to be a 'knee' in the graph at which utility and cost are strongly traded. We will select 2 designs (in this case design 109 and design 192) that appear here and add them to our set of Favorites (Figure 15, 3). We do this in order to evaluate and gain intuition as to what design variables and attributes are causing these designs to be evaluated as 'good' and what tradeoffs they might have relative to each other.
Figure 15. Tradespace Viewer with selected designs
We can compare these designs by launching the 'Comparison Tool' widget, which brings up a table view of each of our designs, as can be seen in Figure 16.
Figure 16. Comparison Tool comparing selected designs
As we have found some 'good value' designs, this brings us to our next question:
Are there more "good value" designs?
In order to further explore the tradespace and evaluate other potentially good designs, we will utilize the 'Pareto' widget, whose interface can be seen in Figure 17. The 'Pareto' widget allows users to calculate Pareto efficient designs for a specified objective set. In this case our objective set will be a minimization on lifecycle cost ($) and a maximization on our Military Officer's Multi-Attribute Utility function (Figure 17, 1), and then we will save the set (Figure 17, 2). After selecting 'Calculate' (Figure 17, 3), the set of designs that meet the input criteria will be displayed. We will choose to add this group of designs to our set of Favorites and alter it to have a different visual marker so we can tell it apart (Figure 17, 4).
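Under the hood, a Pareto calculation of this kind amounts to a non-domination filter. The following MATLAB sketch (with made-up cost and utility values, not the Pareto widget's actual code) shows the idea for a two-objective set of minimizing cost and maximizing MAU:

% Sketch of a non-domination filter for one objective set (hypothetical data):
% minimize lifecycle cost and maximize the Military Officer's MAU.
cost = [ 9.1 10.3  8.7 11.0  9.5] * 1e9;    % $(LifecycleCostEst), hypothetical
mau  = [0.42 0.55 0.31 0.52 0.50];          % MAU(Military Officer, Pref 1), hypothetical

n = numel(cost);
isPareto = true(1, n);
for i = 1:n
    % design i is dominated if some other design is no worse on both
    % objectives and strictly better on at least one
    dominates = (cost <= cost(i)) & (mau >= mau(i)) & ...
                ((cost < cost(i)) | (mau > mau(i)));
    isPareto(i) = ~any(dominates);
end
paretoDesigns = find(isPareto);             % here designs 1, 2, 3, and 5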
Figure 17. Pareto Widget displaying the Pareto optimal designs for the selected epoch
Now the previously opened tradespace view immediately populates with our updated favorite designs, showing our previously selected designs as well as our Pareto designs, as seen in Figure 18.
Figure 18. Tradespace Viewer with Pareto front displayed as red triangles
We can also use the 'Comparison Tool' widget once more on our Pareto designs to gain insight as to what tradeoffs are being made across the Pareto front. As we have found some more 'good value' designs, this brings us to our next question:
What are their strengths and weaknesses / What attributes dominate decision making?
For this, we will utilize IVTea Suite's ability to view higher dimensions of data simultaneously by applying different color weightings to the shown designs in the tradespace view. This is done by selecting the color drop-down from the tradespace view and selecting each design variable. In doing so, we learn how the system is valuing different design variables and what each design's strengths and weaknesses are. One design variable of particular interest seen in doing this analysis is 'engine type'. As seen in Figure 19, the noticeable tiered distribution of engine types suggests engine type is a highly valued design variable for this epoch.
Figure 19. Tradespace Viewer with MainEngineType as the color-axis
We can also gain intuition about which attributes are dominating decision making by opening the 'Preference Explorer' widget, which allows us to modify and update in real time our Military Officer's preferences. In the 'Preference Explorer' widget in Figure 20 we can see that there are four attributes that we are focused on: ShipEnduranceRange, ShipLength, ShipSustainedSpeedKTS, and ServiceLifeWeightAllowance. In Figure 20 we can see that there are four panes, each representing one of the aforementioned attributes of the Military Officer's preferences. The 'k' value located near the top-right of each pane represents the weighting for the specified attribute. The top graph of each pane shows the current utility function being applied to the attribute designated in the pane. Users can freely modify the shape of this function using the buttons below this graph and get immediate feedback on the results of that change, as all linked visualizations immediately update with new values. The middle graph shows the current valid tradespace evaluated by the single attribute utility specified by the top graph. From this, users can see how designs of interest are valued for each attribute, gaining insight into the strengths and weaknesses of their designs. The bottom graph shows the entire tradespace including invalid designs, with the minimum and maximum values for the utility function shown. Users are able to adjust these cutoffs on the fly, allowing them to immediately see what a change in policy or specifications could mean for their tradespace.
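A stripped-down MATLAB sketch of the logic these panes expose (the attribute values, weights, cutoffs, and linear utility shapes below are assumptions for illustration, not the actual Whidbey-Island preference curves) might look like this:

% Hypothetical sketch of single-attribute utilities, 'k' weights, and cutoffs.
endurance = [7500  9000 12000  8200];   % km, hypothetical attribute values
speed     = [  18    22    20    24];   % knots, hypothetical attribute values
k         = [0.6 0.4];                  % assumed attribute weights (sum to 1)

% Linear ramps between assumed acceptance cutoffs play the role of the
% user-drawn single-attribute utility curves.
uEnd = @(x) min(max((x - 8000) / (15000 - 8000), 0), 1);
uSpd = @(x) min(max((x -   15) / (   30 -   15), 0), 1);

valid = endurance >= 8000;              % designs below the endurance cutoff are invalid
mau   = k(1) * uEnd(endurance) + k(2) * uSpd(speed);
mau(~valid) = NaN;                      % invalid designs carry no utility
yield = 100 * mean(valid);              % percent of the tradespace that remains feasible

Adjusting the cutoff inside uEnd (or the valid test) and re-running is, in miniature, what dragging the minimum-acceptance bar in the Preference Explorer does.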
Figure 20. Preference Explorer where each pane corresponds to a preference of the Military Officer. The top graph shows the current utility function. The middle graph shows the tradespace evaluated with the selected single attribute utility function. The bottom graph shows the entire tradespace, which might not be valid due to the constraints imposed by the single attribute utility function, represented as bars on the graph.
From Figure 20 we can easily see which attributes are more heavily weighted; in our case the weightings are 0.5, 0.2, 0.2, and 0.1 respectively. The more heavily weighted attributes will drag solutions with those attributes toward the Pareto front. The low weighted attributes are opportunities for low cost designs to make up value. It is important to note that the tradeoff is not just benefit/cost but also between these different measures of benefit. As we have identified the strengths and weaknesses of designs and which attributes dominate decision making, we are led to our next question:
What is constraining the tradespace?
From before, we know that due to our current policies and perception of value, we are only capable of observing 19% of all of the available design options. In order to understand why this is the case we can observe our 'Preference Explorer', specifically the current requirements we have placed on the attributes Endurance and Weight. If, for example, we lower our requirements from within the Preference Explorer to instead be 8000km endurance and a 0.1 weight allowance (Figure 21), we immediately see that update in the Pareto front of our tradespace view. This addition of lower cost solutions can be seen in Figure 22, 1.
Figure 21. Preference Explorer demonstrating how the lowering of the endurance requirement to 8000km affects the tradespace
As we have identified some attribute restrictions that were previously constraining the solution, we can
now move onto the next question:
Are there lower cost solutions?
Figure 22. Tradespace Viewer showing the addition of lower cost solutions to the tradespace as a result of altered requirements
From our dynamically updated Tradespace Viewer, we can indeed tell that altering our constraints
brought a lot of previously invalid designs into our new Pareto Front. This is especially apparent when
looking at our new yield of 32%, up from 19%. Now that we know there are potential lower cost
solutions we can continue to interrogate our system with the next question:
To which parameters is value most sensitive?
The simplest way to evaluate parameters to which value is most sensitive is to utilize the 'Carpet Plot'
widget. The Carpet Plot widget displays a graph of the preference/benefit attributes vs the design
variables. From this we can see that there appears to be a strong correlation between many of the
important benefit attributes and the design variable hull beam meters as seen in Figure 23. This gives us
insight into the system as it seems that the length of the hull beam drives a lot of the value of the
system for our Whidbey-Island class ship.
Figure 23. Carpet Plot showing correlations between design variables and attributes
As we have identified the parameters to which value is most sensitive, we can now move onto the next
question:
What if my needs change?
In order to evaluate changing needs we will need to compare two different needs contexts. In our loaded navy dataset we have 2 such different contexts, which are represented in IVTea Suite as epoch 1 and epoch 2. To view the different weightings for each, we can select each of them from the Preference Explorer drop-down and compare. Now we will calculate the Pareto front for our new set of needs by doing a process identical to before, only this time within the context of epoch 2. We can view both of
these tradespaces side by side by using the Era Viewer. This can be seen in Figure 24:
Figure 24. Era Viewer: Pareto Front For Each Epoch
From this we can see that in the case of changing needs, there are some designs that are very good in both contexts and some that appear to be good in only one context. This gives us intuition about which solutions might be robust and valuable in many contexts, good solutions to choose if your needs are capable of changing. This brings us to the next question:
What if I also need to satisfy another decision maker?
In the case of satisfying another decision maker, we will have to load in our epoch 3, which has 2 decision makers in the single context. We can, as we did before, analyze these two user preferences using the Pareto tool and view their tradespaces side by side. IVTea Suite is also capable of calculating Pareto sets for multi-stakeholder problems, looking for joint and compromise solutions simultaneously. This
can be seen in Figure 25.
Figure 25. Pareto Tool calculating joint and compromise Pareto sets for multiple stakeholders
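The two-stakeholder calculation can be sketched as a straightforward extension of the single-epoch Pareto filter shown earlier (again with hypothetical utilities; this is only one simple way to screen for candidate compromise designs, not necessarily the widget's exact joint and compromise definitions):

% Hypothetical sketch: non-dominated designs when both decision makers'
% utilities and cost are treated together as one combined objective set.
cost = [ 9.0  9.6 10.4 11.2  9.9];      % $(LifecycleCostEst), hypothetical
mau1 = [0.40 0.55 0.62 0.70 0.35];      % MAU for Military Officer(1), hypothetical
mau2 = [0.65 0.50 0.30 0.45 0.60];      % MAU for Government Legislator, hypothetical

n = numel(cost);
nonDominated = true(1, n);
for i = 1:n
    beats = (cost <= cost(i)) & (mau1 >= mau1(i)) & (mau2 >= mau2(i)) & ...
            ((cost < cost(i)) | (mau1 > mau1(i)) | (mau2 > mau2(i)));
    nonDominated(i) = ~any(beats);
end
candidateCompromises = find(nonDominated);   % designs worth both stakeholders' attention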
By using IVTea Suite to explore the Whidbey-Island Class Ship dataset, we were able to gain various insights into the complex system that controls the lifecycle cost and value for such large scale ships, as well as compare designs across numerous factors. Through the questions-based approach we discovered that ship endurance and ship length are the attributes that dominate decision making for Epoch 1. Along with this, it was seen that engine type appeared to be the most dominant design variable for Epoch 1, and hull beam size appeared to be the most sensitive design variable, having the strongest correlations with all attributes overall. We also learned that a number of viable lower cost solutions become available if we are willing to alter our requirements on endurance and weight allowance. In addition to finding that there are designs that offer us reasonably high value for low cost (the knee of the Pareto front), we were also able to identify designs that maintain value in the face of changing contexts (viewing how the Pareto front moves in possible Eras and identifying designs that remain near it). We also learned that in the case that we had to satisfy another potential decision maker, we could still identify compromise solutions that did relatively well for both decision makers despite there being no obvious joint solutions that resided on the Pareto front for both.
2.4 IVTea Suite Walkthrough (Use Case 2: SpaceTug)
Our second example deals with constructing a Space Tug vehicle capable of performing a variety of
space missions. In this example we will try to reason about finding 'good' designs in the context of
uncertainty and dynamic changes. A detailed overview of the data in the Space Tug dataset can be seen in Table 6:
Scenario:
You are the owner of a space tug rental company, providing the services of your system to customers
with varying preferences.
Goals:
Meet customer demands as well as possible, for as long as possible - satisfied contracts provide revenue
based on duration and utility.
Table 6. SpaceTug Dataset Overview

Dataset Size Summary
Designs (4 Vars): 384
Contexts (2 Vars): 2
Epochs: 18
Attributes: 3
Decision Makers: 6
Preference Sets: 9
Design x Epoch pairs: 6912

Design Variables (Min - Max; Units)
PropType: 1 - 4; 1->4 (bipropellant, cryogenic, electric, nuclear)
PayloadMass: 300 - 5000; kg
PropMass: 30 - 50000; kg
DesignforChange: 0 - 2; DFC level is a switch intended to model a conscious effort to design for ease of redesign/change. Reward: additional and/or cheaper change mechanisms. Penalty: additional dry mass, resulting in higher cost and lower delta V.

Attributes
Capability
DeltaV
ResponseTime

Contexts
Present
Future

Preference Sets (capability weighting / deltaV weighting / responseTime weighting)
Base Case: 0.3 / 0.6 / 0.1
Tech Demo: 0.7 / 0.2 / 0.1
Geo Rescue: 0.2 / 0.6 / 0.2
DeploymentAssistance: 0.6 / 0.1 / 0.3
Refueler Maintainer: 0.75 / 0.2 / 0.05
Garbage Collector: 0.2 / 0.75 / 0.05
Military All-Purpose: 0.4 / 0.4 / 0.2
Satellite Saboteur: 0.2 / 0.6 / 0.2

Change Mechanisms (# - Rule - Effect - DFC level)
1 - Engine Swap - Biprop<->Cryo - 0
2 - Fuel Tank Swap - Change propellant mass - 0
3 - Engine Swap (Reduced Cost) - Biprop<->Cryo - 1 or 2
4 - Fuel Tank Swap (Reduced Cost) - Change propellant mass - 1 or 2
5 - Capacity Change - Change capability - 1 or 2
6 - Refuel in Orbit - Change propellant mass (no redesign) - 2
To begin, we will select an epoch of particular interest to our user, as it uses AHP as the customer benefit function in this model versus MAU in the navy case. Our first question for this investigation is:
What about uncertainty in the system?
In order to investigate the uncertainty in the system, we will use the 'Single Design Tradespace' widget on design 22, which was performing well in our epoch of interest, epoch 17. The Single Design Tradespace widget allows us to view how a single design performs across all possible scenarios input into our database, across any 2 dimensions of the design. Note that for this case there are only 2
contexts, so there are only two points in the plot. In comparing Cost($) across all of the attributes, it
appears that across the 2 epochs for which this design is valid, the attributes all stay the same except for
delta V which, relative to our epoch 17, actually increases in the other allowable epoch. The
visualization for this can be seen in Figure 26.
Figure 26. Single Design Tradespace Widget displaying how a selected design performs across all possible epochs
To continue our investigation into uncertainty and look at designs that perform well in multiple epochs, we will utilize the 'Fuzzy Pareto Number' widget. The Fuzzy Pareto Number widget calculates, for a given design, that design's Fuzzy Pareto Number across all available epochs in the database. The Fuzzy Pareto Number is a metric for evaluating how close to the Pareto front a design is, with a '0' meaning that the design is on the Pareto front for that epoch and 100 meaning that the solution is completely dominated by all other solutions for that epoch. For the Fuzzy Pareto Number tool, a value greater than 100 means that the design is infeasible in that epoch. As is shown in Figure 27, our original design 22, which does well in our base epoch 17, actually becomes infeasible in 6 of the remaining 16 epochs.
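One illustrative way to compute such a number for a single epoch is sketched below in MATLAB (hypothetical inputs; this is a plausible operationalization of the metric rather than IVTea Suite's exact formula):

% Sketch of a per-epoch Fuzzy Pareto Number (save as fuzzyParetoNumber.m):
% the smallest K, in percent of the objective ranges, at which no other
% feasible design beats this one by more than K% on both cost and utility.
function fpn = fuzzyParetoNumber(cost, utility, idx, feasible)
    if ~feasible(idx)
        fpn = 110;                              % convention: > 100 means infeasible
        return;
    end
    cRange = max(cost(feasible)) - min(cost(feasible));
    uRange = max(utility(feasible)) - min(utility(feasible));
    for K = 0:100
        slackC = (K / 100) * cRange;
        slackU = (K / 100) * uRange;
        beaten = feasible & ...
                 (cost    < cost(idx)    - slackC) & ...
                 (utility > utility(idx) + slackU);
        if ~any(beaten)
            fpn = K;                            % within K% of the Pareto front
            return;
        end
    end
    fpn = 100;                                  % fully dominated for this epoch
end

Repeating such a calculation for every epoch in the database and histogramming the results is essentially what the widget plots for design 22.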
Figure 27. Fuzzy Pareto Number Widget for design 22
Now let us select another design, design 222, and evaluate it to see if it potentially performs better across all potential epochs. In evaluating design 222's Fuzzy Pareto Numbers across all epochs, it appears that while it might not be as good in any individual context, it does relatively well in all of them and is never infeasible.
Figure 28. Fuzzy Pareto Number Widget for design 222
This means that design 222 might be a better design choice if we perceive these other scenarios as likely, as it will sustain value throughout the shifting scenarios. This leads us to our next question:
What about time and change?
In order to evaluate how time and change might affect our tradespace of SpaceTug designs, we will first
construct a potential scenario that we believe to be likely. We will do this by utilizing the 'Era
Constructor' widget and create an era containing the epochs: base case now, base case future, and
military all-purpose future. We will then use the 'Era Viewer' widget with the x-axis set to 'Potential
Customer: Cost' and the Y-axis set to 'Potential Customer: Benefit' to show the effects that each epoch's context and needs have on the selected designs. With this view we can color the designs by design variables to gain insight into what exactly is causing the shifting of the tradespace along the cost and utility axes. We can see that there is minimal reordering in the context change, with some designs improving slightly, but it is the needs change that drives the biggest difference. By coloring by propType we can see that this change in needs has almost completely eliminated propType = 1 (biprop) designs. This visualization can be seen in Figure 29:
Figure 29. Era Viewer
Another valuable way of viewing this information is with the 'Morph' widget. The Morph widget demonstrates moving performance, interpolating the three graphs shown in the Era Viewer into a video that can give users insight into exactly how designs moved from epoch to epoch and why that might be the case. Using the Morph tool it is very easy to see that a large chunk of designs fall out, and this appears to be primarily due to the change in needs. A screenshot of the Morph tool's generated movie can be seen in Figure 30.
Figure 30. Morph Widget showing 3 frames of animation in which the tradespace shifts due to an epoch shift
From the Morph tool, we can easily identify the set of designs with falling utility which reach the invalid, 0 utility area in epoch 3. Another way for designs to respond to changes in context and value over time is by being able to change. The 'Filtered Outdegree' Widget is able to read transition tables and find highly changeable designs by examining how many designs each design is capable of transitioning to for specified time and cost amounts. We will put the sliders at the maximum values for both time and money ($2751x10^6, and 7 months). The visualization for the widget can be seen in Figure 31.
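The calculation behind the widget can be sketched as a simple thresholded count over the transition tables (the transition costs and times below are random placeholders, not the SpaceTug transition data):

% Hypothetical sketch of filtered outdegree: count, per design, how many other
% designs it can transition to without exceeding the cost and schedule sliders.
nDesigns  = 6;
transCost = randi([100 3000], nDesigns);   % $M to change design i into design j (placeholder)
transTime = randi([1 12],     nDesigns);   % months for the same transition (placeholder)

maxCost = 2751;                            % slider settings from the walkthrough
maxTime = 7;

allowed = (transCost <= maxCost) & (transTime <= maxTime);
allowed(1:nDesigns+1:end) = false;         % do not count staying in the same design
filteredOutdegree = sum(allowed, 2);       % one changeability count per design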
Figure 31. Filtered Outdegree Widget
From this we can evaluate how changeable designs are and leverage that information to select designs that react well to time and change. IVTea Suite also has a Filtered Outdegree Function Widget which displays the filtered outdegree of a design as a function of a cost parameter (in our case $ in millions and time in months). A rendering of this visualization for designs 22 (red) and 222 (blue) can be seen in Figure 32.
60
re Option
Hep
~
SM
so
ra Nftvra Epocdt Bse Case Now(1) Our 1.0
Schedule(months)
V
base
Add New Fave Strbak
I
I
I
I
0009n
Add New Custom S""a
d422
Blue
7-
Y Sold
Oatbe
Y Thin
V
6-
CD 50
4U-
3-
2-
0
0
200
400
600
800
1000
SM
I
I
1200
1400
1
1600
1
1800
2000
Lming Cost Type
so'
IV
SM
Figure 32. Filtered Outdegree Function Widget
This leads us to our next question:
Is the problem dominated by underlying physical, technological, or budgetary effects or limits?
To evaluate if the problem is dominated by underlying physical, technological, or budgetary effects or
limits we can use the Design Variable (DV) Streaks widget. For a list of selected designs and a single
design variable, DV Streaks draws a line through the designs that differ from the primary design by only
the design variable selected. We will select our streak to be across the propMass variable for our
selected designs of 22 and 222. The visualization for this can be seen in Figure 33:
Figure 33. Design Variable Streaks Widget
From this, we can see that low mass designs like design 22 have a sudden drop-off point in value-added as more fuel is added (the knee of the blue line). This leads us to our next question:
How can we find more (good) designs to include in the tradespace?
In order to find more (good) designs to include in the tradespace we can use the Design Space Viewer to identify potential "holes" in our original enumeration. We can see the visualization in Figure 34.
Figure 34. Design Space Viewer
From this we can see that there are gaps in the enumeration that we could potentially fill in if they are areas of interest. One such gap can be seen when looking at the PropMass design variable. The graphs in Figure 34 show that there were proportionally more low PropMass designs in the enumeration than high PropMass designs. A re-enumeration of the design space with more high PropMass designs would lead to a more balanced tradespace with potentially more "good" designs to evaluate.
By using IVTea Suite to explore the Space Tug dataset, we were able to gain various insights into the complex system that controls the lifecycle cost and value for such large scale shuttles, as well as compare designs across numerous factors. Through the question-based approach we discovered that while some designs appear to perform well in a static context (design 22), they perform much worse when evaluated over a dynamic context. Other designs which appear to be dominated in a static context can actually maintain high value when evaluated through the frame of dynamic contexts (design 222). We were also able to analyze which designs could change into potentially high value designs in other epochs with the FOD widget. We also learned about important underlying technological problems, for example that low mass designs (like design 22) have a sharp drop-off point in value-added when adding more fuel to them. This is not the case for other types of designs, in which value keeps increasing slowly as more fuel is added. We also were able to see that our design enumeration had a hole in it regarding PropMass, which we might want to re-enumerate to provide more even coverage of our designs, potentially exposing more good value designs.
2.5 Summary of Current IVTea Suite
Using IVTea Suite we were able to reason about many important aspects of the complex systems we were designing for and properly compare designs on lifecycle aspects that were critical to our end use. We effectively explored on the order of 10^4 designs in our Space Tug dataset and were able to efficiently and effectively compare their tradeoffs, taking into consideration dynamic contexts. While this has clear value, there are still questions that IVTea Suite cannot answer in its current state: What if we wanted to reason about millions of designs? What if we wanted to perform a new enumeration of our design space in real-time? What if we wanted to reconfigure and re-run the model that yields attributes? What if we wanted to analyze and interrogate lifecycle properties across multiple eras involving 100s of epochs? For questions such as these the current IVTea architecture is not enough.
Chapter 3: Build up To Cloud IVTea
Re-architecting IVTea Suite to allow for larger data sets, faster computation, and real-time data
exploration requires the addition of 3 major components. The first component is the type of database
used. With the rise of many NoSQL databases that specialize in certain contexts, we will evaluate this
dynamic landscape for the best database possible for IVTea Suite when expanding into larger datasets
and real-time computation. The second element is distributed computing infrastructure. With the
growing distributed computing movement and the rise of many cloud hosting options, we will
examine the current options and compare their potential strengths and weaknesses relative to IVTea
Suite. The third component is a distributed computing software framework. Here we will compare the distributed computing software frameworks that exist, for example MapReduce, and consider which IVTea Suite can best leverage to meet the goals of tradespace analysts.
3.1 Database Comparison: Which is Best for IVTea?
Amazon's SimpleDB was mentioned in Section 1.5.4 as a potential database platform for IVTea to
leverage to allow for conducting Tradespace Exploration on larger data sets. The actual database to use,
however, is not straightforward to choose as there are numerous different tradeoffs and architectures
associated with distributed databases. There are currently two major categories of databases: SQL (Structured Query Language) databases and NoSQL databases.
3.1.1 SQL
SQL databases provide a standardized, rigid relational structure in which users must provide a schema describing their data's types and relationships and adhere to that schema during operation. Any changes made to the schema must be propagated throughout existing entries in the database through a database migration. This schema defines a set of relations that describes the attributes and relationships of all the data to be input, which is why SQL databases are called relational databases. The structure of a SQL database is a series of tables defined by a schema, as seen in Table 7. This schema defines one table, 'Customers', and defines the type of data to be stored in each row of the 'Customers' table. Any data stored within this SQL database must adhere to the predefined schema.
Table 7. SQL Example Table Schema
Table Name: Customers

Column       | Nullable | Datatype
CustomerID   | NO       | INTEGER
ContactName  | NO       | VARCHAR
Address      | YES      | VARCHAR
City         | NO       | VARCHAR
PostalCode   | NO       | INTEGER
Country      | NO       | VARCHAR
An example of the rows of information stored in the Customer table can be seen in Figure 35:
CustomerID | CustomerName                       | ContactName        | Address                       | City        | PostalCode | Country
1          | Alfreds Futterkiste                | Maria Anders       | Obere Str. 57                 | Berlin      | 12209      | Germany
2          | Ana Trujillo Emparedados y helados | Ana Trujillo       | Avda. de la Constitución 2222 | México D.F. | 05021      | Mexico
3          | Antonio Moreno Taquería            | Antonio Moreno     | Mataderos 2312                | México D.F. | 05023      | Mexico
4          | Around the Horn                    | Thomas Hardy       | 120 Hanover Sq.               | London      | WA1 1DP    | UK
5          | Berglunds snabbköp                 | Christina Berglund | Berguvsvägen 8                | Luleå       | S-958 22   | Sweden
Figure 35. Example of the Row and Column Storage of a SQL Database ("SQLSyntax", 2015)
Due to the table-based, relational nature of SQL, linking one table data abstraction to another typically requires use of a foreign key or a separate lookup table in order to get the appropriate linked data. That means for any cross-table lookup a table join is necessary, which is on the order of O(N+M) for a hash join or O(N*log(N)+M*log(M)) for a sort-merge join, where N and M are the sizes of the tables being joined, depending on whether the tables are properly indexed. This can make traversing long chains of cross-table relations slow and impractical. One benefit of SQL databases is the fact that they have been in production for a very long time and have been thoroughly tested, proving their ACID (Atomicity, Consistency, Isolation, Durability) properties (Haerder, 1983).
ACID Explained:
* (Atomicity) when you do something to change a database, the change should work or fail as a whole
* (Consistency) the database should remain consistent (debated as to what this means)
* (Isolation) if other things are going on at the same time, they shouldn't be able to see things mid-update
* (Durability) if the system encounters a failure (hardware or software), the database needs to be able to pick itself back up; and if it says it finished applying an update, it needs to be certain
For some applications such as e-commerce, the ACID properties are essential, as any type of error in any of these properties can lead to inconsistencies or lost data, which is not acceptable for these platforms. For other applications, such as a social networking site, these properties may not be as important as other database aspects such as read speed. Another benefit of SQL is the support for complex queries due to the relational structure of the data provided by the schema. This large set of SQL statements allows for getting exactly the data required out of the database with very little post-processing and low bandwidth. The final benefit of SQL is highly customizable indexing. Indexing explicitly tells the database to create a B-tree over a specified column such that searching for values on that index will be much faster than scanning every row of the database for the value.
There is a balance with indexing in SQL databases, as indexing adds time on writes to update indexes but can dramatically speed up reads for certain queries involving the index. The drawbacks of SQL include being unable to handle unstructured data due to the inflexible typed columns required by the schema; being difficult to scale to multiple machines, because table joins across tables hosted on different machines become impractical (this hurts both scale and speed by limiting sharding opportunities); and not handling graphs well. The problem for SQL with graphs is that to do a graph traversal one must either pull the entire graph into memory in a single query, then manipulate it and store the changes (which is infeasible given the amount of data we hope to use), or perform huge numbers of joins to traverse the graph one node at a time, which becomes prohibitively slow. One example of a SQL database is MySQL, which is the database that IVTea Suite currently uses.
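To make these tradeoffs concrete, the short sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL database that IVTea Suite uses; the Orders table, the index, and the data values are hypothetical and only illustrate schema definition, indexing, and the cross-table join discussed above.

import sqlite3

# Illustrative only: sqlite3 stands in for a relational database such as MySQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The schema must be declared up front; all rows must conform to it.
cur.execute("""
    CREATE TABLE Customers (
        CustomerID  INTEGER PRIMARY KEY,
        ContactName VARCHAR,
        City        VARCHAR,
        Country     VARCHAR NOT NULL
    )""")
cur.execute("""
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),
        Total      REAL
    )""")

# An explicit index (a B-tree under the hood) speeds up lookups on Country
# at the cost of extra work on every write.
cur.execute("CREATE INDEX idx_customers_country ON Customers(Country)")

cur.execute("INSERT INTO Customers VALUES (1, 'Maria Anders', 'Berlin', 'Germany')")
cur.execute("INSERT INTO Orders VALUES (10, 1, 149.50)")

# A cross-table lookup requires a join through the foreign key.
cur.execute("""
    SELECT c.ContactName, o.Total
    FROM Customers c JOIN Orders o ON o.CustomerID = c.CustomerID
    WHERE c.Country = 'Germany'""")
print(cur.fetchall())  # [('Maria Anders', 149.5)]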
3.1.2 NoSQL
NoSQL is a broad term for databases which do not follow the SQL model of a pre-defined schema and as such allow unstructured/semi-structured data. Within the group of NoSQL databases a number of primary architecture types have emerged, which include key-value stores, document-oriented databases, and graph databases.
Key-Value
A key-value store is a storage system that stores values indexed by a key. This allows very fast read and write operations (a simple disk access), and the store can be used as a kind of nonvolatile cache (i.e., it is well suited if you need fast access by key to long-lived data). Key-value stores are also very easy to shard by key, as the data has no inter-relations, which allows for scalability, fast reads, and high availability. The disadvantage is that users are limited to querying by key, as the store knows nothing about the structure of the actual values. This means that querying structured data can be very costly: scans of the entire database might be necessary if the key is not known, and getting specific parts of a value, or multiple values, can require post-processing on the server or client. One example of a key-value store is Amazon's S3 (Simple Storage Service). An example of how data is stored in a key-value store can be seen in Table 8.
Table 8. Example of Data Stored in Key-Value Database

Key: s3://aws-user/server-logs
Value:
  123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248
  "http://www.example-site.com/" "Mozilla/4.05 (Macintosh; I; PPC)"
  123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130
  "http://search.netscape.com/Computers/DataFormats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
  ... (continued)

Key: s3://aws-user/user-name-1
Value: "Billy"

Key: s3://aws-user/user-input-page
Value:
  {
    "title": "UserInput",
    "type": "User",
    "properties": {
      "firstName": { "type": "string" },
      "lastName": { "type": "string" },
      "age": { "description": "Age in years", "type": "integer", "minimum": 0 }
    },
    "required": ["firstName", "lastName"]
  }
A small example of the types of updates and queries that can be done in a key-value store can be seen in Table 9:
Table 9. Example of Commands Used in Key-Value Database

Command                                    | Response
GET(s3://aws-user/user-name-1)             | "Billy"
PUT(s3://aws-user/new-value, "new value")  | COMMANDOK
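The same GET/PUT style of access maps directly onto S3's API; the sketch below, using the boto3 client library from Python, is illustrative only, and the bucket and key names are hypothetical.

import boto3

# Hypothetical bucket and keys mirroring Table 8 and Table 9.
s3 = boto3.client("s3")
s3.put_object(Bucket="aws-user", Key="user-name-1", Body=b"Billy")   # PUT(key, value)
obj = s3.get_object(Bucket="aws-user", Key="user-name-1")            # GET(key)
print(obj["Body"].read())                                            # b'Billy'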
Document-Oriented
A document-oriented database extends the key-value model by having values stored in a structured format (a document, hence the name) that the database can understand. For example, a document could be a blog post with its comments and tags stored in a denormalized way. Since the data stored is transparent, the database can do more work (like indexing fields of the document) and you're not limited to only querying by key. Document-oriented databases allow accessing an entire page's data with a single query and are well suited for content-oriented applications (which is why big sites like Facebook or Amazon like them). Document-oriented database pros include being able to simplify your data model and the fact that it is trivial to serialize data to and from objects in the document format. Document-oriented databases are also typically automatically indexed, which can be simpler to maintain but will not be nearly as performant as a database with custom, optimized indexes for specific queries. Also, like key-value stores, document-oriented databases are very easy to shard by key, as the data has no inter-relations, which allows for scalability, fast reads, and high availability. Document-oriented databases also support multi-version concurrency control, making writes to documents very fast, whereas writes to a relational database can be slow for various reasons such as locks, transactional support, index updates, and so on. Document-oriented database cons include, firstly, that they often do not have the strong ACID compliance to which most SQL databases adhere. What this means is that in certain situations data can be lost, a process might miss an insert, or consistency may be relaxed, meaning that if a write and then a read occur it is not guaranteed that the write is seen, which may or may not matter depending on the application. Another drawback of document-oriented databases is that, due to there being no schema, the set of allowable queries is much smaller than in a SQL database. One example of a document-oriented database is MongoDB. It is important to note that users of document-oriented databases typically choose to store their documents as JSON data structures, and that is what the examples will represent. An example of how a user might store information in MongoDB can be seen in Table 10:
Table 10. Example of JSON Data Stored in MongoDB
Database name: restaurants

{
  "address": {
    "street": "2 Avenue",
    "zipcode": "10075",
    "building": "1480",
    "coord": [-73.9557413, 40.7720266]
  },
  "borough": "Manhattan",
  "cuisine": "Italian",
  "grades": [
    { "date": ISODate("2014-10-01T00:00:00Z"), "grade": "A", "score": 11 },
    { "date": ISODate("2014-01-16T00:00:00Z"), "grade": "B", "score": 17 }
  ],
  "name": "Vella",
  "restaurant_id": "41704620"
}
Some example uses of the MongoDB database API are shown below:
Table 11. Example of Queries Done in MongoDB

Possible Commands
db.restaurants.find( { "borough": "Manhattan" } )
db.restaurants.find( { "grades.score": { $gt: 30 } } )          [$gt is the syntax for greater than]
db.restaurants.find().sort( { "borough": 1, "address.zipcode": 1 } )
db.restaurants.aggregate( [ { $group: { "_id": "$borough", "count": { $sum: 1 } } } ] )
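As a rough illustration, the same queries from Table 11 could be issued from Python through the pymongo driver; the connection string is a placeholder and this is a sketch rather than production code.

from pymongo import MongoClient

# Hypothetical connection; the 'restaurants' collection mirrors Table 10.
client = MongoClient("mongodb://localhost:27017/")
db = client["test"]

# Query by a top-level field.
manhattan = db.restaurants.find({"borough": "Manhattan"})

# Query by a field nested inside an array of sub-documents.
high_scores = db.restaurants.find({"grades.score": {"$gt": 30}})

# Sort by multiple keys (1 = ascending).
ordered = db.restaurants.find().sort([("borough", 1), ("address.zipcode", 1)])

# Aggregate: count restaurants per borough.
per_borough = db.restaurants.aggregate([
    {"$group": {"_id": "$borough", "count": {"$sum": 1}}}
])
for doc in per_borough:
    print(doc["_id"], doc["count"])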
Graph
A graph database stores information as a graph, with data entries representing vertices that contain a set of data fields, and relationships (edges) which lead to other nodes. In doing so, graph databases are able to process graph-traversing queries much faster than SQL databases because, instead of having to perform repeated table joins, a graph database is able to simply follow pointers to each node. This speedup can be seen in this slide from Neo Technology, the developer of the graph database Neo4j:
[Figure 36 summarizes a Neo Technology benchmark: for a sample social graph with ~1,000 persons and an average of 50 friends per person, a pathExists(a,b) query limited to depth 4 (with caches warmed up to eliminate disk I/O) took roughly 2,000 ms on a relational database, versus roughly 2 ms on Neo4j with 1,000 persons, and still roughly 2 ms on Neo4j with 1,000,000 persons.]
Figure 36. Graph Database Speed Comparisons (Marzi, 2014)
This shows that for specific queries that traverse graphs, such as whether a path exists between two nodes, graph databases can be much faster than SQL databases. Graph databases also have built-in support for graph-traversing queries (for example pathExists(a,b) and allShortestPaths(a,b)), which makes it much easier to run them straight from the database without any need for long queries, multiple queries, or post-processing. Graph database drawbacks include the inability to shard data across servers due to the tight coupling of graphs, which means that opportunities for parallelization as well as horizontal scalability are limited. Also, in a graph database each record has to be examined individually during bulk queries in order to determine the structure of the data, while this is known ahead of time in a relational database, making relational databases much faster when operating on huge numbers of records. This also makes a graph database slower to query than other databases for data that is not easily represented as a graph.
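As a brief, hedged sketch of what a change-path query could look like against a graph database, the snippet below uses the official Neo4j Python driver and Cypher; the Design label, the CAN_CHANGE_TO relationship, and the connection details are hypothetical and only illustrate the kind of traversal described above.

from neo4j import GraphDatabase

# Hypothetical graph model: (:Design) nodes connected by [:CAN_CHANGE_TO] edges
# that represent executing a transition rule.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Does a change path of at most 4 transitions exist between two designs?
    result = session.run(
        """
        MATCH (a:Design {id: $start}), (b:Design {id: $end}),
              p = shortestPath((a)-[:CAN_CHANGE_TO*..4]->(b))
        RETURN length(p) AS hops
        """,
        start=22, end=222)
    record = result.single()
    if record:
        print("reachable in", record["hops"], "transitions")
    else:
        print("no change path within 4 transitions")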
3.2 Database Conclusion
For the purposes of Tradespace Exploration, fast bulk reads, complex query support, the ability to shard
information easily, and the ability to store unstructured/semi-structured data are the most important
factors to allow for real time, complex analysis of large datasets. Fast reads are important in order to
reduce latency of the bulk transfer of data that will be required when trying to render large datasets. In
the case of a multiple stakeholder workflow, the latency of streaming the vast amount of data is critical
as we hope to support real-time analysis of the data and all changes. Complex query support for the database is important because it will allow us to minimize the amount of data being streamed to only what is necessary, allow us to do the actual querying as fast as possible given that the right data structures/metadata are stored in the database, and get the desired data in a single query, which also minimizes the client-side memory needed and post-processing. The ability to shard data easily is
important in order to allow for a scalable amount of storage as well as rebalance the loads of any
individual server. Unstructured and semi-structured data support is important because it allows the
storage of all possible types of information including data users are currently gathering about designs for
a possible real-time feedback loop. Doing this allows for the updating of models and re-analysis of
lifecycle properties with data gathered from the field which can dramatically increase model correctness
and allow for proper decisions. Unstructured data also allows researchers to work on the same
database independently as there are no schema migrations required to add new properties to elements.
Unstructured data also enables users to easily add new keys to query on and metadata from real-time analysis, allows us to store certain JSON data structures in a near-native format, and finally allows us to easily serialize these data structures for streaming and for use in the client-side rendering/modification of the data.
From the listed potential options, we believe a combination of database solutions will actually provide
for the best possible tradespace exploration experience. Leveraging each type of database separately
for its unique strengths, storing only appropriate information on each will allow IVTea to be both fast for
real-time analysis but also robust to the changing needs of researchers. First, by using a SQL database
for storing the standardized relational data we are able to very quickly query and update fractional parts
of very large data sets. For example selecting a calculated utility value is a simple query on that column
and updating that value from a new run of the performance model is a simple update on that column.
Document-oriented databases allow for arbitrary data collected throughout a system's lifetime to be
stored and analyzed. This allows for the desired paradigm of iterative tradespace exploration to more
easily leverage new information that is gathered during the design and operations phases. Given that the early design process can be very conceptual and data driven, it makes sense to have access to a very flexible way of inputting data which can later be converted into a finalized SQL schema. With this arbitrary data storage also comes the ability to store any type of information about each design (3D
modeling data, video, etc.) that users might want to view. Also Document-oriented databases allow
researchers to easily add new metadata fields for research purposes (new metadata for ilities analysis,
etc.) without conflicting with other researchers or forcing schema updates and migrations. Finally using
a graph database we are able to query much faster for any type of information that can be modeled as a graph and requires traversal for important analysis. In order to model designs evolving over epochs/eras, the graph is a natural choice. In this case a node of the graph would be a design at a particular point in time in a specific context, and over time it traverses the graph, moving into different contexts. Also, for problems such as calculating how designs should change and adapt over time, a graph is the most reasonable representation of the underlying network of designs and how they can change through change paths. In these cases the graph database will provide much faster query times for these traversals, as shown in Figure 36.
The primary drawback to having multiple databases with different use cases is maintaining a synchronized state of data across each of the databases. For example, if information is added to one database there is no way of making sure that this new information, even if it is in a different form, is propagated to all other databases. However, this issue is not that different from the issue of maintaining a common structure in any NoSQL database. It is a necessary tradeoff made to allow for easier partitioning of information and the rapid innovation of analyses. That being said, one possible way around this would be to create scripts that perform the appropriate series of insertions, updates, or deletions for any data that is required to have full representation in each database. For example, the insertion of a single design would be a meta-call that invokes three database-specific insertion functions, each applying the appropriate insertion with the user's input arguments in its respective database.
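A minimal sketch of such a meta-call is shown below; the helper names are hypothetical and each body would contain the database-specific insertion logic.

def insert_design(design):
    """Hypothetical meta-call: keep all three stores in sync for one design."""
    insert_design_sql(design)        # relational row: id, attributes, utility, cost
    insert_design_document(design)   # arbitrary metadata, field data, 3D models, etc.
    insert_design_graph(design)      # node plus CAN_CHANGE_TO edges from transition rules

def insert_design_sql(design):
    # e.g. cursor.execute("INSERT INTO designs VALUES (?, ?, ?)", ...)
    pass

def insert_design_document(design):
    # e.g. db.designs.insert_one(design)
    pass

def insert_design_graph(design):
    # e.g. session.run("CREATE (:Design {id: $id})", id=design["id"])
    pass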
3.3 Distributed Computing Infrastructure: Which is Best For IVTea?
AWS was mentioned as a distributed computing solution that could be leveraged to provide IVTea Suite
with distributed computing power. Computing power is essential for IVTea Suite in order to support
performance, value, and cost model executions, as well as conducting on-the-fly analyses instigated by
users during a session. The possible options that could be used include a self-hosted Hadoop cluster, or a cloud hosting solution such as AWS (Amazon Web Services). Each of these options has its own set of advantages and disadvantages.
3.3.1 Self hosting
The first option is buying and connecting a cluster of servers and running Hadoop on them, essentially creating your own datacenter. The primary benefit of a self-hosted cluster is that the costs are fixed: once you have purchased the servers you can run all of your calculations and store the results for the price of the electricity to do so. Also, by self-hosting a cluster you are able to configure any aspect of the hardware or software as needed. Conversely, there are disadvantages to self-hosting a cluster. Maintaining and repairing the hardware---as well as maintaining and updating the software---can be expensive and time consuming. Once purchased, the scale of the cluster is fixed and can only be increased by purchasing more servers; moreover, the cluster cannot be easily adapted to the needs of specific jobs. Finally, self-hosting means there is no overarching cloud computing ecosystem to tap into, meaning that common implementation problems like load balancing require developing and maintaining a custom solution, as opposed to integrating your cloud host's solution.
3.3.2 AWS
The most popular and well established cloud computing solution is AWS (Amazon Web Services). AWS
provides a variety of cloud services which deal with computation, storage & content delivery, databases,
analytics and more. The primary AWS tools IVTea could utilize include Amazon EC2 (Elastic Compute
Cloud), Amazon S3 (Simple Storage Service), Amazon RDS (Relational Database Service), and Amazon EMR (Elastic MapReduce). Amazon EC2 provides scalable server instances which can be launched with a variety of supported operating systems. After launching your chosen number of instances, software can be installed and configured to allow you to effectively scale any computing or storage task relatively easily. Amazon EC2 also provides many different types of instances, for example compute optimized, memory optimized, and GPU enabled, so EC2 can adapt to a job's requirements, potentially allowing researchers to use hardware that would be infeasible to buy outright. Amazon S3 allows for bulk
storage of arbitrary data across Amazon's network as mentioned in Section 3.1.2. Amazon RDS makes it
easy to set up, operate, and scale a relational database in the cloud. It currently supports MySQL,
Oracle, SQL Server, PostgreSQL, and Amazon Aurora. For all of the offerings mentioned, the price scales
with use, allowing researchers to pay for only what they use and also allowing for a scaling up of use if
necessary. Amazon EMR is a Cloud based MapReduce offering that runs on top of Amazon EC2.
Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy,
fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically
scalable Amazon EC2 instances. A free tier of some of the offerings is supported as well, so testing research code can be free in some cases. AWS also provides a web-based user interface for all of these tools. This AWS Dashboard allows new users to easily launch tasks and monitor their Amazon cloud services. Amazon AWS also has a host of 3rd party libraries that integrate with its offerings, making it very easy to integrate into existing projects. One disadvantage of AWS in the context of IVTea Suite is the cost of the service. While AWS offers a free tier with the following EC2 services:
* 750 hours of EC2 running Linux, RHEL, or SLES t2.micro instance usage
* 750 hours of EC2 running Microsoft Windows Server t2.micro instance usage
* 750 hours of Elastic Load Balancing plus 15 GB data processing
* 30 GB of Amazon Elastic Block Storage in any combination of General Purpose (SSD) or Magnetic, plus 2 million I/Os (with Magnetic) and 1 GB of snapshot storage
* 15 GB of bandwidth out aggregated across all AWS services
* 1 GB of Regional Data Transfer
This is likely not going to be enough for jobs using actual large datasets. The costs for Amazon EC2 instances (which can run all of the databases and software frameworks) are shown in Figure 37:
Instance Type | vCPU | ECU      | Memory (GiB) | Instance Storage (GB) | Price
t2.micro      | 1    | Variable | 1            | EBS Only              | $0.013 per Hour
t2.small      | 1    | Variable | 2            | EBS Only              | $0.026 per Hour
t2.medium     | 2    | Variable | 4            | EBS Only              | $0.052 per Hour
m3.medium     | 1    | 3        | 3.75         | 1 x 4 SSD             | $0.070 per Hour
m3.large      | 2    | 6.5      | 7.5          | 1 x 32 SSD            | $0.140 per Hour
m3.xlarge     | 4    | 13       | 15           | 2 x 40 SSD            | $0.280 per Hour
m3.2xlarge    | 8    | 26       | 30           | 2 x 80 SSD            | $0.560 per Hour
Figure 37. List of Amazon EC2 On-Demand Prices by Instance Type ("Amazon EC2 Pricing," 2015)
The prices for Amazon S3 are shown in Figure 38:

Tier                 | Standard Storage | Reduced Redundancy Storage | Glacier Storage
First 1 TB / month   | $0.0300 per GB   | $0.0240 per GB             | $0.0100 per GB
Next 49 TB / month   | $0.0295 per GB   | $0.0236 per GB             | $0.0100 per GB
Next 450 TB / month  | $0.0290 per GB   | $0.0232 per GB             | $0.0100 per GB
Next 500 TB / month  | $0.0285 per GB   | $0.0228 per GB             | $0.0100 per GB
Next 4000 TB / month | $0.0280 per GB   | $0.0224 per GB             | $0.0100 per GB
Over 5000 TB / month | $0.0275 per GB   | $0.0220 per GB             | $0.0100 per GB
Figure 38. List of Amazon S3 Prices by Storage/Month ("Amazon S3 Pricing," 2015)
As the prices for AWS are based on uptime, it is difficult to assess how much Cloud IVTea Suite would use in a research setting, as it would depend on a variety of factors including the number of researchers using the platform, the amount of data stored, and the amount of computing time used. As a means of gaining insight into how much it might cost for a research initiative, we will look at the costs for the AWS usage required for the work in Section 4.3 (see Figure 39):
[Figure 39: bar chart of AWS spending on a $0-$30 scale, comparing Last Month (April 2015) with Month-to-Date (May 2015)]
Figure 39. Example of Amount Spent on AWS for Research
Over this period of time, 53 hours on an EC2 c3.xlarge instance, 48 hours on an EC2 m1.small instance, 1 hour on an EC2 m3.xlarge instance, and 8 hours on an EC2 m1.large instance were used. A total of 55.0177 GB of information was also stored using Amazon's S3 service. Actually purchasing the equipment and installing the software required for these tests on a self-hosted cluster would definitely have cost more in this circumstance and would not have been realistically feasible.
Another possible disadvantage associated with using AWS is the potential for higher latency when using remote servers to process information. In order to evaluate the latency of Amazon's web services, 1000 HTTP pings were made to an EC2 instance running in each available AWS region in order to get an idea of the average response time for each AWS region. The results of this test can be seen in
Table 12:
Table 12. Each AWS Region and the average latency associated with pinging it

AWS Region               | HTTP Ping Response Time
US-East (Virginia)       | 34 ms
US-West (California)     | 81 ms
US-West (Oregon)         | 126 ms
Europe (Ireland)         | 106 ms
Europe (Frankfurt)       | 107 ms
Asia Pacific (Singapore) | 281 ms
Asia Pacific (Sydney)    | 206 ms
Asia Pacific (Japan)     | 215 ms
South America (Brazil)   | 249 ms
As can be seen, the fastest average response time was to US-East. This makes sense, as the HTTP pings originated from Cambridge, Massachusetts, which is closest to the US-East AWS region. As users of AWS are able to select the region that their instances are in, we can assume that users can achieve around this average response time for all of their AWS EC2 use. In order to compare these values to a local server implementation, the same test was done for a single local server running on a nearby network. The result for the local server was 18 ms. With its 34 ms response time, AWS does have higher latency, but seeing as the 16 ms difference most likely will not dominate when doing multiple database operations, using AWS should not introduce noticeable latency compared to a local server solution.
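A measurement like the one described above can be scripted in a few lines; the sketch below (using the Python requests library) is illustrative only, and the endpoint URLs are placeholders rather than real servers.

import time
import requests

# Hypothetical endpoints: one small HTTP responder per AWS region plus a local server.
ENDPOINTS = {
    "US-East (Virginia)": "http://ec2-us-east.example.com/ping",
    "Local server":       "http://local-server.example.com/ping",
}

def average_latency(url, samples=1000):
    """Issue repeated HTTP GETs and return the mean response time in milliseconds."""
    total = 0.0
    for _ in range(samples):
        start = time.time()
        requests.get(url, timeout=5)
        total += time.time() - start
    return 1000 * total / samples

for region, url in ENDPOINTS.items():
    print(region, round(average_latency(url, samples=100), 1), "ms")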
3.4 Distributed Computing Infrastructure Conclusion
As IVTea is a tool used by researchers to investigate tradespace exploration, leaving the management of software and hardware to a 3rd party is ideal, as it allows more research effort to be focused on solving the relevant problems rather than on learning new tools and setting up infrastructure. While the costs of the cloud based solutions are not negligible, the amount of processing power required for these large scale jobs to be run in real-time will most likely be outside the realm of what can feasibly be purchased in house. Also, to help mitigate costs, the software and concepts of the research can be developed locally and then tested and iterated on through the cloud service in batches when necessary, for a very reasonable amount of money. As such, the pre-installed software, auto-updating, well maintained libraries/3rd party APIs, and maturity of the platform make Amazon AWS the best choice.
3.5 Distributed Computing Frameworks: Which are the Best for IVTea?
As tradespace exploration often deals with altering user-inputted parameters and recalculating some type of analysis (utility, changeability, etc.) for each independent design, a lot of the analysis is inherently parallelizable. As it stands currently, though, all of these computations are done serially on a single processor. These processes could be made much faster by parallelizing the computations for each design across a distributed computer cluster. By parallelizing the processing of designs across a computer cluster with a user-selectable number of computers/cores, these complicated per-design functions can be subdivided across subsets of designs. Doing so can give a performance boost roughly proportional to the number of cores, as well as make feasible problems that were once too big to store in memory or on disk. As of right now, many IVTea Suite operations load all of the design data into memory and run processes on the data locally. While this is feasible for smaller datasets, it becomes prohibitive in disk space and memory as the number of designs grows. In order to overcome these challenges, we propose utilizing distributed computing software frameworks that we can adapt our algorithms to and run in a distributed fashion.
3.5.1 Hadoop/ Hadoop MapReduce
One such software framework is Apache Hadoop. Apache Hadoop is open-source software for reliable, scalable, distributed computing. Hadoop is a software ecosystem that allows for massively parallel computing. The components of Hadoop that we are specifically interested in are
HDFS (Hadoop Distributed File System) and Hadoop MapReduce which leverages the MapReduce
programming model developed by Google to provide a framework for simpler distributed computing
(Dean, et al. 2008). Hadoop provides us with the ability to store and process large sets of data by giving
us a software framework that is easy to install on multiple computers and immediately allows those
computers to form a distributed computing cluster with a host of useful tools. Hadoop's HDFS
distributes the large data sets loaded/generated across the cluster and offers a file system like
abstraction for accessing them. This will be the large data that we store and feed into our distributed
computing operations. Our primary use for Hadoop will be running MapReduce jobs across each node
in our distributed cluster. MapReduce is a programming model which consists of two primary phases, mapping and reducing, in order to solve problems at scale. In MapReduce, a user supplies an input file to be processed, a map function, and a reduce function to the cluster (Dean, et al. 2008). The Map function takes some subset of an input (for example, a line of a text file) and outputs a set of intermediate key/value pairs. The MapReduce library (in this case Hadoop) groups together all intermediate values associated with the same intermediate key and passes them to the Reduce function. The Reduce function accepts these intermediate keys and the grouped set of values for each key. The Reduce function then "reduces" these values to form a potentially smaller set of values and outputs that set of values to disk.
An example of a MapReduce operation is counting the number of occurrences of each word in a large
collection of documents. In order to solve this problem with MapReduce, the user would upload the text file into disk storage that the MapReduce library supports and supply map and reduce functions similar to the pseudocode below (Dean, et al. 2008):
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
The map function emits each word of the input plus the value '1', signifying that one instance of the word was seen. The reduce function sums together these '1's and emits a count for each particular word in the document. MapReduce is able to parallelize these large jobs by appropriately
distributing work throughout the designated computer cluster. A complete overview of the MapReduce
execution pipeline can be seen in Figure 40. To begin, a user uploads the file to be processed into a
distributed file system that the MapReduce library supports. This data is then split up into smaller pieces
which can be processed in parallel. The MapReduce library then starts up many copies of the
MapReduce program on a cluster of machines. One of these copies is the master and the rest are
workers. There are M map tasks and R reduce tasks to assign. The master chooses idle workers from
the cluster and assigns each one a map or reduce task. A worker assigned a map task reads in data from
the corresponding split input (split 1 -> map 1). It then runs the user inputted map function and outputs
the intermediate key/value pairs which are buffered in memory. These buffered values are periodically
written to local disk and passed to the master who forwards them to the reduce workers. When a
reduce worker is notified by the master about these locations, the reduce worker uses remote
procedure calls to read in the buffered data from the local disks of the map workers. Once a reduce
worker has read in all of the mappers intermediate data, it sorts the data by intermediate key such that
all occurrences of the same key are grouped together. The reduce worker then iterates over this sorted
intermediate data, passing the key and values to the Reduce function. The output of the Reduce
function is then appended to the final output for this reduce. This process continues until all map and
reduce tasks are completed.
Figure 40. Overview of MapReduce's Structure (Dean, et al. 2008)
MapReduce is useful for problems which are parallelizable and loosely coupled. By loosely coupled I mean that each separate parallelized task does not require information from another task. This is because MapReduce has no shared memory between parallelized tasks, instead opting to split a large input into small pieces that can be pulled into each process's memory separately. As such, MapReduce is not well suited to situations requiring tight coupling (e.g. message passing/shared memory, or large graph processing algorithms). It also does better where the amount of computation required is significantly more than any one computer can provide. An example use case where MapReduce is useful is as follows: assume there are 10,000 source files that are relatively independent of one another (e.g. you don't have cross-file references to resolve), and processing each file takes one minute. On a single computer, this would take about 7 days to process. With a MapReduce cluster of 100 machines, it would take about 2 hours. Also, Hadoop MapReduce has been around for a significant period of time, so there are multiple libraries, tools, and APIs that can be leveraged when attempting to run Hadoop MapReduce jobs.
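Because Hadoop Streaming lets map and reduce functions read from stdin and write to stdout, the word-count pseudocode above can be written as two ordinary Python scripts; the sketch below is illustrative, and the script names are arbitrary.

# mapper.py -- reads lines from stdin and emits "word<TAB>1" pairs
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

# reducer.py -- Hadoop Streaming delivers the mapper output sorted by key,
# so all counts for one word arrive consecutively and can be summed in one pass
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

A job built from these scripts would then be submitted with the hadoop-streaming jar, specifying the input and output paths along with -mapper and -reducer options (the exact invocation depends on the Hadoop installation).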
3.5.2 Spark
Another software framework that enables parallel processing of large, complex problems through distributed computing is Apache Spark. Apache Spark is a programming interface that exposes a distributed memory abstraction to users, letting them perform in-memory computations on large clusters in a fault tolerant manner. Compared to MapReduce's heavily disk-based method of persistence, as shown in Figure 40, Spark primarily uses the computing cluster's RAM for executing its tasks, as shown in Figure 41:
Figure 41. Overview of Apache Spark's Structure (Zaharia, et al. 2012)
The distributed memory abstraction used by Spark is called Resilient Distributed Datasets (RDDs). By
creating this fault tolerant shared memory abstraction, Spark is able to perform certain computing
problems much faster and opens up the possibilities for using distributed computing for entirely new
problems that the established Hadoop MapReduce cannot handle.
Formally, an RDD is a read-only, partitioned collection of records (Zaharia, et al. 2012). RDDs can only be created through deterministic operations on either data in stable storage or other RDDs. These operations are called transformations, and some examples include map, filter, and join. Through RDDs, Spark keeps track of the transformations used to build each dataset, which enables applications to reliably store this data in memory. This is the key to Spark's performance, as it allows applications to avoid costly disk accesses. This enables low-latency computations by caching the working dataset in memory and then performing computations at memory speeds. This also enables efficient iterative algorithms by having subsequent iterations share data through memory, or by repeatedly accessing the same dataset.
RDDs can be created and utilized through the Apache Spark programming interface to accomplish many diverse tasks; one such example, which we will detail, is real-time analytics on server logs.
For this example we will assume that we have an HDFS file containing a collection of lines of text which holds our server log information. Using Spark, the user is able to load just the error messages from the logs into the distributed RAM of the cluster and query them interactively. The example code is shown below (Zaharia, et al. 2012):
1   lines = spark.textFile("hdfs://<path-to-large-server-log-file>")
2   errors = lines.filter(_.startsWith("ERROR"))
3   errors.persist()
4   errors.count()
5   // Count errors mentioning MySQL:
6   errors.filter(_.contains("MySQL")).count()
7   // Return the time fields of errors mentioning
8   // HDFS as an array (assuming time is field
9   // number 3 in a tab-separated format):
10  errors.filter(_.contains("HDFS")).map(_.split('\t')(3)).collect()
This example uses a series of Scala commands through the Spark command line interface to interactively query our cluster. Line 1 defines an RDD backed by an HDFS file which contains our text file of server logs. Line 2 creates a filter for the error lines of the log and applies it to the RDD from line 1, creating a new RDD 'errors'. Line 3 asks for our 'errors' RDD to persist in memory so that it can be shared across multiple queries. It is important to note that at this point no actual work has been performed on the cluster, as RDDs are lazily evaluated to allow for optimization. Line 4 launches the RDD accesses and performs a count of the number of error messages in our log. Lines 5 through 10 perform various queries the user desires on the errors RDD, which is now stored in cluster memory. This interactive style of data interrogation is something not really possible with other distributed computing paradigms due to their job-scheduled and disk-based architectures.
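The same interactive session can be expressed in Python through the PySpark API; the sketch below mirrors the Scala example and, as there, the HDFS path is a placeholder.

from pyspark import SparkContext

sc = SparkContext(appName="server-log-analysis")

lines = sc.textFile("hdfs://<path-to-large-server-log-file>")
errors = lines.filter(lambda line: line.startswith("ERROR"))
errors.persist()                                        # keep the RDD in cluster memory

print(errors.count())                                   # total number of error lines
print(errors.filter(lambda l: "MySQL" in l).count())    # errors mentioning MySQL

# Time fields of errors mentioning HDFS (field 3 in a tab-separated format).
times = (errors.filter(lambda l: "HDFS" in l)
               .map(lambda l: l.split("\t")[3])
               .collect())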
An example of the performance gain possible when using Spark compared to other distributed
computing paradigms can be seen when doing triangle counting on graphs. Triangle counting is a
measure of the cohesiveness of the "community" of a vertex by counting the number of connected
nodes who are themselves connected. GraphLab, a graph processing library which has recently been ported to work with the Spark programming interface, was used to conduct a comparison of triangle counting on Hadoop vs. GraphLab. While the results shown come from a period where GraphLab did not utilize Spark, it is believed that the runtime should be comparable while running on top of Spark. The results can be seen in Figure 42:
[Figure 42: triangle counting on the Twitter graph (40M users, 1.4 billion links, 34.8 billion triangles counted), comparing runtime on Hadoop (Suri and Vassilvitskii, "Counting triangles and the curse of the last reducer," WWW'11) against GraphLab running on 64 machines]
Figure 42. Demonstrates triangle counting speedup resulting from using an in-memory architecture vs a disk-based architecture (Suri, et al. 2011)
Spark is useful for tightly coupled problems as there is essentially a single pool of shared memory that all
the parallelized processes can pull from. Use cases where this is especially useful include iterative
algorithms such as those in machine learning (e.g. gradient descent, etc.), interactive data mining, data streaming applications, and graph processing. For these applications, as Figure 42 shows, Spark is dramatically faster than disk-based solutions.
3.6 Distributed Computing Framework Conclusion
IVTea can leverage both of these types of computing frameworks in order to allow for large computational tasks, such as traversing the complicated graph that emerges when designs are evaluated through various eras with calculations being done for specific 'ilities'. The newer RDD paradigm that Spark offers can be used for real-time analysis of large data, which is the desired response time for a tool like IVTea Suite. Hadoop MapReduce remains valuable, though, due to its established ecosystem and its ability to process large files whose memory requirements are too large even for RDDs. For example, processing a large set of design variables through a new physics model and storing the results in a database would be simpler, cheaper, and easier to schedule with the current Hadoop infrastructure.
3.7 IVTea Suite Recommended Architecture Summary
As mentioned in Section 3, re-architecting IVTea Suite to allow for larger data sets, faster
computation, and real-time data exploration requires the addition of 3 major components: database
architecture, distributed computing infrastructure, and distributed computing software framework.
After analyzing the alternatives and comparing the strengths and weaknesses of the various options, we have decided on a potential architecture. For the database architecture we have selected using SQL,
Document-Oriented, and Graph databases for the reasons outlined in Section 3.2. For the distributed
computing infrastructure we have selected AWS (Amazon Web Services) for the reasons outlined in
Section 3.4. Finally for the distributed computing framework we have selected Apache Spark and
Hadoop MapReduce for the reasons outlined in Section 3.6.
Chapter 4: Detailed Overview of Cloud IVTea
As mentioned in Section 1.5.1, the goals that spurred the creation of IVTea were to allow users to view,
and interact with, vast amounts of design data in order to reason about their complex system, compare
designs across all important lifecycle properties that are currently overlooked (changeability,
survivability, etc.), and iteratively evaluate their design throughout the entire design process
incorporating newly obtained information. While IVTea has many useful features, it currently has only
partially delivered on these goals. In order to fully achieve these goals, IVTea Suite must exhibit
enhanced functionality. Incorporating the technologies outlined in Section 3.7, we can put together a new architecture (Cloud IVTea) that would allow IVTea Suite to meet the outlined goals for an ideal tradespace exploration software tool.
By incorporating the cloud hosting solution Amazon AWS, IVTea would be able to process information in a massively parallel fashion and store/query that information across the ~5 million servers that Amazon AWS has in its infrastructure. The database software and distributed computing would sit on top of Amazon AWS, providing them with scalable infrastructure that can adapt to the needs of IVTea and scale with it. Also, as IVTea Suite's requirements grow, additions such as a memcache layer or added security features can be drawn from the AWS toolkit as opposed to creating solutions in house.
By incorporating a database ecosystem of MySQL, SimpleDB, and Neo4j into IVTea Suite, we are able to
leverage all of the strengths of each database type while mitigating their weaknesses by using them in
an application specific manner. Combined with a cloud hosting platform such as Amazon EC2, or
Amazon RDS, these databases will be able to scale vertically (as is the case with SQL and Graph
databases) and horizontally (with NoSQL databases). With these databases in place we can store information in its most native format, and by leveraging each database type in an application-specific way we can gain substantial performance for our analytics and visualization.
By incorporating a distributed computing framework such as Apache Spark or Apache Hadoop, we are able to solve problems in a distributed fashion. This allows problems which were previously too computationally complicated to now be analyzed. It also allows much larger datasets to be analyzed, which directly fits the goals of tradespace exploration.
4.1 Architectural Diagram
The architectures of the current IVTea Suite and the proposed Cloud IVTea Suite will now be discussed.
4.1.1 IVTea Suite
The current architecture of IVTea Suite can be seen in Figure 43:
[Figure 43: IVTea Suite reads pre-computed metrics from a SQL database and performs real-time computations locally]
Figure 43. IVTea Suite current architecture
In the current architecture, IVTea Suite uses pre-computed values for representing attributes, transition
rules, and fuzzy Pareto numbers. These values are stored in a SQL database as shown in Figure 43 and
read out when necessary by IVTea Suite. For the current real-time calculations of utility, IVTea Suite runs the utility function across each design's stored attributes, which are located in the host machine's local memory. This is depicted by the red arrow in Figure 43. While this is acceptable for small data sets and operations with relatively fast run-times, it is prohibitive to the exploring aspect of tradespace exploration, in which more and more information is added by the users and analysts using the tool. The
reason that attributes are currently precomputed is due to the fact that running performance models is
computationally expensive and requires a nontrivial amount of time to run for large sets of designs. The
reason that values for lifecycle properties are precomputed is due to the computational complexity of
traversing all of the epochs and possibly eras that each design can encounter. For example, analyzing the changeability of designs across a lifecycle requires a graph search across all possible epochs and all designs reachable through changing elements of each design. This problem grows exponentially as the
number of designs and possible eras increases. For a reasonable number of designs and epochs, this is
far beyond what a single computer could do. Ideally these computationally expensive precomputed
metrics could also be adjusted in real-time to allow for user feedback to be put back into the model as is
desired from the feedback loops depicted in Figure 13.
4.1.2 Cloud IVTea
The proposed architecture for Cloud IVTea Suite can be seen in Figure 44:
84
(NoJ
bvaaing
SC S& Nwo
AWS
(Amazon
Web
Seces)
Spark/GraphX
E
E
EC2
EC2
EC2
EC2
MapReduce
Figure 44. Proposed Cloud lVTea architecture
In the Cloud IVTea architecture shown in Figure 44, it is no longer the case that we are restricted to the
local memory and processing power of a single computer. This is achieved by running the entire
architecture through AWS (Amazon Web Services). By using Amazon EC2 (Elastic Compute Cloud) we are able to scale the number of servers used to properly meet what is required by the current exploration task. By installing the appropriate software on each one of our server instances we are able to easily scale our computation and storage needs as the problem requires. For example, to migrate the design data that was once in our system's fixed memory, we can instead load these values into RDDs on a running Apache Spark cluster. This information can then be processed in parallel by Apache Spark to yield real-time values for ilities calculations that were once too slow to do in real-time, or to process larger sets of designs which were once too memory consuming. As AWS's payment scheme is hourly and based on usage, it is easy to spin up exactly what is needed at the right time and then terminate the servers after use.
For precomputing large quantities and adding them to a database, Amazon EMR (Elastic MapReduce)
can be used to easily facilitate the launching of MapReduce tasks. With this in place, jobs that were
once too large to run can now be computed and stored in AWS. One advantage of using MapReduce is
that for tasks whose inputs are greater than the available RAM across multiple computers, MapReduce can still operate. Also, Hadoop MapReduce (via Hadoop Streaming) supports reading and writing information from stdin and stdout. This allows developers to write MapReduce code in whatever language they are already using in their research, making the process faster and easier than using, for example, Apache Spark. For example, the analysis of multi-arc change paths in Section 4.3 utilizes Amazon EMR to calculate the metric for 10^6 designs, a number that would have previously been infeasible. The utility of the databases used is discussed in Section 3.2.
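As a hedged sketch of how such a precomputation job might be submitted programmatically, the snippet below uses the boto3 EMR client to launch a small cluster running a Hadoop Streaming step; the bucket paths, instance counts, and release label are illustrative, and the exact parameters may differ for a real deployment.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a small, hypothetical cluster that runs one streaming step and shuts down.
response = emr.run_job_flow(
    Name="ivtea-change-path-metrics",
    ReleaseLabel="emr-4.0.0",
    Instances={
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "change-path metric (streaming)",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hadoop-streaming",
                     "-input", "s3://ivtea-data/designs/",
                     "-output", "s3://ivtea-data/results/",
                     "-mapper", "mapper.py",
                     "-reducer", "reducer.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])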
4.2 Multi Era Affordability with Change Paths
Stakeholders want the design that delivers the highest value (based on their criteria) for an affordable
cost across the entire expected life cycle of the system. This is difficult to evaluate, however, as
perceived system value and system costs can change with time. Along with that, the cost is often times
multi-dimensional and the expected budget can also change over time. In these cases, stakeholders
want design solutions that are affordable given their multi-dimensional budget constraints. In this
context, affordability is defined as the property of becoming or remaining feasible relative to resource
needs and resource constraints over time (Wu, et al. 2014). Wu's work has framed the problem of
finding affordable designs by utilizing the framing of epochs and eras from epoch-era analysis. Epoch-era analysis is used in this case as a means of predicting the lifecycle cost of designs as they move
throughout their lifecycle, transitioning through epochs (Figure 45).
Figure 45. Visualization showing a tradespace moving through epochs, forming a single era (Ross, et al. 2006)
This allowed users to see which designs were affordable and met performance constraints in each respective epoch as well as across epochs. From this, users could see different designs' best and worst case expenses and evaluate which designs were affordable, as well as which designs could be robust to budget changes.
For EEA generally, the framework is set up such that meaningful insights are possible without having
probabilistic data or estimates (e.g. the likelihood that a particular epoch will occur). However, it can be
the case that in certain systems probability of epochs are known or can be estimated to reasonable
precision. It can also be the case that during the design phase and operation phase, designs are able to
change (refueling, integration of new technologies, etc), possibly in reaction to epochs changing. When
this is the case, users can leverage this information to find, in addition to which design paths are most affordable, what change options users should select in every possible scenario in order to minimize cost.
This analysis of looking at large numbers of possible futures based on the likelihood of epochs requires
framing the problem as multi-era analysis (MERA). Schaffner first demonstrated multi-era analysis and
showed that it could be used successfully as a means for affordable concept selection (Schaffner, 2014).
Schaffner posed two relevant research questions that he affirmed in his work: "Can the affordability of complex systems be better enabled through an early-phase design method that incorporates multi-attribute tradespace and Epoch-Era Analysis?" and "Can the affordability of changeable systems be better evaluated by leveraging existing path planning approaches to efficiently evaluate long-term change strategies for systems in many possible futures?" We are presenting a solution that expands upon
Schaffner's affordability work. Our approach uses Markov decision processes to find affordable designs
and allows calculations to be parallelized across all inputs (designs, epochs, costs, and transition rules).
With the problem formulated in a parallelizable fashion, given a distributed computing architecture we
86
hope to allow the calculation of affordable designs and affordable change paths across multiple eras for
large sets of inputs.
4.1.2 The Big Problem Approach
By specifying a probabilistic model for the occurrence of an epoch, we can examine the expected cost of
a design across all eras. This gives us a great deal of modeling power, at the cost of only limited ability
to model changing preferences. One way to specify this probabilistic model is using a Markov decision
process. Markov decision processes provide a mathematical framework for modeling decision making in
situations where outcomes are partly random and partly under the control of a decision maker. This is
exactly the case in large scale systems where users are able to change designs at any point at a cost but
do not know what epochs may occur in the future.
Formally, a Markov Decision Process (MDP) is a tuple (S, A, T, J), where:
* S is a set of possible world states. This can be constructed from a combination of the epoch and design spaces.
* A is a set of possible actions. This can be constructed from the traditional rule/path space.
* T defines the probability of transitioning between states given a particular action.
* J is a function mapping each state-action pair to a user-specified cost. This can be constructed using the traditional transition/accessibility matrices.
It is important to note that the T matrix requires information beyond that of ordinary epoch-era analysis. In this case
we assume that the probabilities of epochs occurring are known.
The overall cost of a sequence of states is just the sum of the cost of the state at each time step. In
general, we cannot specify a state sequence; the best we can do is specify a policy. A policy is a mapping
from states to actions: given the current state, it returns a particular choice of action. To evaluate a
policy π, we simply need to sum over all epochs within the time horizon. Suppose that at epoch t, we are
in state s with probability p_s. Then we will take action π(s) with probability p_s, and incur expected cost
Σ_s p_s J(s, π(s)) during that time step. Moreover, we have

p(s_{t+1}) = Σ_s p(s_t = s) p(s_{t+1} | s_t = s, a_t = π(s))
Given an MDP, an initial state s_initial, and a time horizon T, we can recursively compute the expected
cost incurred by following a particular policy:
1. Initialize E[C(π, 0)] = 0 and p(s_1) = p(s_1 | s_0 = s_initial, a_0 = π(s_initial)).
2. For t = 1 to T:
   a. Compute p(s_t = s) = Σ_{s'} p(s_{t-1} = s') p(s_t = s | s_{t-1} = s', a_{t-1} = π(s')).
   b. Compute E[C(π, t)] = E[C(π, t-1)] + Σ_s p(s_t = s) J(s, π(s)).
3. E[C(π, T)] is the expected cost of the policy π.
Bellman famously presented an efficient algorithm for determining the minimum-cost policy (Bellman,
1957). We define a 'cost-to-go' function, which describes the expected cost of being in a particular state
over all remaining time steps. At the last time step, this is trivial: V_T(s, a) = J(s, a). We can recursively
define the cost-to-go at previous time steps:
1. Set V_T(s, a) = J(s, a).
2. For t = T:-1:1, compute:
   V_{t-1}(s) = min_a Σ_{s'} p(s_t = s' | s_{t-1} = s, a_{t-1} = a) [V_t(s') + J(s', s)]
The result of this computation, V_0(s), is the minimal expected cost for each state at the
initial time step. The optimal policy is then to always choose the action which minimizes the cost-to-go
at the next time step. Note that this algorithm requires only matrix multiplication and picking the minimum
from a list, both of which are easy to parallelize.
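As an illustration, a minimal NumPy sketch of this backward recursion is shown below. The array layout (J indexed as J[s', s], T indexed as T[a, s', s]) mirrors the conventions of the MATLAB example in Appendix A; the function is a simplified stand-in, not IVTea Suite code.

import numpy as np

def finite_horizon_policy(J, T, horizon):
    """Backward Bellman recursion for a finite horizon.

    J: (S, S) cost of arriving in state s' from state s, indexed J[s', s]
    T: (A, S, S) transition probabilities, T[a, s', s] = p(s' | s, a)
    Returns the cost-to-go V (S, horizon) and the greedy policy pi (S, horizon).
    """
    A, S, _ = T.shape
    V = np.zeros((S, horizon + 1))        # terminal cost-to-go is zero
    pi = np.zeros((S, horizon), dtype=int)
    for t in range(horizon - 1, -1, -1):
        # Q[s, a] = sum_{s'} p(s' | s, a) * (V[s', t+1] + J[s', s])
        Q = np.einsum('ams,m->sa', T, V[:, t + 1]) + np.einsum('ams,ms->sa', T, J)
        pi[:, t] = Q.argmin(axis=1)       # minimizing action for each state
        V[:, t] = Q.min(axis=1)
    return V[:, :horizon], pi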
Applying MDPs to Tradespace Exploration
The set of possible states S for the multi-era affordability with change paths problem is the set product
of all possible epochs and all possible designs. This means that the number of world states |S| is of size
E * D, where E is the number of epochs and D is the number of designs.
The actions are the possible change paths that can be used to transfer from one design to another. In
the simplest case, where there is exactly one path to each design, there would be D possible actions.
There may be more than D actions if there is more than one way to reach a given design.
For this problem, the transition probability factors into two terms. Let e_t be the epoch at time t, and let
d_t be the design chosen at time t. Then p(e_{t+1}, d_{t+1} | e_t, d_t, a_t) = p(e_{t+1} | e_t) p(d_{t+1} | d_t, a_t). That is,
the epoch is independent of the design we choose, and the likelihood of transitioning to a given design
depends only on the current design and the design path we choose. The term p(e_{t+1} | e_t) reflects the
likelihood of various epoch sequences, and must be specified by a user; the term p(d_{t+1} | d_t, a_t) is
deterministic, so that p(d_{t+1} | d_t, a_t) = 1 if design path a_t terminates in design d_{t+1} when it begins in d_t,
and is zero otherwise. Fully specifying the transition probabilities then requires E^2 + |A| numbers,
although representing them explicitly in memory requires E^2 * D^2 * |A| space.
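A small sketch of how this factored transition structure could be assembled into explicit T matrices is shown below. The state indexing s = e*D + d and the assumption that each action ends deterministically in a single design are illustrative choices, not part of the original formulation.

import numpy as np

def build_transition_tensor(P_e, rule_end_design, num_designs):
    """Assemble T[a, s', s] = p(e' | e) * p(d' | d, a) for states s = (e, d).

    P_e:             (E, E) user-specified epoch transition probabilities, P_e[e', e]
    rule_end_design: length-A list; rule_end_design[a] is the design that action a
                     deterministically transitions to (a simplifying assumption)
    """
    E = P_e.shape[0]
    D = num_designs
    A = len(rule_end_design)
    S = E * D
    T = np.zeros((A, S, S))
    for a, d_next in enumerate(rule_end_design):
        for e in range(E):
            for d in range(D):
                s = e * D + d                      # current state (epoch e, design d)
                for e_next in range(E):
                    s_next = e_next * D + d_next
                    T[a, s_next, s] = P_e[e_next, e]   # p(d'|d,a) = 1 only for d' = d_next
    return T

For the satellite example below, P_e would be [[0.8, 0.8], [0.2, 0.2]] and each action a would end deterministically in design a.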
Finally, the cost function represents user preferences for various design-epoch pairs. For instance,
consider modeling a satellite communication system where solar flares present an operational risk. If a
solar flare occurs and the satellite is unshielded, the system may fail and we incur a high cost. On the
other hand, installing shielding may be expensive, so if solar flares are very rare it may not be worth
installing. The cost function allows us to express these tradeoffs numerically. Note that this simple
representation of cost is a disadvantage of the model; the cost function for our use must represent an
aggregate of many considerations, including the financial cost and the cost in terms of time. We will
also include matrices containing the running sum for each of our two dimensions of cost, time and money.
These matrices, Cc and Ct, will show the cost associated with the paths that the policy generates. By
comparing these with their budgets, users will be able to see which designs are affordable.
We will show an example of the MDP matrices, and how they are solved, applied to a simple
satellite development tradespace exploration problem. The MATLAB code for this example can be
found in Appendix A.
The Setup
We are attempting to design a satellite with 3 potential designs: Unshielded, 3rd Party Shielded, and
R&D Shielded. The system the satellite exists in has 2 epochs: a no solar flare epoch (regular operation),
which occurs with probability 0.8, and a solar flare epoch, which occurs with probability 0.2. The transition
rules are that any design can become any other design through a pre-specified amount of time and
money. The negative impact of the solar flare is depicted in the model as costing additional time and
money to designs that are impacted by the solar flare (in our case Unshielded and, to a small extent,
3rd Party Shielded). The budget for our project is $50 million and 50 months. By using MDPs to look at the
multi-era affordability of designs, we hope to be able to evaluate our tradespace of designs and
select the proper affordable change path based on how much we value time versus money. In this way we
will be able to select what we estimate is the most cost effective action to take for any design in any
epoch (epochs here assume the same fixed duration). This calculation can also be re-evaluated at any
point in the system's lifecycle with new information to re-assess the most affordable action to take. By
doing this we hope to gain insight into the costs associated with actions as well as learn what the most
affordable action to take is in any epoch.
* S is a set of possible world states:
  d1/n, d2/n, d3/n, d1/y, d2/y, d3/y
  where d1: Unshielded, d2: 3rd Party Shielded, d3: R&D Shielded, and n: no solar flare, y: solar flare
* A is a set of possible actions:
  action 1: change to Unshielded (d1)
  action 2: change to 3rd Party Shielded (d2)
  action 3: change to R&D Shielded (d3)
* J is a function mapping each state-action pair to a user-specified cost.
  To make our J easier to reason about, we construct it from smaller matrices encapsulating smaller cost concepts:

  Jsc = [0 0 0 15 5 0]'                The cost in millions of dollars of being in state i (the rows are d1/n, d2/n, d3/n, d1/y, d2/y, d3/y)

  Jst = [0 0 0 25 0 0]'                The cost in time of being in state i (the rows are d1/n, d2/n, d3/n, d1/y, d2/y, d3/y)

  Jtc = [0 1 20; 0 0 20; 0 0 0]        The cost in $ of transitioning from design i to design j

  Jtt = [0 5 50; 3 0 50; 3 6 0]        The cost in time of transitioning from design i to design j

We also specify α, a weighting representing the tradeoff the user desires between time and money.
With this in place the J matrix is calculated as follows:
num_states  = 6;
num_designs = 3;
Jc = zeros(6,6); % Aggregate $ cost matrix
Jt = zeros(6,6); % Aggregate time cost matrix
for sp = 1:num_states
    for s = 1:num_states
        d  = mod(s-1, num_designs) + 1;
        dp = mod(sp-1, num_designs) + 1;
        Jc(sp,s) = Jtc(d,dp) + Jsc(sp);
        Jt(sp,s) = Jtt(d,dp) + Jst(sp);
    end
end
J = a*Jc + (1-a)*Jt; % Aggregate cost matrix
For α = 1 (meaning the user has a preference towards saving $ and doesn't care about time), the
resulting J matrix is:

J =
[  0   0   0   0   0   0
   1   0   0   1   0   0
  20  20   0  20  20   0
  15  15  15  15  15  15
   6   5   5   6   5   5
  20  20   0  20  20   0 ]

(columns are the current state s and rows are the next state s', both ordered d1/n, d2/n, d3/n, d1/y, d2/y, d3/y)
* T defines the probability of transitioning between states given a particular action, so that
  (T_i)_jk is the probability of transitioning to state j from state k when taking design path i.

From the global p_no_solar_flare = 0.8 and p_solar_flare = 0.2, we have the following T matrices for each action:

T1 =
[ .8  .8  .8  .8  .8  .8
   0   0   0   0   0   0
   0   0   0   0   0   0
  .2  .2  .2  .2  .2  .2
   0   0   0   0   0   0
   0   0   0   0   0   0 ]   Probabilities of end state assuming action 1

T2 =
[  0   0   0   0   0   0
  .8  .8  .8  .8  .8  .8
   0   0   0   0   0   0
   0   0   0   0   0   0
  .2  .2  .2  .2  .2  .2
   0   0   0   0   0   0 ]   Probabilities of end state assuming action 2

T3 =
[  0   0   0   0   0   0
   0   0   0   0   0   0
  .8  .8  .8  .8  .8  .8
   0   0   0   0   0   0
   0   0   0   0   0   0
  .2  .2  .2  .2  .2  .2 ]   Probabilities of end state assuming action 3
Solving the Equation
To find the policy π and the cost-to-go matrix V, we will use the Bellman equation mentioned above.
As this equation relies on iteration, it can either be run for a fixed number of steps (a finite horizon)
or run for as many iterations as required until the V matrix converges (an infinite horizon). We will
analyze the results of the finite horizon case as it allows for more insight to be gained about the
system, but the code for both can be found in Appendix A. The finite horizon implementation of the
Bellman function (with added expected cost calculation) in MATLAB can be seen in Figure 46:
T = 30; % Length of horizon (number of steps to plan into the future)
V  = zeros(6,T);
Vc = zeros(6,T);
Vt = zeros(6,T);
pi = zeros(6,T);
for t = T-1:-1:1
    Jn  = zeros(6,3);
    Jnc = Jn;
    Jnt = Jn;
    for s = 1:6
        for a = 1:3
            for sp = 1:6
                Jn(s,a)  = Jn(s,a)  + p(sp,s,a)*(V(sp,t+1)  + J(sp,s));
                Jnt(s,a) = Jnt(s,a) + p(sp,s,a)*(Vt(sp,t+1) + Jt(sp,s));
                Jnc(s,a) = Jnc(s,a) + p(sp,s,a)*(Vc(sp,t+1) + Jc(sp,s));
            end
        end
    end
    [V(:,t), pi(:,t)] = min(Jn, [], 2);
    for s = 1:6
        Vc(s,t) = Jnc(s,pi(s,t));
        Vt(s,t) = Jnt(s,pi(s,t));
    end
end
Figure 46. Modified Bellman equation code, calculating the optimal policy and expected cost in dollars and time
The Results
The outputs of this function are the policy π and the cost-to-go matrix V, as well as the total dollar cost
matrix Vc and the total time cost matrix Vt. For the following parameters:
Horizon length = 30, α = 1
>> V
[6 x 30 matrix of expected aggregate costs: one row per state (d1/n, d2/n, d3/n, d1/y, d2/y, d3/y), one column per remaining horizon step]
Figure 47. Output V matrix showing the expected aggregate cost for different epoch lengths
>> pi
[6 x 30 matrix of optimal actions (1, 2, or 3): one row per state, one column per remaining horizon step]
Figure 48. Output π matrix showing the best policy for different epoch lengths
Each column of these matrices represents an iteration of the Bellman function selecting the lowest cost
path at each step from inception. The first column of π represents the change path to take from each
design that minimizes cost assuming the lifecycle of the system is 30 epochs. The 29th column of π
represents the change path to take if the lifecycle of the system is 1 epoch. V contains the expected
costs for these change paths.
From this we can clearly see that there is a shift in which design path is more affordable depending
on the expected lifecycle of the design. If the lifecycle is >19 epochs, the change path that minimizes
cost is to always (regardless of current design) immediately pay the high costs for the R&D Shielded
design. In the case that the expected lifecycle is equal to 19 epochs, the affordable choice becomes to
stay in the 3rd Party Shielded design if you are currently there, but otherwise switch to the R&D Shielded
design. In the case that the expected lifecycle is <19 epochs, the affordable choice becomes to stay in the
3rd Party Shielded design even if you started in the Unshielded design.
Below, we list the similar expected cost matrices Vc and Vt, which list the expected cost in millions of dollars
and months, respectively. The formats of the matrices are identical to the format of V above.
>> Vc
[6 x 30 matrix of expected dollar costs; same format as V]
Figure 49. Output Vc matrix showing the expected cost in millions of dollars for different epoch lengths
>> Vt
[6 x 30 matrix of expected time costs; same format as V]
Figure 50. Output Vt matrix showing the expected cost in months for different epoch lengths
By comparing these to the budget outlined in the setup, we can see that our policy is expected to be
within budget. Of course, if the solar flare epoch occurs more often than expected, we may over-spend
our budget. Knowing the odds of a solar flare allows us to estimate how much money will need to be
spent repairing the system, and to make better decisions about when to research advanced shielding.
This very simple example shows the power of applying MDPs to the multi-era affordability problem. By
utilizing MDPs, users will be able to see the expected cost of transitions, gaining insight into how their
system evaluates change paths as well as seeing the most affordable action to take in all situations. Our
method cannot in its current form evaluate the actual budget constraints, though, as it optimizes its
choice of change path every iteration. As such it does not directly result in answers that meet budget
requirements. Instead, users are able to actively tweak alpha and look at the rule paths chosen, exploring
the possible set of actions and their associated costs. Through this exploration, users should be able to
gain insights into affordable moves to make throughout their design's lifecycle and even find some that
exactly meet their budget requirements.
Future additions to our current MDP implementation include adding design utility into the cost matrix
for design selection. This could be done by including the multi-attribute utility for each design and then
weighting the overall performance of a design with a value similar to the α used for the multiple dimensions
of cost. Also, various alterations to the structure of generating the overall cost matrix J can allow for
modeling different properties. For example, value degradation as depicted in Figure 5 can be evaluated
using a linearly scaling value which is multiplied with the utility of designs. Another possible extension
would be to take the time costs and actually encapsulate them as epoch-like durations. In doing so, the
timeline for the design would be more explicit, as the period for change is better reflected.
Need For Cloud Technologies
The problem with using Markov decision processes with IVTea in its current state is that it is
computationally too complex to solve the Bellman equation for a large number of designs, epochs, and
transition rules. Storing the cost matrices can require a large amount of memory, and storing the
transition probabilities in a dense format can also be too large for the memory of a single computer.
Also, performing multiple large matrix multiplications can take an incredibly long time if done serially
on a single computer. With our enhanced cloud-based architecture, this problem begins to become
approachable. We will detail an example approach to solving this problem by leveraging the power of
distributed computing.
The dense transition matrices T are on the order of E^2 * D^2 * |A| entries. This means that in order to do this
calculation for 1,000 designs, 10 epochs and 100 actions with values stored as float32, you would need
~4000GB of RAM. This amount of RAM is not conceivable on a single computer, as current hardware
specs even for high-end servers hover around ~200GB of RAM. However, using Apache Spark on top of
Amazon EC2 with large memory-optimized instances, which have 244GB of RAM per instance, this is
possible to store in memory with ~17 instances. While Spark with EC2 solves the memory problem, the
actual calculation of the Bellman equation is still a problem as it will require a large number of matrix
multiplications.
The Bellman equation requires a series of matrix multiplications, as shown in Figure 46. What is
convenient about matrix multiplication is that it can be done through a simple
distributed algorithm, as was shown by Agarwal (Agarwal, et al. 1994). The problem setup and
algorithm, summarized, are as follows:
Matrix Multiplication Problem Setup:
* Matrix A(m x r) has m rows and r columns, where each of its elements is denoted a_ij with 1 <= i <= m and 1 <= j <= r
* Matrix B(r x n) has r rows and n columns, where each of its elements is denoted b_ij with 1 <= i <= r and 1 <= j <= n
* Matrix C, resulting from the multiplication of matrices A and B, C = A x B, is such that each of its elements is denoted c_ij with 1 <= i <= m and 1 <= j <= n
Distributed Multiplication Algorithm Summary:
1. Partition these matrices into p square submatrices, where p is the number of processes available.
2. Create a grid of processes of size p^(1/2) x p^(1/2) so that each process can maintain a
   submatrix of the A matrix and a submatrix of the B matrix.
3. Each submatrix is sent to a process, and the local submatrices are multiplied together
   and the results added to the partial results in the C submatrices.
4. The A submatrices are moved one step to the left and the B submatrices are moved one step upward.
5. Repeat steps 3 and 4 sqrt(p) times.
This algorithm can run in a scalable, distributed fashion by using AWS and Apache Spark. This is done by
sending the square submatrices (Step 2) to separate Apache Spark instances and performing the first set
of multiplications (Step 3). Then an intermediate step follows which involves writing these outputs to
Amazon ElastiCache/disk, shifting values (Step 4), then repeating the process sqrt(p) times (Step 5). The
final resulting matrix comes from summing the partial sums using Apache Spark instances and getting
the multiplied value.
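To make the block structure concrete, the following is a serial NumPy sketch of the shifting scheme under the stated assumptions (square matrices, dimension divisible by sqrt(p)); the initial block alignment, which the summary above leaves implicit, is included for correctness, and the Spark/ElastiCache plumbing is omitted.

import numpy as np

def cannon_multiply(A, B, p_sqrt):
    """Block-wise (Cannon-style) matrix multiply, simulated serially with NumPy.

    A and B are (n, n) with n divisible by p_sqrt; the p_sqrt x p_sqrt grid of
    blocks stands in for the grid of distributed workers.
    """
    n = A.shape[0]
    b = n // p_sqrt                                   # block size
    blk = lambda M, i, j: M[i*b:(i+1)*b, j*b:(j+1)*b]

    # Initial alignment (skew): row i of A blocks rotated left by i,
    # column j of B blocks rotated up by j.
    Ablk = [[blk(A, i, (j + i) % p_sqrt).copy() for j in range(p_sqrt)] for i in range(p_sqrt)]
    Bblk = [[blk(B, (i + j) % p_sqrt, j).copy() for j in range(p_sqrt)] for i in range(p_sqrt)]
    Cblk = [[np.zeros((b, b)) for _ in range(p_sqrt)] for _ in range(p_sqrt)]

    for _ in range(p_sqrt):
        # Step 3: every "process" multiplies its current pair of blocks and accumulates.
        for i in range(p_sqrt):
            for j in range(p_sqrt):
                Cblk[i][j] += Ablk[i][j].dot(Bblk[i][j])
        # Step 4: shift A blocks one step left along rows, B blocks one step up along columns.
        Ablk = [[Ablk[i][(j + 1) % p_sqrt] for j in range(p_sqrt)] for i in range(p_sqrt)]
        Bblk = [[Bblk[(i + 1) % p_sqrt][j] for j in range(p_sqrt)] for i in range(p_sqrt)]

    # Reassemble the full product from the C submatrices (Step 5 complete).
    C = np.zeros_like(A)
    for i in range(p_sqrt):
        for j in range(p_sqrt):
            C[i*b:(i+1)*b, j*b:(j+1)*b] = Cblk[i][j]
    return C

In the distributed setting each Cblk[i][j] lives on its own worker and only the block shifts move data between workers, which is what makes the scheme attractive for Spark.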
Using MDPs combined with the cloud architecture enables a solution to the multi-era affordability with
change paths problem for a large space of designs, epochs, and transition rules. By breaking this large
problem into a series of matrix multiplications with MDPs and then parallelizing those matrix multiplications
across AWS-hosted Spark instances, users can get results as to the affordable set of actions to take
throughout the lifetime of their complex system. From this, users can more accurately gauge the
affordability of systems with change options as well as interrogate these results, learn which
situations are costly and how long a design solution will remain affordable, and gain insight into the
underlying drivers of increasing costs. The actual implementation of this solution is left as future work,
as discussed in Section 5.3.
4.3 Demonstration: Parallelizing Multi-Arc Cost Paths with MapReduce
In order to demonstrate the effectiveness and practical application of cloud technologies for calculating
tradespace related attributes, we have applied a fraction of the recommended architecture to a
problem that is currently computationally too complex to solve for large numbers of designs. This is the
multi-arc change path calculation proposed by Fitzgerald as a means of representing possible multi-step
transitions that designs can make (useful when evaluating changeability) (Fitzgerald, 2012). Through
parallelizing this process we hope to show that a larger number of designs can be computed than is
feasible serially on a single computer. This work also serves as a template for parallelizing tradespace
calculations in the cloud that future research can use.
4.3.1 Introduction to Changeability
An important concept to consider when designing large, complex systems is whether we can change the
designs we are evaluating in the face of certain disturbances or contexts. In this context, change can be defined
as the transition over time of a system to an altered state (Ross, et al. 2006). Changeability is the system
property that takes into consideration the changes designs are capable of making, and serves to improve the
lifetime value delivery of designs. For example, the ability to redesign a particular subsystem of a
satellite system in the event of an alteration in policy/requirements would represent design-phase
changeability, whereas the ability to burn fuel and adjust the satellite system's orbit altitude would
correspond to operations-phase changeability.
Adding changeability options into a design, however, typically comes with associated costs, including
development costs, physical build and inclusion costs, and potentially additional costs required to
exercise a change. While the costs of these change features are apparent and well known, the
benefits of changeability are significantly harder to capture, especially when the system's performance
or benefits are not readily monetized (Fitzgerald, 2012). Due to the difference in difficulty when it comes
to quantifying the usefulness of changeability versus the cost of changeability, it often becomes difficult
to justify the inclusion of changeability-enabling features in real system design. Existing techniques for
valuing the benefits of changeability are useful for some applications but suffer under the burden of a
large number of assumptions that limit their general applicability to the broader tradespace exploration
problem. An improved means of valuation for changeability has the potential to allow changeability to
be considered more effectively in the early design phase of systems that could greatly benefit from
increased responsiveness to shifts in their operational context.
In order to effectively evaluate the changeability of designs in a general manner, there needs to be a
quantifiable measurement of changeability which can be compared across designs. When considering
changeability, one must take into account the change paths of each design and their associated costs,
which can be evaluated and compared in order to see how useful the change paths are. One method of
effectively evaluating the Pareto optimal cost paths for a given context is by considering the multi-arc
change paths of each design. The multi-arc change paths for a design are the possible sets of transition
rules that can be utilized to change from one design to another through specific combinations of rules.
The multi-arc change paths of a design are a quantity that can be used to assess, given a set of cost
constraints and a specific context, what the changeability of designs is. This means that with multi-arc
change paths, users are able to see, for their current multidimensional budget, what change options exist
for each design and what the associated cost tradeoffs are. Solving for multi-arc change paths
requires thinking about the tradespace of designs as a tradespace network graph. In this graph, each
vertex is a specific design and the edges of the graph are transition rules, as shown in Figure 51.
Figure 51. Tradespace modeled as a tradespace network graph through the use of transition rules
To better illustrate this concept, we will go through an example involving calculating the multi-arc paths
for an example SpaceTug tradespace similar to that in Section 2.4. The transition rules for the
tradespace network are as follows:
Table 13. Example set of transition rules for SpaceTug tradespace

Rule number                 Description                                        $ Cost    Time Cost
Rule 1: Engine change       change the engine type used by the SpaceTug        $150M     7 months
                            (for example cryo -> biprop)
Rule 2: Propellant change   change the amount of propellant in the SpaceTug    $25M      5 months
For these transition rules, a simple tradespace network can be constructed as shown in Figure 52:
Figure 52. Example tradespace network graph for SpaceTug tradespace
This simple example demonstrates that through combinations of rules, designs can change to other
designs beyond their initial transition rule. In this example, through using a combination of rule 1, then
rule 2, design 1 is able to change to design 4, something that isn't obvious from just the transition rules.
Assuming that a user inputs that they can afford $300M and 15 months to change, using the data from a
multi-arc change path analysis would yield the following for design 1:
Table 14. Multi-arc change paths for design 1

Design #   End Design   Rule Path         Design Path   Path Cost
1          2            Rule 1            1,2           150,7
1          3            Rule 2            1,3           25,5
1          4            Rule 1, Rule 2    1,2,4         175,12
1          4            Rule 2, Rule 1    1,3,4         175,12
In order to calculate the multi-arc paths for any given tradespace network, an algorithm was developed.
Given a set of adjacency matrices, one for each rule:
Figure 53. Example set of adjacency matrices for each rule
1. Generate a set of rule-path permutations for the desired minimum and maximum arc lengths.
   Example: Assuming 6 rules, a minimum arc length of 2 and a maximum arc length of 2:
   Permutations = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [2, 6],
   [3, 1], [3, 2], [3, 3], [3, 4], [3, 5], [3, 6], [4, 1], [4, 2], [4, 3], [4, 4], [4, 5], [4, 6], [5, 1], [5, 2], [5, 3],
   [5, 4], [5, 5], [5, 6], [6, 1], [6, 2], [6, 3], [6, 4], [6, 5], [6, 6]]
2. The graph exploration pseudocode:
   For each permutation:
       For each design:
           Traverse the permutation rule path if available (via checking the adjacency matrix)
           store the path created
           sum each dimension of cost (from the adjacency matrix)
3. Pareto frontier change paths (a serial sketch of the full procedure is shown below):
   For each output path grouped by start design and end design
       Prune the set of paths, only keeping the Pareto optimal paths
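For illustration, a compact serial Python sketch of these three steps is given below. The adjacency structure used here (a dictionary mapping design -> rule -> (next design, cost vector)) and the assumption of two cost dimensions are simplifications of the 2D matrix format described in Section 4.3.2, not the implementation actually used.

import itertools

def multi_arc_paths(adj, num_rules, min_arc, max_arc):
    """Enumerate multi-arc change paths and keep only Pareto-optimal cost paths.

    adj[design][rule] -> (next_design, (cost1, cost2)) when the rule applies.
    Returns {(start, end): [(rule_path, design_path, cost), ...]} pruned to the
    Pareto frontier over the cost dimensions.
    """
    # Step 1: rule-path permutations of every length between min_arc and max_arc.
    perms = [p for arc in range(min_arc, max_arc + 1)
               for p in itertools.product(range(num_rules), repeat=arc)]

    paths = {}
    for start in adj:                                # Step 2: traverse each permutation
        for rule_path in perms:
            design, cost, design_path = start, [0, 0], [start]   # two cost dimensions assumed
            ok = True
            for rule in rule_path:
                if rule not in adj.get(design, {}):
                    ok = False
                    break
                design, step_cost = adj[design][rule]
                cost = [c + s for c, s in zip(cost, step_cost)]
                design_path.append(design)
            if ok:
                paths.setdefault((start, design), []).append((rule_path, design_path, cost))

    # Step 3: prune each (start, end) group down to its Pareto-optimal cost set.
    def dominated(c, group):
        return any(all(o_i <= c_i for o_i, c_i in zip(o, c)) and o != c
                   for (_, _, o) in group)
    return {k: [p for p in v if not dominated(p[2], v)] for k, v in paths.items()}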
This implementation was originally done serially on a single computer and as such was infeasible for use
on large numbers of designs. Another issue with the previous iteration was that some aspects
of the calculation were hard coded to a specific example, which did not allow for reuse. In order to
compute the multi-arc change paths for tradespaces on the order of 1,000,000 designs, we will need to
incorporate some of the previously discussed big data software principles into this changeability
calculation.
4.3.2 Distributed Computing for Changeability Calculations
As an example of how cloud computing can be leveraged to incorporate larger data sets into IVTea Suite
and allow for faster computation of parameters, we have implemented a distributed algorithm to
process the multi-arc change paths for large datasets, which previously would have been impossible due
to hardware constraints.
For our work, we parallelized the multi-arc path calculation using Amazon EMR running MapReduce
processes and stored the output into Amazon S3. The code for this calculation can be seen in Appendix
B. The steps involved in calculating and storing the multi-arc path information from a given
changeability matrix are shown in Figure 54:
Figure 54. Diagram showing the user flow of the parallelized multi-arc change path calculation
An overview of the flow of information across the various architectures used in the distributed
calculation can be seen in Figure 55:
Figure 55. Diagram showing information piped throughout the architecture used in the multi-arc calculation
The entire process for calculating multi-arc change paths for a large set of designs and storing that
information in a database is as follows:
Step 1. Changeability Matrix import
Assuming that the changeability matrix was generated in MATLAB, a script was created to convert a
stored .MAT file into a NumPy matrix and then pickle that matrix for use alongside each
MapReduce instance. Pickling is the standard mechanism for object serialization in Python. An example
showing how the script is used is shown below:
./matrix_import.py input_matrix.mat output_filename
The script assumes a 2D matrix in the following format:
Table 15. 2D matrix format used to store the tradespace network graph

          Rule 1                               Rule 2                               ...   Rule X
1         adj design, cost1, cost2, ..., costX  adj design, cost1, cost2, ..., costX  ...
2         adj design, cost1, cost2, ..., costX  adj design, cost1, cost2, ..., costX  ...
This format was preferred over sparse matrices. While the full 2D matrix representation
is larger to store, it has much faster index accesses as it is entirely in memory. Sparse matrices, on the
other hand, follow chains of pointers, which causes many more cache invalidations and
dramatically slows down the overall access time. A version with sparse matrices was created, but it is dramatically
slower for the aforementioned reasons, and as such the 2D matrix input was preferred.
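A sketch of what such a conversion script might look like is shown below. The MATLAB variable name inside the .mat file ('T') and the dense-array assumption are illustrative; this is not the actual matrix_import.py source.

#!/usr/bin/env python
# Hypothetical sketch of a .mat -> pickle conversion for the changeability matrix.
import sys
import cPickle as pickle
import numpy as np
import scipy.io

def convert(mat_path, out_path, var_name='T'):
    data = scipy.io.loadmat(mat_path)          # load the MATLAB workspace
    matrix = np.asarray(data[var_name])        # keep the dense 2D representation
    with open(out_path, 'wb') as f:
        pickle.dump(matrix, f, pickle.HIGHEST_PROTOCOL)

if __name__ == '__main__':
    convert(sys.argv[1], sys.argv[2])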
Step 2. Command line launch
After generating the pickle file containing the changeability matrix, mapReduce_arc_paths.py is run.
This script has the following flags:
Table 16. Command line flags used for running the multi-arc change path calculation

Flags
--num_designs (integer) (required)
--num_rules (integer) (required)
--use_job_pools (optional)
Step 3. Job Pool Creation (optional)
As the process for running MapReduce jobs on Amazon EMR requires booting up Amazon EC2 instances
and installing the appropriate software, it is typically better to create a job pool of servers that waits for
MapReduce jobs and stays online after completion, waiting for more MapReduce jobs. Compare this to
the alternative case of launching a job where, after completion, all of your servers terminate. This is
important to do if you want to send multiple jobs to a cluster at once and have them be distributed, or want
to send a series of jobs to clusters without waiting for the boot and installation time that goes into first
launching servers. Also, given that charges on Amazon EC2 are based on hourly use, the ability to not
have to relaunch clusters can save a significant amount of money. A job pool is created if the
--use_job_pools flag is used when the mapReduce_arc_paths.py script is run. This launches the
requested number of Amazon EMR instances.
By checking the Amazon EC2 console on Amazon AWS or using the command line aws tool with options:
aws ec2 describe-instances --output table
it is possible to see if your job pool is ready. Once ready, a second call to the
mapReduce_arc_paths.py program with the --use_job_pools flag again set will run the multi-arc change
paths MapReduce job on the Amazon EMR job pool we initialized.
Step 4. Bootstrapping phase
In order to run the 'map' procedure for multi-arc change paths, we will need to configure our Amazon
EC2 cluster with all of the required libraries and files needed for the MapReduce job to run properly. For
our purposes, as Amazon EMR comes preinstalled with NumPy, the fundamental package for scientific
computing with Python, the only bootstrapping we will need to do is to upload our pickle file to each
server along with the Python file containing our map and reduce functions.
Step 5. MapReduce Job Phases
Step 5a. The Map Initialization Phase
Each mapper instance requires an adjacency list in order to traverse the graph from design to design and
properly evaluate the multi-arc change paths. For this reason, a mapper initialization phase is required
in which the pickle file containing our adjacency list is loaded into the memory of each separate server.
Also in the initialization phase we calculate our permutation matrix, which is used to enumerate all
possible rule paths that our designs can take. This initialization is required because the only
alternative to loading the list into memory prior to the map function running would be to load it into
memory for each run of the map function, which would dramatically slow down our process with no
additional benefits.
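A minimal mrjob-style sketch of this initialization is shown below. The pickle file name and the follow_rule_paths helper are placeholders (follow_rule_paths stands in for a wrapper around designPathToolMR), and createRulePermutations is assumed to be defined as in Appendix B; the full job is listed there.

import cPickle as pickle
from mrjob.job import MRJob

class MRArcPaths(MRJob):
    def mapper_init(self):
        # Runs once per mapper, before any input lines are processed:
        # load the pickled adjacency matrix and precompute the rule permutations.
        with open('changeability_matrix.pkl', 'rb') as f:    # hypothetical file name
            self.Tcost = pickle.load(f)
        self.rule_permutations = createRulePermutations(1, 3, 6)  # as defined in Appendix B

    def mapper(self, _, line):
        start_design = int(line)
        # follow_rule_paths is a hypothetical helper that traverses every rule
        # permutation from start_design and yields (end_design, path_cost) pairs.
        for end_design, path_cost in follow_rule_paths(start_design, self.Tcost,
                                                       self.rule_permutations):
            yield (start_design, end_design), path_cost

if __name__ == '__main__':
    MRArcPaths.run()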
Step 5b. The Map Phase
For the map phase of our multi-arc change path calculation, we follow a process similar to the one
Fitzgerald outlined, with some small modifications (Fitzgerald, 2012). The input into the map
function is a text file in which each line is the design-id of a design whose multi-arc paths are to be
processed. This can be seen in Table 17:
Table 17. Input file to the MapReduce mapper function, where a single line is input to each mapper at a time

multi-arc-path-input.txt
0
1
3
4
5
...
999999   (up to num_designs, the inputted desired number of designs)
The steps for the map process can be seen in the following pseudocode:
map(String line):
    // line: a single row from the input text file (a design-id)
    startDesign = int(line)
    designPath = [startDesign]
    // pathCosts is a list as there are multiple dimensions of cost
    pathCosts = []
    for each rule permutation r in rule_permutations:
        // follow the given rule path while summing the path costs
        designPathToolMR(designPath, pathCosts)
        // designPathToolMR puts the traversed path into designPath
        // and the summed costs into pathCosts
        endDesign = designPath[-1]
        EmitIntermediate((startDesign, endDesign), pathCosts)

The text file shown in Table 17 is split up into chunks from which the map function reads in lines. Each
row of the input file is the design-id of the design to be processed by that invocation of the map
function, essentially a parameter to the function.
Step 5c. The Reduce Phase
The reduce phase of the process operates on the emitted key-value pairs from our map phase. The
Hadoop Streaming protocol we are using sorts our keys lexically, as detailed in Section 3.5.1 of
this thesis. This means that the reducer receives (key, values) pairs, not (key, value) pairs. For the
reduce phase of our process, we take the outputted set of (design, rule paths/design paths/costs) and
reduce them to just the Pareto optimal set based on the multi-dimensional cost.

reduce(String key, Iterator values):
    // key: (startDesign, endDesign)
    // values: a list of pathCosts
    pareto_front = pareto_frontier(list(values))
    Emit(key, pareto_front)
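The pareto_frontier helper used above (and imported in Appendix B) can be implemented as a simple dominance filter over the multi-dimensional costs; the sketch below is illustrative rather than the exact implementation used.

def pareto_frontier(costs):
    """Return the subset of cost vectors not dominated by any other.

    A cost vector c is dominated if some other vector is <= c in every
    dimension and strictly < in at least one (lower cost is better).
    """
    frontier = []
    for c in costs:
        dominated = any(all(o[i] <= c[i] for i in range(len(c))) and
                        any(o[i] < c[i] for i in range(len(c)))
                        for o in costs)
        if not dominated:
            frontier.append(c)
    return frontier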
4.3.3 Multi-Arc Cost Paths with MapReduce Results
In order to evaluate the performance of our parallelized process for calculating and storing the optimal
multi-arc change paths for a given input, we measured the time it takes to perform the multi-arc change
path calculation for different numbers of designs across different cluster sizes. To do so,
random example graphs needed to be created in order to test the performance across various graph
sizes. In attempting to maintain a graph structure similar to that used by Fitzgerald, the random
graphs were constructed based on a hypothetical set of 6 transition rule paths which are
randomized each run (Fitzgerald, 2012). Each node in the graph has the full set of 6 outgoing transition
rules, dictating the worst-case runtime for a rule set. This is because it is often the case that transition rules
increment or decrement design variables, and as such each design is likely to have the full set of options
for its changeable design variables. As such, we used the worst-case, dense version of the graph to
simulate the most likely scenario. An example visualization of the randomized graphs used for 20
designs and 100 designs respectively can be seen in Figure 56.
20 Designs
100 Designs
Figure 56. Examples of the random graphs generated for testing the multi-arc change path calculation
The timing was done using pre-initialized Amazon EMR job pools of the corresponding number of EC2
instances. This was done in order to properly compare the timing to the serial, local case, as server
booting time would detract from the actual calculation time. Timing began with the start of the
command line launch of the Python program (note: the job pools were already set up) and ended when
the cluster reported it was done with the MapReduce job, terminating the Python program. The
Amazon EC2 'c3.xlarge' instance type was used for each test. The 'c3.xlarge' instance was chosen as it is
the smallest instance available that supports an SSD (Solid State Drive), which, given the heavily disk-based
nature of MapReduce, dramatically increases performance. The graph of our results showing the
runtime of the launched MapReduce tasks can be seen in Figure 57.
[Bar chart of runtime versus number of designs (10^4, 10^5, 10^6), with bars for 1, 2, 5, 10, and 20 instances; the top segment of each bar shows setup overhead.]
Figure 57. Graph results showing the runtime of the launched MapReduce tasks which solve the multi-arc change path problem
From Figure 57, we can see that there is a noticeable speedup in the multi-arc calculation as more and
more servers are added to the MapReduce cluster. For the case of 10^6 designs, there was a ~10X
speedup when comparing the single-instance cluster versus the 20-instance cluster runs. This puts
processing millions of designs in a timespan that would be allowable for a rerun within a single
tradespace exploration session. The speedup also appears to get more pronounced as the number of
designs increases. One reason for this could be that for MapReduce jobs with a smaller number of
designs, the overhead of writing files to disk and moving data around, as outlined in the MapReduce
process in Section 3.5.1, takes a considerable part of the total runtime. This part of the process can
actually increase with parallelization, and so for small numbers of designs the speedup is not as
noticeable, as there is more overhead in the aforementioned areas. As the time required for the multi-arc
path calculation increases relative to the startup time, the benefit of parallelization increases.
Due to the overhead involved in setting up job pools, the bar graphs are divided into a setup portion and a
run portion. The graph shows that every order of magnitude jump in designs brings about an equivalent
jump in processing time. This makes sense because the arc length evaluated is constant for our runs
(min_arc = 1, max_arc = 3); the number of paths evaluated per design is therefore constant, and the total
work scales linearly with the number of designs. The light green top portion of each bar indicates the
amount of time used initializing the nodes of the cluster, while the bottom portion represents the actual
EC2 runtime for each cluster. The top portion of the graph increases with the number of designs because
larger transition rule matrices are being uploaded to the cluster.
Chapter 5: Discussion and Conclusions
After evaluating the available cloud technologies we have discovered that leveraging cloud technologies
allows IVTea Suite to better satisfy the needs of users. By expanding the hardware available for problem
solving and simulation, users can view and interact with vast amounts of design data in real time, in
order to reason about complex systems and compare designs across important lifecycle properties,
like changeability and survivability, which are often overlooked. The increase in speed allows users to
iteratively evaluate their design throughout the entire design process, incorporating newly obtained
information to improve the resulting decisions.
5.1 State of TSE Tools and IVTea Suite's Role
In Section 1.5.5, we analyzed the current state of tradespace exploration and compared the strengths
and weaknesses of a subset of current tools in performing tradespace analysis for large scale systems.
We determined that while IVTea Suite is not superior generally, it fills a valuable niche, providing an
easy way to bring cutting edge systems engineering techniques to bear on real-world problems. For
instance, IVTea Suite makes it easy to reason about changing contexts and value models, and provides a
workflow to guide the user through the design process.
5.2 Results and Recommendations
In Section 1.5.3, we suggested some current limitations of IVTea Suite: due to its single-threaded
implementation, it can only solve problems which can fit in the memory of a single machine, using local
storage and serial computation. This means that problems with a very large trade space cannot be
analyzed in real time. Even if the analysis is done off-line, rendering the results can be expensive enough
to make real-time visualization impossible. Using our improved Cloud IVTea Suite, we have effectively
infinite data storage, memory, and parallel computation. This does not address the problem of real-time
rendering, but it greatly enhances the potential for real-time analysis, cutting the latency for large
problems from a matter of hours to a matter of minutes.
Table 18. Summary of advances made by Cloud IVTea Suite

Challenge                  Current IVTea Suite                                                   Cloud IVTea Suite
Data Storage               Limited to hard disk space                                            Limited only by AWS costs
Computation Time           Limited to a single core                                              Scalable on-demand, distributed across many cores
Problem Size for Solving   Limited by available local RAM                                        Scalable on-demand, effectively unlimited
Rendering Time             Limited by MATLAB limitations (~10^6 for real-time exploration        Limited by MATLAB limitations (~10^6 for real-time exploration
                           through visualizations)                                               through visualizations)
We discovered after exhaustive evaluation that no single cloud technology satisfies all the needs of
IVTea Suite's users, and so we recommend a composite cloud architecture, utilizing SQL, Document-Oriented,
and Graph Databases, AWS, and both Apache Spark and Hadoop MapReduce (see Figure 44).
By using the aforementioned cloud architecture, IVTea Suite is now able to solve problems that were
once too large to compute, and to solve some of them in potentially real time. We demonstrated this by
framing the multi-era affordability problem in a parallelizable manner using Markov Decision Processes,
Apache Spark, and a distributed matrix multiplication algorithm. When the outlined approach is set up
on top of the outlined highly scalable cloud architecture, it is possible to interrogate computationally
expensive lifecycle properties in real time. We were able to implement and use a slightly simplified
version of the outlined architecture, which was able to solve the multi-arc change path problem ~10x
faster than was previously possible. Also, the outlined framework is scalable on-demand and could
easily solve 10^7 and even 10^8 designs in a reasonable amount of time. Overall, this work shows that
adopting a cloud architecture into IVTea Suite can definitely be used to enhance the tradespace
exploration experience.
5.3 Future Work
While the work with calculating multi-arc change paths using MapReduce, Amazon EMR and Amazon S3
was promising, it is still lacking in the sense that MapReduce will not be able to provide real-time
feedback from analysis of large datasets due to its primarily disk-based and job-scheduling architecture,
as mentioned in Section 4.3.3. For this reason the use of Apache Spark would serve as promising future
work, as it could yield computation times that allow for real-time interrogation and exploration of large
datasets for various lifecycle properties. For example, the methods mentioned with Spark and Markov Decision
Processes in Section 4.1.2 could be used as potential scalable solutions to the multi-era affordability
with change paths problem. Graph databases such as Neo4j should also be explored in the future, as
they are able to quickly store, retrieve, and traverse the paths generated by chaining epochs as well as
changing designs, both of which can be represented as graph data structures.
While calculating large data sets is within the realm of possibility with the outlined architecture, actually
rendering those large datasets on screen is not possible within IVTea with MATLAB's built-in GUI. For
this reason, creating an alternative GPU-accelerated GUI that can render on the order of 10^6 points
would be very useful. Along with this, future work could be done on properly indexing the database to
allow for fast range queries of data, so that subsections of graphs can be rendered, and to allow for a
hierarchical binning of points as mentioned by Liu (Liu, et al. 2013).
Using massive parallelization, such as that provided by cloud technologies, would allow for
unprecedented increases in the scale of problems we can attack using methods like tradespace
exploration. This improvement in scalability makes for an exciting research direction for programs like
ERS, described in Section 1.6; the ability to incorporate ever larger amounts of data will enable more
informed decision making, and will improve the value and resiliency of the selected designs. Cloud IVTea
Suite will be of great use to decision makers and stakeholders in many large and complex domains.
Bibliography
Agarwal, Ramesh C., Fred G. Gustavson, and Mohammad Zubair. "A high-performance matrix-multiplication
algorithm on a distributed-memory parallel computer, using overlapped
communication." IBM Journal of Research and Development 38.6 (1994): 673-681.
"Amazon EC2 Pricing." Amazon Web Services, Inc. Amazon.com, Inc. Web. 20 May 2015.
<http://aws.amazon.com/ec2/pricing/>.
"Amazon S3 Pricing." Amazon Web Services, Inc. Amazon.com, Inc. Web. 20 May 2015.
<http://aws.amazon.com/s3/pricing/>.
AMSC, N., and A. AREA HFAC. "Department of Defense Design Criteria Standard". Signal 44.5.3: 4. 1999.
Balabanov, Vladimir, Christophe Charpentier, D. K. Ghosh, Gary Quinn, Garret Vanderplaats, and
Gerhard Venter. "VisualDOC: A software system for general purpose integration and design
optimization." 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization. 2002.
Bellman, Richard. "A Markovian Decision Process". Journal of Mathematics and Mechanics 6. 1957.
Card, Stuart K., George G. Robertson, and Jock D. Mackinlay. "The information visualizer, an information
workspace." Proceedings of the SIGCHI Conference on Human factors in computing systems. 1991.
Carlson, Jean M., and John Doyle. "Highly optimized tolerance: Robustness and design in complex
systems." Physical Review Letters 84.11 (2000): 2529-2532.
Curry, Michael D., and Adam M. Ross. "Considerations for an Extended Framework for Interactive
Epoch-Era Analysis." Procedia Computer Science 44 (2015): 454-465.
Daskilewicz, Matthew J., and Brian J. German. "Rave: A computational framework to facilitate research
in design decision support." Journal of computing and information science in engineering 12.2. 2012.
de Weck, Olivier L., Adam M. Ross, and Donna H. Rhodes. "Investigating relationships and semantic sets
amongst system lifecycle properties (ilities)." Third international engineering systems symposium
CESUN. 2012.
Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large
clusters." Communications of the ACM 51.1 (2008): 107-113.
Diller, Nathan P. "Utilizing multiple attribute tradespace exploration with concurrent design for creating
aerospace systems requirements." Diss. Massachusetts Institute of Technology. 2002.
Fitzgerald, Matthew Edward. "Managing uncertainty in systems with a valuation approach for strategic
changeability." Diss. Massachusetts Institute of Technology. 2012.
Fitzgerald, Matthew E., and Adam M. Ross. "Controlling for Framing Effects in Multi-Stakeholder
Tradespace Exploration." Procedia Computer Science 28 (2014): 412-421.
Goerger, Simon R., Azad M. Madni, and Owen J. Eslinger. "Engineered Resilient Systems: A DoD
Perspective." Procedia Computer Science 28 (2014): 865-872.
Haerder, Theo, and Andreas Reuter. "Principles of transaction-oriented database recovery." ACM
Computing Surveys (CSUR) 15.4 (1983): 287-317.
Keeney, Ralph L. "Value-focused thinking: A path to creative Decisionmaking." Harvard University Press.
1994.
"Limits on Table Size." MySQL. Oracle Corporation. Web. 20 May 2015.
<https://dev.mysql.com/doc/refman/5.0/en/table-size-limit.html>.
Liu, Zhicheng, Biye Jiang, and Jeffrey Heer. "imMens: Real-time Visual Querying of Big Data." Computer
Graphics Forum. Vol. 32. No. 3pt4. Blackwell Publishing Ltd. 2013.
Marzi, Max. "Graph Database Use Cases." Neotechnology. Web. 20 May 2015.
<http://www.slideshare.net/maxdemarzi/graph-database-use-cases>.
Miller, R. B. "Response time in man-computer conversational transactions." Proc. AFIPS Spring Joint
Computer Conference Vol 33 (1968): 267-277
Neches, Robert. "Engineered Resilient Systems (ERS) S&T Priority Description and Roadmap." (2011).
Rhodes, Donna H., Ross, Adam M., and Daniel E. Hastings. "Responsive System Comparison Method for
Performance at Less Cost." DII Year Two. September 2010.
Rhodes, Donna H., and Ross, Adam M., " Engineered Resilient Systems - Systems Engineering:
Knowledge Capture and Transfer" Unpublished Manuscript. 2014.
Ricci, Nicola, Matthew E. Fitzgerald, Adam M. Ross, and Donna H. Rhodes. "Architecting Systems of
Systems with Ilities: An Overview of the SAI Method." Procedia Computer Science 28 (2014): 322-331.
Ross, Adam Michael. "Multi-attribute tradespace exploration with concurrent design as a value-centric
framework for space system architecture and design." Diss. Massachusetts Institute of Technology.
2003.
Ross, Adam M., Daniel E. Hastings, Joyce M. Warmkessel, and Nathan P. Diller. "Multi-attribute
tradespace exploration as front end for effective space system design." Journal of Spacecraft and
Rockets 41.1 (2004): 20-28.
Ross, Adam M., and Daniel E. Hastings. "The tradespace exploration paradigm." INCOSE int Symp,
Rochester, NY. 2005.
Ross, Adam M., and Daniel E. Hastings. "Assessing changeability in aerospace systems architecting and
design using dynamic multi-attribute tradespace exploration." AIAA Space. 2006.
Ross, Adam M., and Donna H. Rhodes. "Using Natural Value-Centric Time Scales for Conceptualizing
System Timelines through Epoch-Era Analysis." INCOSE International symposium. Vol. 18. No. 1.
2008.
Ross, Adam M., "Insights from a Multisensory Tradespace Exploration Laboratory for Complex System
Selection." 2009 SEAri Annual Research Summit. 2009.
Ross, Adam M., Hugh L. McManus, Donna H. Rhodes, and Daniel E. Hastings. "Revisiting the tradespace
exploration paradigm: structuring the exploration process." AIAA Space. 2010.
Ross, Adam M., Donna H. Rhodes, and Matthew E. Fitzgerald. "Interactive Value Model Trading for
Resilient Systems Decisions." Procedia Computer Science 44 (2015a): 639-648.
Ross, Adam M. "lVTea Suite Summary" Unpublished Manuscript. 2015b.
Schaffner, Michael Andrew. "Designing systems for many possible futures: the RSC-based method for
affordable concept selection (RMACS), with multi-era analysis." Diss. Massachusetts Institute of
Technology. 2014.
Shishko, Robert, and Robert Aster. "NASA systems engineering handbook." NASA Special
Publication 6105. 1995.
Simpson, Timothy W., and Joaquim RRA Martins. "Multidisciplinary design optimization for complex
engineered systems: report from a national science foundation workshop." Journal of Mechanical
Design 133.10 (2011): 101002.
Spero, Eric, Christina L. Bloebaum, Brian J. German, Art Pyster, and Adam M. Ross. "A Research Agenda
for Tradespace Exploration and Analysis of Engineered Resilient Systems." Procedia Computer
Science 28 (2014): 763-772.
"SQL Syntax." W3Schools. Refsnes Data. Web. 20 May 2015.
<http://www.w3schools.com/sql/sqlsyntax.asp>.
Stump, Gary, Mike Yukish, Jay D. Martin, and T. W. Simpson. "The ARL trade space visualizer: An
engineering decision-making tool." 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization
Conference. Vol. 30. 2004.
Stump, Gary, Sara Lego, Mike Yukish, Timothy W. Simpson, and Joseph A. Donndelinger. "Visual steering
commands for trade space exploration: User-guided sampling with example." Journal of Computing
and Information Science in Engineering 9.4 (2009): 044501.
Suri, Siddharth, and Sergei Vassilvitskii. "Counting triangles and the curse of the last
reducer." Proceedings of the 20th international conference on World Wide Web. ACM. 2011.
Tiwari, Santosh, Hong Dong, Brian C. Watson, and Juan P. Leiva. "VisualDOC: New Capabilities for
Concurrent and Integrated Simulation Design." 13th AIAA/ISSMO Multidisciplinary Analysis
Optimization Conference. 2010.
U.S. Government Accountability Office. "Many Analyses of Alternatives Have Not Provided a Robust
Assessment of Weapon System Options (GAO 09-665)." Washington, DC: U.S. Government Printing
Office. 2009.
Wu, Marcus Shihong, Adam M. Ross, and Donna H. Rhodes. "Design for Affordability in Complex Systems
and Programs Using Tradespace-based Affordability Analysis." Procedia Computer Science 28
(2014): 828-837.
Zaharia, Matei, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley,
Michael J. Franklin, Scott Shenker, and Ion Stoica. "Resilient distributed datasets: A fault-tolerant
abstraction for in-memory cluster computing." Proceedings of the 9th USENIX conference on
Networked Systems Design and Implementation. USENIX Association. 2012.
Appendix A: Markov Decision Process Example Code
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
Jsc = [0 0 0 15 5 0]'; % Cost in money of being in state i
Jst = [0 0 0 25 0 0]'; % Cost in time of being in state i
Jtc = [0 1 20;
0 0 20;
0 0 0]; % Cost in money of transitioning from design i to design j
Jtt = [0 5 50;
3 0 50;
3 6 0]; % Cost in time of transitioning from design i to design j
a=.1; % Tradeoff between time and money
numstates = 6;
num_designs = 3;
Jc = zeros(6,6); % Aggregate $ cost matrix
Jt = zeros(6,6); % Aggregate time cost matrix
for sp=1:numstates
for s=1:numstates
d = mod(s-1,num_designs)+1;
dp = mod(sp-1,num_designs)+1;
Jc(sp,s)=Jtc(d,dp) + Jsc(sp);
Jt(sp,s)=Jtt(d,dp) + Jst(sp);
end
end
J = a*Jc + (1-a)*Jt;
pf = .2; % probability of a solar flare
p=zeros(6,6,3); % Prob of transitioning to state i from state j given action k
p(1,:,1)=1-pf;
p(4, :,1)=pf;
p(2,:,2)=1-pf;
p(5, :,2)=pf;
p(3, :,3)=1-pf;
p(6,:,3)=pf;
%% Finite horizon
T = 30; % Length of horizon (number of steps to plan into the future)
V = zeros(6,T);
Vc = zeros(6,T);
Vt = zeros(6,T);
pi= zeros(6,T);
for t=T-1:-1:1
Jn = zeros(6,3);
Jnc=Jn;
Jnt=Jn;
for s=1:6
for a=1:3
for sp=1:6
Jn(s,a)=Jn(s,a)+p(sp,s,a)*(V(sp,t+1)+J(sp,s));
Jnt(s,a)=Jnt(s,a)+p(sp,s,a)*(Vt(sp,t+1)+Jt(sp,s));
Jnc(s,a)=Jnc(s,a)+p(sp,s,a)*(Vc(sp,t+1)+Jc(sp,s));
end
end
end
[V(:,t), pi(:,t)]=min(Jn,[],2);
for s=1:6
Vc(s,t)=Jnc(s,pi(s,t));
Vt(s,t)=Jnt(s,pi(s,t));
end
end
% The result
V % Cost to go from a state i at time step j
pi % Optimal action from state i at time step j
%% Expected money and time costs accumulated under the finite-horizon policy
Cc = zeros(numstates,1);
Ct = zeros(numstates,1);
for t=1:T-1
for s = 1:numstates
for sp = 1:numstates
Cc(s) = Cc(s)+p(sp,s,pi(s,t))*Jc(sp,s);
Ct(s) = Ct(s)+p(sp,s,pi(s,t))*Jt(sp,s);
end
end
end
%% Infinite horizon
g = .9; % discount factor
V = zeros(6,1);
pi= zeros(6,1);
delta = inf;
iteration = 1;
while delta > 1e-6 && iteration < 1000
Jn = zeros(6,3);
for s=1:6
for a=1:3
for sp=1:6
Jn(s,a)=Jn(s,a)+p(sp,s,a)*(g*V(sp)+J(sp,s));
end
end
end
[Vn, pi]=min(Jn,[],2);
delta = sum(abs(Vn-V));
iteration = iteration + 1;
fprintf('%04d: %g\n',iteration,delta);
V=Vn;
end
% The result
V % Cost to go from a state i
pi % Optimal action from state i
%% Expected discounted money and time costs accumulated under the infinite-horizon policy
Cc = zeros(numstates,1);
Ct = zeros(numstates,1);
gt = g;
iteration = 1;
while gt > 1e-16 && iteration < 1000
for s = 1:numstates
for sp = 1:numstates
Cc(s) = Cc(s)+gt*p(sp,s,pi(s))*Jc(sp,s);
Ct(s) = Ct(s)+gt*p(sp,s,pi(s))*Jt(sp,s);
end
end
gt=gt*g;
iteration = iteration + 1;
end
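As a brief illustration of how the computed policy can be used, the following rollout is a minimal sketch (not part of the original listing) that simulates the infinite-horizon policy for a handful of transitions. It assumes the state encoding implied by the listing above: states 1-3 are the three designs under nominal conditions, and states 4-6 are the same designs during a solar flare.

% Rollout sketch (assumed encoding: states 1-3 nominal designs, states 4-6 flare states)
s = 1; % start at design 1 under nominal conditions
for step = 1:10
act = pi(s); % optimal action: which design to occupy next
if rand < pf
s = act + 3; % a solar flare occurs: land in the flare copy of that design
else
s = act; % no flare: land in the nominal copy of that design
end
fprintf('step %d: action %d -> state %d\n', step, act, s);
end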
Appendix B: Multi-Arc Change Path MapReduce Code
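The Python listing below expresses the multi-arc change path problem as an mrjob MapReduce job: the mapper expands every rule sequence of length one up to maxArc from each starting design and emits candidate change paths keyed by their (start, finish) design pair, and the reducer keeps only the paths that are Pareto-optimal in transition cost and time for each pair.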
#!/usr/bin/python
import mrjob
from mrjob.job import MRJob
import itertools
import cPickle as pickle
import pdb
import numpy as np
import scipy.sparse
from pareto import paretofrontier


def createRulePermutations(minArc, maxArc, num_rules):
    # Enumerate every ordered sequence of change rules of length 1..maxArc
    ruleslist = range(num_rules)
    permMat = []
    for arc in range(1, maxArc+1):
        permMat.append([list(p) for p in itertools.product(ruleslist, repeat=arc)])
    return permMat


def designPathToolMR(designPathIn, curCosts, remainingRulePath, fullRulePath, Tcost,
                     allDesignPaths, num_costs=2, num_indices=1):
    # Recursively follow the remaining rule sequence from the current design,
    # accumulating money and time costs; completed paths are appended to
    # allDesignPaths keyed by their (start, finish) design pair
    if remainingRulePath:
        currRule = remainingRulePath.pop(0)
        currDesign = designPathIn[-1]
        newCosts = curCosts[:]
        endstates = Tcost[currDesign][currRule*(num_costs+num_indices)]
        for nextDesignId in [endstates]:
            if nextDesignId not in designPathIn:
                newDesignPathIn = designPathIn[:]
                newDesignPathIn.append(int(nextDesignId))
                newCosts[0] += Tcost[currDesign][currRule*(num_costs+num_indices)+1]
                newCosts[1] += Tcost[currDesign][currRule*(num_costs+num_indices)+2]
                designPathToolMR(newDesignPathIn, newCosts, remainingRulePath[:], fullRulePath,
                                 Tcost, allDesignPaths)
    else:
        start = designPathIn[0]
        finish = designPathIn[-1]
        allDesignPaths.append(((start, finish), (designPathIn, fullRulePath, curCosts)))


class MRChangeability(MRJob):

    def configure_options(self):
        super(MRChangeability, self).configure_options()
        self.add_file_option('--cost-file', dest='cost_file',
                             default=None, action="append")

    def mapper_init(self):
        self.num_costs = 2
        self.num_indices = 1
        self.minArc = 1
        self.maxArc = 3
        self.num_rules = 6
        self.Tcost = pickle.load(open(self.options.cost_file[0], "rb"))
        self.permMat = createRulePermutations(self.minArc, self.maxArc, self.num_rules)

    def mapper(self, _, line):
        startDesign = int(line)
        num_costs = 2  # cost, time
        allDesignPaths = []
        curCosts = [0] * num_costs
        for ruleSet in range(self.minArc-1, self.maxArc):
            for rulePath in self.permMat[ruleSet]:
                designPathToolMR([startDesign], curCosts, rulePath[:], rulePath[:],
                                 self.Tcost, allDesignPaths)
        return allDesignPaths

    def reducer(self, key, paths):
        # Keep only the Pareto-optimal (cost, time) paths for this (start, finish) pair
        p_front = paretofrontier(list(paths), maxX=False, maxY=False)
        yield (key, p_front)


if __name__ == '__main__':
    MRChangeability.run()
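For reference, the following is a minimal sketch of how this job might be run locally through mrjob's runner interface; the module name mr_changeability.py, the input file designs.txt (one starting design ID per line), and the pickled cost table Tcost.pkl are hypothetical placeholders. The job can equivalently be launched from the command line as python mr_changeability.py designs.txt --cost-file Tcost.pkl.

# Minimal local-run sketch (hypothetical file names; not part of the original listing)
from mr_changeability import MRChangeability

if __name__ == '__main__':
    job = MRChangeability(args=['designs.txt', '--cost-file', 'Tcost.pkl'])
    with job.make_runner() as runner:
        runner.run()
        for line in runner.stream_output():
            key, value = job.parse_output_line(line)
            print key, value  # Pareto-optimal change paths for each (start, finish) pair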