Visualization Considerations for Interactive Epoch-Era Analysis

by

Varsha J. Raghavan

S.B., Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2014

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

June 2015

©2015 Massachusetts Institute of Technology. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 22, 2015

Certified by: Dr. Adam M. Ross, Research Scientist, Engineering Systems; Lead Research Scientist, Systems Engineering Advancement Research Initiative; Thesis Supervisor, May 22, 2015

Certified by: Dr. Donna H. Rhodes, Principal Research Scientist and Senior Lecturer, Engineering Systems; Director, Systems Engineering Advancement Research Initiative; Thesis Co-Supervisor, May 22, 2015

Accepted by: Prof. Dennis M. Freeman, Chairman, Masters of Engineering Thesis Committee

Abstract

Epoch-Era Analysis (EEA) is a quantitative analysis approach that allows decision-makers to evaluate performance of design alternatives over a set of possible futures. To address the computational and cognitive burden arising from a potentially unlimited number of futures, the insertion of human-in-the-loop interaction into certain modules of EEA, or Interactive Epoch-Era Analysis (IEEA), was proposed.
This thesis discusses user goals of these modules as well as principles of data visualization and user interface design, and evaluates the functionality and usability of selected visualizations and interfaces accordingly in the context of these IEEA modules.

Thesis Supervisor: Adam M. Ross
Title: Research Scientist, Engineering Systems, Systems Engineering Advancement Research Initiative

Acknowledgments

The author wishes to thank a number of individuals without whom this work would not be possible:

Most importantly, to Dr. Adam Ross and Dr. Donna Rhodes, thank you both so much for allowing me the opportunity to join SEAri and for giving me the necessary guidance, tools, support, and prodding to accomplish all I did. Thanks also to all SEAri students who lent me their time and advice over my year here, especially Mike, Matt, and Paul for giving me all the answers I asked (and didn’t ask) for. Thank you to all my friends and classmates here (especially Rachael!) that have pretty much kept me sane and motivated to keep going throughout MIT. Lastly, to my mom and dad, thank you for all of your endless love and encouragement, and for never giving up on me.

Table of Contents

Visualization Considerations for Interactive Epoch-Era Analysis
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1 Background
  1.2 Thesis Overview
Chapter 2: Functionality Criteria
  2.1 Sampling Module Activities
    2.1.1 Epoch Sampling Submodule
    2.1.2 Era Sampling Submodule
  2.2 Analysis Module Activities
    2.2.1 Single-Epoch Analysis Submodule
    2.2.2 Multi-Epoch Analysis Submodule
    2.2.3 Single-Era Analysis Submodule
    2.2.4 Multi-Era Analysis Submodule
Chapter 3: Usability Criteria
  3.1 Overall Design Considerations
  3.2 Usability
    3.2.1 Learnability
    3.2.2 Efficiency
    3.2.3 Error-Tolerance
Chapter 4: Visual Analytics
  4.1 Geometric Visualizations
    4.1.1 Scatterplots
    4.1.2 Line Graphs
    4.1.3 Parallel Coordinate Plots
    4.1.4 Force Diagrams (Interactive)
    4.1.5 Sankey Diagrams
  4.2 Pixel-Based Visualizations
    4.2.1 Pixel Bar Charts
    4.2.2 Color Mapping (Heatmaps)
  4.3 Icon-Based Visualizations
    4.3.1 Star Plots
    4.3.2 Chernoff Faces
    4.3.3 Stick Figures
    4.3.4 Color/Shape Icons
  4.4 Hierarchy-Based Visualizations
    4.4.1 Hierarchical Axes
    4.4.2 Trees
    4.4.3 Treemaps
    4.4.4 Circle Packing
  4.5 Data Interaction Techniques
    4.5.1 Drag and Drop
    4.5.2 Selection
Chapter 5: Functionality and Usability Examination of IEEA Epoch Sampling Submodule
  5.1 Functionality
    5.1.1 Scatterplots/Bubble Charts
    5.1.2 Parallel Coordinate Plots
    5.1.3 Trees
    5.1.4 Treemaps and Circle Packing
    5.1.5 Evaluation Summary
  5.2 Implementation
  5.3 Usability
    5.3.1 Learnability
    5.3.2 Efficiency
    5.3.3 Error-Tolerance
Chapter 6: Functionality Examination for Other IEEA Submodules
  6.1 Era Sampling
    6.1.1 Parallel Coordinates
    6.1.2 Sankey Diagrams
    6.1.3 Tree Structures
    6.1.4 Bar Chart Icons
    6.1.5 Drag and Drop
    6.1.6 Evaluation Summary
  6.2 Multi-Epoch Analysis
    6.2.1 Scatterplot Variants
    6.2.2 Epochs as Parallel Coordinates
    6.2.3 Circular Extensions
    6.2.4 Evaluation Summary
  6.3 Single-Era Analysis
    6.3.1 Designs as Trees
    6.3.2 Line Graphs
    6.3.3 Parallel Coordinates
    6.3.4 Scatterplot Matrices
    6.3.5 Evaluation Summary
  6.4 Multi-Era Analysis
    6.4.1 Line Graph Matrix
    6.4.2 Line Graphs to Represent One Design
    6.4.3 Sankey Diagrams
    6.4.4 Evaluation Summary
Chapter 7: Discussions and Conclusion
Bibliography
Appendix

List of Figures

Figure 1-1: Activities involved in Epoch-Era Analysis (taken from Curry 2015)
Figure 2-1: Illustration of the concept of Fuzzy Pareto Optimality, where K is the level of “fuzziness” applied to the Pareto front (left) to create the Fuzzy Pareto Front (shaded area, right). Graphic taken from (Schaffner et al., 2014)
Figure 4-0: The visual analytics process, taken from (Keim, et al. 2010).
Figure 4-1: Example of scatterplot
Figure 4-2: Example of bubble chart
Figure 4-3: Scatterplot Matrix of a 6-dimensional car dataset, with variables plotted pairwise, from (Hoffman, 1999)
Figure 4-4: Scatterplot Matrix with histograms plotted along diagonal, from (Grinstein, 2001)
Figure 4-5: Line graph with multiple lines, from (Wallace 2004)
Figure 4-6: Example of Parallel Coordinate Plot, from Wikipedia
Figure 4-7: Polar chart showing Iris Flower dataset (left) and RadViz showing example car dataset (right). Both images taken from (Hoffman 1999)
Figure 4-8: Force-Directed Graph depicting character co-occurrence in “Les Miserables,” from (Bostock, 2012)
Figure 4-9: Sankey diagram showing a possible scenario for UK energy production and consumption in 2050, with supply on the left and demands on the right, from (Bostock, 2012)
Figure 4-10: Equal-height pixel bar chart with color encoding different attributes, from (Chan, 2006)
Figure 4-11: Heatmaps encoding data in every pixel. Random data set encoded into a 10x10 pixel square (left) from (Grinstein, 2001), and local thermal power data encoded into a map of the whole US (right) from (ICM Consulting 2015)
Figure 4-12: 36 twelve-dimensional data points represented as star plots, and organized by “weight” (bottom variable), from (Friendly, 1991)
Figure 4-13: Different Chernoff facial features (left) and Chernoff faces plotted in various 2D positions on a scatterplot (right), taken from (Chan, 2006)
Figure 4-14: A family of 12 stick figures (left) and a scatterplot of stick figures (right), taken from (Liu 2014)
Figure 4-15: Splitting scheme of hierarchical axes (left) next to the final histograms-within-histograms matrix visualization (right), from (Chan, 2006)
Figure 4-16: Example unlabeled tree visualization, from (BigML Blog 2012)
Figure 4-17: Example treemap of country population by continent, from (Veroy 2013)
Figure 4-18: Example circle packing layout from Mike Bostock’s website
Figure 4-19: An example resizable object, denoted by the grooved “grippable” corner
Figure 4-20: An example of checkboxes vs. radio buttons, from (Lepofsky 2015)
Figure 4-21: An example of toggle switches, from XOO.me design directory
Figure 4-22: Examples of filters for different types of variables: Size, Designer, and Color all allow discrete selection of respective values (numerical and categorical), and the slider (bottom right) allows selection for the continuous variable Price (as shown here, the selection allows values from $0-$750). All taken from an actual retail website, Rent the Runway
Figure 4-23: An example of data brushing, taken from Mike Bostock’s website. The data was selected in the top-left box, and is colored the same across all scatterplots in the matrix.
Figure 5-1: Example of IEEA Epoch Sampling implemented as a scatterplot. The epoch variables were “Tech Level,” with values “future” or “present,” and “User Preference,” with values 1-8
Figure 5-2: Example of IEEA Epoch Sampling sketched as a Parallel Coordinate Plot
Figure 5-3: Example of IEEA Epoch Sampling on NGCS data implemented as a Tree
Figure 5-4: Treemap visualization of NGCS epochs
Figure 5-5: Circle packing visualization of NGCS epochs as seen at different zoom levels
Figure 5-6: Start state of Epoch Sampling interface
Figure 5-7: IDs of selected epochs (top) along with current state of top-right box of interface displaying fraction of epochspace selected (12 epochs out of 108 total; bottom)
Figure 5-8: Partially expanded tree (using NGCS database epoch variables/values) in implemented interface
Figure 5-9: The bottom box of the Epoch Sampling interface in “SELECT mode”
Figure 6-1: Automatically enumerated era set represented as a Sankey diagram of epoch flows.
Figure 6-2: A single era represented as a series of epochs along a single axis
Figure 6-3: An unorganized set of 7 eras (represented by bar chart icons) with hue, color, and orientation as additional encoding.
Figure 6-4: Sketch of leveraging drag-and-drop and resizing functionalities for manual era creation
Figure 6-5: Bubble chart paired with parallel coordinates that allow user to choose which attributes to plot (taken from Rhodes and Ross, 2015)
Figure 6-6: Parallel coordinate plot showing Fuzzy Pareto Number of 7 designs (horizontal lines) being plotted over 6 epochs (vertical axes)
Figure 6-7: Parallel sets (Sankey) visualization of designs following a changeability strategy from frame to frame (i.e. every transition), from (Schaffner 2014). Color coded by start design, horizontal line size “reflects the proportion of clips in which the corresponding design number appears in that frame”
Figure 6-8: Two line graphs showing MAU (left) and MAE (right) of 6 designs over the course of a 4-epoch era (Epoch 1: 3 yrs, Epoch 2: 3 yrs, Epoch 3: 2 yrs, Epoch 4: 2 yrs), from (Schaffner 2014)
Figure 6-9: Two layouts of a scatterplot matrix showing a four-epoch era
Figure 6-10: A selection of four eras displayed in a line graph matrix
Figure 6-11: Line graph showing trajectories (in terms of MAU) for one design (and subsequent changes/options) across 3 eras

List of Tables

Table 4-1: Summary of visualizations presented in this chapter (Sections 4.1-4.4). Includes visualization names, brief notes about their major strengths/capabilities, the number of dimensions supported (either a number or number range, multidimensional [meaning 2+], or in the case of Force Diagrams, not applicable), the variable types supported (Discrete, Continuous, or Any), the dataset size supported (Small, Med, Large, or Any), and the types of interactions supported from those presented in Section 4.5.
Table 5-1: Summary of characteristics for each visualization (Sec. 5.1.1-5.1.4), with best alternative for each row underlined. “Fine” represents a visualization is passable – not helpful, but not unhelpful.
Table 6-1: Summary of characteristics for each visualization (Sec. 6.1.1-6.1.4), with best alternative for each row underlined. “Fine” represents a visualization is passable – not helpful, but not unhelpful.
Table 6-2: Summary of characteristics for each visualization (Sec. 6.2.1-6.2.3)
Table 6-3: Summary of characteristics for each visualization (Sec. 6.3.2-6.3.4)
Table 6-4: Summary of characteristics for each visualization (Sec. 6.4.1-6.4.3)
Chapter 1: Introduction

Systems engineering, according to the International Council on Systems Engineering (INCOSE), is “an interdisciplinary approach and means to enable the realization of successful systems.” [1] Such systems range from spacecraft to computer chips, and can affect anywhere from hundreds to millions of people, locally or globally. The field encompasses the design, operation, and management of these systems, focusing on customer needs and required functionality before proceeding with manufacture.

In order to start designing these systems, a number of decisions must be made about the specifications of the system. There are several methods for analyzing and choosing from the available design alternatives, and one or more could be used depending on the type of problem, the time and resources available, the end goal, etc. Regardless of the method(s) used, it is advantageous for the decision-maker to leverage computer software to perform the analysis, as machines are able to process and display information much faster than humans can manually. However, for a computer program to be useful, it must not only correctly perform the tasks the user requires (functionality); the user must also be able to understand how to use it in order to optimize satisfaction and results (usability). This thesis will draw principles of data visualization and user interface design from the field of computer science to address these two properties in depth and examine them in the context of Epoch-Era Analysis, a systems engineering framework for decision-making across uncertain futures, described in the next section.

1.1 Background

Traditional systems engineering analysis approaches develop system specifications under the assumption of relatively static environments and stakeholder needs.
Cost-benefit analysis, for example, in which a decision-maker assigns a measure of “utility” and “cost” to each design alternative and selects designs in the so-called “tradespace” that maximize utility while minimizing cost (designs on, or close to, the Pareto front), allows system designers to optimize the system as defined in the current context. Unfortunately, this approach does not account for the changes in environment and stakeholder needs that will almost inevitably occur over the entire life cycle of a system. Clients will likely change their minds about what they want (i.e. needs) or where they want to use it (i.e. context), resulting in “change requests” that systems engineers operating under the aforementioned assumption will likely not have accounted for in their previously developed specifications. Analyses performed under the assumption of static context and needs therefore do not ensure that a system will continue to deliver value and meet stakeholder expectations as contexts and needs change throughout its life cycle.

[1] "What Is Systems Engineering?" INCOSE. International Council on Systems Engineering, 14 June 2004. <http://www.incose.org/practice/whatissystemseng.aspx>.

To address this problem, Ross and Rhodes, of MIT’s Systems Engineering Advancement Research Initiative (SEAri), introduced Epoch-Era Analysis (EEA), an analysis approach that allows decision-makers to evaluate performance of design alternatives over a set of possible futures (Ross 2006; Ross and Rhodes 2008) [2,3].

The fundamental unit of EEA is the epoch, a period of time with fixed context (e.g. political, economic) and needs (of any stakeholders). Each epoch can be described with a combination of epoch variables, which represent important uncertainty factors in contexts and needs that could potentially affect system performance. For example, a case study commonly used (seen in Fitzgerald et al. 2012 [4,5,6]; Fulcoly et al.
2012 [7]) involves the design of a space tug, with two defined epoch variables: Technology Level and User Preference. There are two values of Technology Level (Present and Future contexts) and eight different user preferences (or sets of stakeholder needs), making for sixteen total epochs in this study. An ordered string of epochs, each with a defined duration, is called an era. Once a set of epochs or eras is generated, users may compare and evaluate designs in different epochs or eras.

Figure 1-1 shows the activities involved in EEA: Once users have identified decisions to be made and all relevant epoch and design variables (“Problem Definition” and “Design Formulation”), they can create epochs through enumeration and selection of epoch variables (“Epoch Characterization”). If users are planning on performing era analysis, they must also construct eras by selecting and ordering epochs (“Era Construction”). After developing models through which to evaluate designs in each epoch (e.g. a measure of utility or expense, “Design-Epoch-Era Evaluations”), users can perform Single- or Multi-Epoch or Era Analysis to better understand how the designs they selected will fare over the set of possible futures they selected. As each of these analyses has slightly different requirements and goals (discussed more in the next chapter), it is important that they each be represented and handled accordingly.

[2] Ross, A.M., “Managing Unarticulated Value: Changeability in Multi-Attribute Tradespace Exploration,” PhD thesis, MIT Engineering Systems Division, June 2006.
[3] Ross, A.M., and Rhodes, D.H., "Using Natural Value-centric Time Scales for Conceptualizing System Timelines through Epoch-Era Analysis," INCOSE International Symposium 2008, Utrecht, the Netherlands, June 2008.
[4] Fitzgerald, M.E., Ross, A.M., and Rhodes, D.H., "Assessing Uncertain Benefits: a Valuation Approach for Strategic Changeability (VASC)," INCOSE International Symposium 2012, Rome, Italy, July 2012.
5 Fitzgerald, M.E. and Ross, A.M., "Mitigating Contextual Uncertainties with Valuable Changeability Analysis in the Multi-Epoch Domain," 6th Annual IEEE Systems Conference, Vancouver, Canada, March 2012.
6 Fitzgerald, M.E. and Ross, A.M., "Sustaining Lifecycle Value: Valuable Changeability Analysis with Era Simulation," 6th Annual IEEE Systems Conference, Vancouver, Canada, March 2012.
7 Fulcoly, D.O., Ross, A.M., and Rhodes, D.H., "Evaluating System Change Options and Timing using the Epoch Syncopation Framework," 10th Conference on Systems Engineering Research, St. Louis, MO, March 2012.

Figure 1-1: Activities involved in Epoch-Era Analysis (taken from Curry 2015)

Epoch-Era Analysis was originally introduced to provide an extension to single-context tradespace exploration. Recently, many case studies using parts, or all, of Epoch-Era Analysis have been conducted on different datasets for different purposes, including (Fulcoly 2012)⁸, (Pina 2009)⁹, (Rader 2014)¹⁰, and (Schaffner 2014)¹¹, where EEA has not only been demonstrated to help choose best designs over a whole system lifecycle, but also to gain insights about different characteristics of the tradespace and effects of different design attributes and futures.

Schaffner (2014) shows that the number of possible epochs and eras generated by enumerating epoch variables can quickly exceed a feasible number for users to explore: If an epoch can be described with $V$ epoch variables, each $V_i$ of which has $L_i$ levels, the number of total epochs a system can experience, $N_{Epochs}$, can be described as:

$$N_{Epochs} = \prod_{i=1}^{V} L_i$$

To then construct an era, a number of these $N_{Epochs}$ epochs must be selected, ordered, and assigned durations. Even if we simplify the process by allowing only a selection of $n$ epochs and assuming that all epochs have the same duration, the total number of possible eras a system can experience (assuming any epoch can transition to any other epoch), $N_{Eras}$, is at most:

$$N_{Eras} = \left(N_{Epochs}\right)^{n} = \left(\prod_{i=1}^{V} L_i\right)^{n}$$
It follows that the size of the era space is necessarily greater than or equal to the size of the epoch space. By these calculations, Schaffner’s example of a model with 5 epoch variables of 3 levels each results in $N_{Epochs} = 243$ possible epochs and, assuming $n$ is in the range of 15 to 20, a maximum of between $N_{Eras} = 6 \times 10^{35}$ and $N_{Eras} = 5 \times 10^{47}$ possible eras on which to perform analyses. If the simplifying assumptions are removed, this maximum number of possible eras can grow potentially boundlessly.

8 Fulcoly, D.O., Ross, A.M., and Rhodes, D.H., "Evaluating System Change Options and Timing using the Epoch Syncopation Framework," 10th Conference on Systems Engineering Research, St. Louis, MO, March 2012.
9 Pina, A.L. “Applying Epoch-Era Analysis for Homeowner Selection of Distributed Generation Power Systems,” Master of Science Thesis, Engineering and Management, Massachusetts Institute of Technology, June 2014.
10 Rader, A.A., Ross, A.M., and Fitzgerald, M.E., "Multi-Epoch Analysis of a Satellite Constellation to Identify Value Robust Deployment across Uncertain Futures," AIAA Space 2014, San Diego, CA, August 2014.
11 Schaffner, M.A., “Designing Systems for Many Possible Futures: The RSC-based Method for Affordable Concept Selection (RMACS), with Multi-Era Analysis,” Master of Science Thesis, Aeronautics and Astronautics, Massachusetts Institute of Technology, June 2014.

The availability of so many epochs and eras could potentially result in biased or uninformative analysis. To address this, Curry et al. have proposed a framework for Interactive Epoch-Era Analysis (IEEA), in which certain EEA activities are performed with human feedback, as seen in Figure 1-2. Curry and Ross hypothesized that this interactivity would enable improved decision-making intuition and insight, as well as “intelligently limit the potential unbounded growth in the epoch/era space” (Curry 2015)¹².
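The epoch and era counts discussed above can be checked with a short script. The following is a minimal sketch, not code from the cited studies: the variable names and the level labels for the space tug example are placeholders chosen for illustration, and the era count uses the same simplifying assumptions as the text (n equal-duration epochs, any epoch may follow any other).

```python
from itertools import product
from math import prod

# Hypothetical epoch variables for the space tug example: two technology
# levels and eight user preference sets (labels are placeholders).
epoch_variables = {
    "technology_level": ["present", "future"],
    "user_preference": [f"preference_{k}" for k in range(1, 9)],
}


def enumerate_epochs(variables):
    """Full-factorial enumeration: one epoch per combination of levels."""
    names = list(variables)
    return [dict(zip(names, levels)) for levels in product(*variables.values())]


def count_epochs(levels_per_variable):
    """N_Epochs is the product of the number of levels of each variable."""
    return prod(levels_per_variable)


def count_eras(n_epochs, n):
    """Upper bound on eras built from n equal-duration epochs."""
    return n_epochs ** n


space_tug_epochs = enumerate_epochs(epoch_variables)
print(len(space_tug_epochs))

schaffner_epochs = count_epochs([3] * 5)
print(schaffner_epochs)
for n in (15, 20):
    print(n, f"{count_eras(schaffner_epochs, n):.1e}")
```

Under these assumptions the script reproduces the sixteen space tug epochs and the 243 epochs of the five-variable example, and the era bounds come out on the order of 10^35 and 10^47, in line with the figures quoted above.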
Figure 1-2: A framework for Interactive Epoch-Era Analysis, showing five “modules” with human feedback (taken from Rhodes and Ross 2015)¹³

This framework can more easily be abstracted into six main modules:
• Elicitation of relevant epoch and design variables (often through interview),
• Generation of all epochs and design tradespaces (often including enumeration),
• Sampling of epochs and eras in which to evaluate design choices,
• Evaluation of designs in the sampled subset of epochs and eras,
• Analyses of design choices in the previously evaluated epochs and eras, and finally
• Decisions of final designs based on iterative evidence from previous modules.

12 Curry, M.D. and Ross, A.M., "Considerations for an Extended Framework for Interactive Epoch-Era Analysis," CSER 2015.
13 Rhodes D.H. and Ross A.M., Interactive Model-Centric Systems Engineering (IMCSE) Phase Two Technical Report SERC-2015-TR-048-2; February 2015.

While the sequence of these modules flows logically, IEEA is intended to be an iterative process where users can go back and change responses within earlier modules at any point to reflect what they have learned from later ones. Elicitation and generation have been primarily human tasks, with some structured support via static documentation; sampling, however, is the first module in the framework that can clearly benefit from human-computer interaction and feedback. In this module, the human must make sense of the generated epochs and eras and decide on which subset of them to spend scarce computational and human-attention resources. This module can be thought of as encompassing two submodules of IEEA: Epoch Sampling and Era Sampling. Visualization and feedback are key for the user to interact with the data representing possible epoch and era subset samples from the larger generated epoch and era spaces.
Evaluation again is primarily a human task, as it requires judgment above computational power, but the subsequent analyses module (encompassing the submodules of Single-Epoch, Multi-Epoch, Single-Era, and Multi-Era Analyses) is the opposite, requiring as much interaction and human-computer feedback as necessary for a user to fully explore all of the design options he is faced with to finally make a decision.

1.2 Thesis Overview

The work presented in this thesis aims to contribute to an overall research effort to demonstrate that adding interactivity to interfaces increases user satisfaction, through elevated functionality and usability. Chapter 2 will propose criteria for evaluating designs with respect to functionality, while Chapter 3 will present usability criteria, including overall practices of “good” graphic design. Chapter 4 will present an overview of the field of visual analytics, including a survey of existing visualization and data manipulation techniques. Chapter 5 will evaluate the functionality and usability of potential visualization interfaces for the introduced Epoch Sampling submodule, and Chapter 6 will go on to evaluate the functionality of potential visualizations for the Era Sampling as well as the IEEA Multi-Epoch, Single-Era, and Multi-Era Analysis submodules. Note that Single-Epoch Analysis will not be considered in this thesis, as this submodule by itself bears no fundamental difference to the aforementioned traditional analysis under static contexts/needs assumptions. Finally, Chapter 7 will provide concluding thoughts about the research contributions of this thesis as well as considerations for future work.

Chapter 2: Functionality Criteria

As mentioned above, this thesis will evaluate the functionality and usability of potential interfaces for five interactive submodules in IEEA: Epoch Sampling, Era Sampling, and Multi-Epoch, Single-Era, and Multi-Era Analysis.
Before we examine usability criteria, in this chapter we first discuss these submodules a bit further and propose a set of user-centric criteria for each submodule, adapted from Curry et al.’s hypotheses regarding IEEA (Curry 2015), with which to evaluate the functionality of its proposed interfaces. These criteria were then presented to three current SEAri graduate students who identified as novice to expert Epoch-Era Analysis practitioners, along with misleading criteria and opportunity for write-in criteria, and the strongest responses for each submodule have been included here.

It is important to note this distinction before proceeding: This thesis does not aim to develop or evaluate functionality related to analysis models, strategies/procedures, or the modules or submodules of IEEA themselves. Rather, it aims only to introduce considerations for, and evaluate the functionality of, visualization interfaces for these pre-developed submodules.

2.1 Sampling Module Activities

By the time users have gotten to the Sampling module in IEEA, the assumption is that they will already have brainstormed all relevant epoch variables, as well as enumerated all of the possible epochs by taking a full factorial of all variable combinations. It should be noted that in practice, data for every one of these epochs will not necessarily be generated already, so in many cases, only a fraction of this enumeration is readily available for analysis.

2.1.1 Epoch Sampling Submodule

The end result of Epoch Sampling is for users to have selected a number of epochs with which to construct eras and/or conduct Single- or Multi-Epoch Analysis.
In order to get anything useful later from the analysis module, users must start to have a thorough understanding of the concept of epochs and what kinds of impacts they can have on later analysis, so it is very important to pick user goals that stress this understanding:
• The user should understand how each of the epochs is defined in the dataset (e.g. epoch variables and values; what is a context and what is a need, etc.).
• Based on this, the user should be able to find and select epochs that he deems important on which to conduct further analysis.
• The user should understand a) the size of the epoch space, b) what fraction is available to explore (for which epochs data has already been generated), c) what fraction of this has already been explored or selected to explore, helping to “intelligently limit the potentially unbounded growth in the epoch/era space.”

2.1.2 Era Sampling Submodule

Similar to Epoch Sampling, the end result of Era Sampling is for users to have selected a number of eras with which to conduct Single- or Multi-Era Analysis while starting to understand the concept and potential impact of eras on these later analyses. As shown in the previous chapter, the size of the era space is necessarily at least that of the epoch space. In fact, there are always an infinite number of possible eras that can be enumerated from epochs, as each era is constructed by stringing together multiple epochs, each with a potentially infinite number of durations to choose from. It would therefore be futile for a user to try to visualize the size of the era space, or even to be confident that the fraction he explores is representative of the possible futures. There are two main categories of era construction methods: Computational (automatic) and Narrative (manual).
Fitzgerald and Ross describe a simple era simulator that exemplifies the former category: it automatically “constructs a stochastic sequence of epochs over which designs will be valued” by randomly selecting epochs in succession¹⁴. The alternative to this type of automatic era construction is the less computationally taxing method of manually creating eras through narrative, or writing story-like scenarios to explain and dictate the change in contexts and needs over time, as performed in (Schaffner, 2014; Pina, 2014). Roberts, et al. point out that since fewer eras can be created and analyzed through narrative-based approaches due to their time-intensiveness, these generally consider extreme scenarios¹⁵. For both of these methods, the goal of understanding the size of the era space (analogous to part (a) of the third goal in Epoch Sampling above) is removed since it is always infinite.

14 Fitzgerald, M.E. and Ross, A.M., "Sustaining Lifecycle Value: Valuable Changeability Analysis with Era Simulation," 6th Annual IEEE Systems Conference, Vancouver, Canada, March 2012.

After constructing eras through either method (as part of the IEEA Generation module), a user may still want to sample eras from this subset of all possible eras, thus leading him to perform Era Sampling as part of the IEEA Sampling module. This will be most common in the case of computationally generated eras, as users will tend to manually construct only those eras they intend to analyze in the first place. Thus, this section will list goals for era sampling from computationally generated eras, as well as for “manual sampling,” or manually deciding on and constructing eras.
• Sampling from computationally-generated eras:
  a) The user should understand how eras are defined and represented in the interface (e.g. epochs and durations).
  b) Based on this, the user should be able to find and select important eras on which to conduct further analysis.
  c) The user should understand i) how much of the era space is available to explore (for which eras data has already been computationally generated), ii) what fraction of this has already been explored or selected to explore.
• Sampling through manual era generation:
  a) The user should be able to create an era by choosing epochs and setting their durations.
  b) Based on this, the user should be able to find and select important eras on which to conduct further analysis after they have been created.
  c) The user should understand i) how much of the era space is available to explore (how many eras have been generated thus far), ii) what fraction of this has already been explored or selected to explore.

15 Roberts, C.J., Richards, M.G., Ross, A.M., Rhodes, D.H., and Hastings, D.E., "Scenario Planning in Dynamic Multi-Attribute Tradespace Exploration," 3rd Annual IEEE Systems Conference, Vancouver, Canada, March 2009.

2.2 Analysis Module Activities

Once the user has selected epochs to analyze, he enters the Evaluation module, where he must generate models to evaluate designs with respect to the current context and stakeholder needs. The resulting metrics may be as simple as the level of a positive design attribute and initial cost of manufacture, or a more involved aggregation of models such as Multi-Attribute Utility (introduced by Keeney and Raiffa, 1976)¹⁶ and Multi-Attribute Expense (introduced by Diller, 2002)¹⁷. The Evaluation module relies on human judgment to pick appropriate evaluation metrics, and possibly further computation to score alternatives with these metrics, which will then enable the user to enter the Analyses module, where Single- and/or Multi-Epoch and/or Era Analysis can be conducted. As mentioned in the thesis overview, Single-Epoch Analysis will not be discussed in depth in this thesis, though we will present a section on this submodule here just for comparison and completeness.
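Before moving to the individual analysis submodules, the computational era construction and the sampling-progress goal from Section 2.1.2 can be illustrated with a small sketch. This is not the era simulator from the cited literature: it assumes uniform random selection of epochs and durations, and every name in it (`simulate_era`, `generate_eras`, `sampling_progress`, the epoch labels, the duration choices) is hypothetical.

```python
import random


def simulate_era(epochs, n_epochs, duration_choices, rng):
    """Build one era as an ordered list of (epoch, duration) pairs.

    Assumption: epochs and durations are drawn uniformly at random and any
    epoch may follow any other.
    """
    return [
        (rng.choice(epochs), rng.choice(duration_choices))
        for _ in range(n_epochs)
    ]


def generate_eras(epochs, n_eras, n_epochs, duration_choices, seed=0):
    """Generate a reproducible pool of eras from which a user could sample."""
    rng = random.Random(seed)
    return [
        simulate_era(epochs, n_epochs, duration_choices, rng)
        for _ in range(n_eras)
    ]


def sampling_progress(generated, selected):
    """Fraction of the generated eras already selected for exploration."""
    return len(selected) / len(generated) if generated else 0.0


epochs = [f"epoch_{k}" for k in range(16)]  # placeholder epoch labels
eras = generate_eras(epochs, n_eras=50, n_epochs=5, duration_choices=[1, 2, 5])
selected = eras[:10]  # stand-in for a user's manual selection
print(eras[0])
print(sampling_progress(eras, selected))
```

An interface could display the value returned by `sampling_progress` to address goal (c) above; because the full era space is unbounded, the fraction is taken relative to the generated pool only.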
2.2.1 Single-Epoch Analysis Submodule

Single-Epoch Analysis allows evaluation of multiple designs in one epoch (a single combination of contexts and stakeholder needs) at a time, based on any metric of the user’s choosing. The approach used in this submodule is analogous to the aforementioned traditional tradespace exploration based on design and epoch attributes, and thus is similarly limited in its ability to inform about a design’s performance over the entire system lifecycle. Criteria for evaluating this submodule would be as follows:
• The user should be able to evaluate and compare designs’ performance in a single epoch by user’s choice of evaluation metric.
• If the user chooses for the computer to perform a calculation, the user must be able to explore how the computer arrived at results (and hopefully gain trust in results).

16 Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives. Wiley, New York.
17 Diller, N. P. “Utilizing Multiple Attribute Tradespace Exploration with Concurrent Design for Creating Aerospace Systems Requirements,” Master of Science Thesis, Aeronautics and Astronautics, Massachusetts Institute of Technology, June 2002.

2.2.2 Multi-Epoch Analysis Submodule

Multi-Epoch Analysis allows evaluation of designs across all selected epochs of interest. All of the analysis methods in this module are designed to be performed iteratively, so users can use information learned from past analyses to inform future ones. Pina (2009)¹⁸, Fitzgerald (2012)¹⁹, Schaffner (2014)²⁰, and others use extensions of Fuzzy Pareto Optimality (introduced by Smaling, 2005²¹; illustrated in Figure 2-1) to automatically calculate which designs appear close to the Pareto Front for the highest percentage of epochs being analyzed. The user can calculate this for any level of “fuzziness” (distance from the true Pareto Front), and pick the most fuzzy-Pareto efficient designs based on the results.
This method, while highly useful, is not necessarily interactive (besides setting the fuzziness factor), and requires trust in the computer’s calculations and recommendations, which new users may not have gained yet.

18 Pina, A.L. “Applying Epoch-Era Analysis for Homeowner Selection of Distributed Generation Power Systems,” Master of Science Thesis, Engineering and Management, Massachusetts Institute of Technology, June 2014.
19 Fitzgerald, M.E. and Ross, A.M., "Mitigating Contextual Uncertainties with Valuable Changeability Analysis in the Multi-Epoch Domain," 6th Annual IEEE Systems Conference, Vancouver, Canada, March 2012.
20 Schaffner, M.A., Ross, A.M., and Rhodes, D.H., "A Method for Selecting Affordable System Concepts: A Case Application to Naval Ship Design," 12th Conference on Systems Engineering Research, Redondo Beach, CA, March 2014.
21 Smaling, R.M. “System Architecture Analysis and Selection Under Uncertainty,” PhD thesis, Engineering Systems Division, Massachusetts Institute of Technology, June 2005.

Figure 2-1: Illustration of the concept of Fuzzy Pareto Optimality, where K is the level of “fuzziness” applied to the Pareto front (left) to create the Fuzzy Pareto Front (shaded area, right). Graphic taken from (Schaffner et al., 2014)

For the purposes of this thesis, all modules will target newer users; therefore, goals for the Multi-Epoch Analysis interface itself will center on promoting interactivity (comparing designs visually and exploring further) to get a sense of the designs before computing the most fuzzy-Pareto optimal designs. A byproduct of this interaction will hopefully be building user trust in the computationally recommended results.
• The user should be able to evaluate and compare system performance across selected epochs a) by user’s choice of evaluation metric and b) simultaneously (without having to switch screens).
• The user should understand that epochs being analyzed are not necessarily sequential.
• If the user chooses for the computer to perform a calculation, the user must be able to explore how the computer arrived at results (and hopefully gain trust in results).

2.2.3 Single-Era Analysis Submodule

Single-Era Analysis, also part of the Analyses module, allows evaluation of designs over the span of an era, or a sequence of epochs with specified durations. Single-Era Analysis is similar to Multi-Epoch Analysis in its design comparison objectives, but the fixed order of epochs within an era allows decision-makers to understand and utilize designs’ possibility for change between epochs (changeability²²), either manually (usually incurring additional expense) or naturally, as well as the cumulative impact of time-varying metrics. Thus the major new benefit of Single-Era Analysis is allowing the user to examine time- and path-dependence, as well as identify design change strategies that keep the system delivering value even if it does not remain in the same design state as it started in. Historically, the aforementioned fuzzy Pareto metrics have also been used in this analysis; thus the goals for this submodule, like the last one, are centered around promoting interactivity and building user trust as a byproduct.
• The user should be able to evaluate system performance in the whole selected era by user’s choice of a) the same evaluation metric or b) different evaluation metrics for each composite epoch.
• The user should understand that epochs being analyzed are sequential and thus be able to understand path-dependent effects of epoch shifts.
• The user should be able to understand effects of designs’ potential changeability from epoch to epoch.
• If the user chooses for the computer to perform a calculation, the user must be able to explore how the computer arrived at results (and hopefully gain trust in results).
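The fuzzy Pareto calculation referred to in Sections 2.2.2 and 2.2.3 can be sketched as follows. This is a simplified reading of the concept rather than the implementation used in the cited studies: it assumes two objectives (cost minimized, utility maximized), takes the fuzziness K as a fraction of each objective's range within an epoch, and uses hypothetical function names and made-up design data.

```python
def fuzzy_pareto_set(designs, k):
    """Return the names of designs within fuzziness k of the Pareto front.

    designs maps a name to (cost, utility). Assumption: a design is excluded
    only if another design beats it by more than k times the range of each
    objective; k = 0 gives the ordinary Pareto front.
    """
    costs = [c for c, _ in designs.values()]
    utils = [u for _, u in designs.values()]
    cost_margin = k * (max(costs) - min(costs))
    util_margin = k * (max(utils) - min(utils))

    def dominated(name):
        cost, util = designs[name]
        for other, (oc, ou) in designs.items():
            if other == name:
                continue
            no_worse = oc + cost_margin <= cost and ou - util_margin >= util
            better = oc + cost_margin < cost or ou - util_margin > util
            if no_worse and better:
                return True
        return False

    return {name for name in designs if not dominated(name)}


def fuzzy_pareto_trace(epoch_results, k):
    """Fraction of epochs in which each design is in the fuzzy Pareto set."""
    names = {name for designs in epoch_results for name in designs}
    counts = dict.fromkeys(names, 0)
    for designs in epoch_results:
        for name in fuzzy_pareto_set(designs, k):
            counts[name] += 1
    return {name: counts[name] / len(epoch_results) for name in names}


# Made-up (cost, utility) values for five designs in two epochs.
epoch_1 = {"A": (1.0, 0.2), "B": (2.0, 0.8), "C": (2.1, 0.75),
           "D": (4.0, 0.9), "E": (4.0, 0.3)}
epoch_2 = {"A": (1.0, 0.5), "B": (2.0, 0.4), "C": (2.5, 0.85),
           "D": (4.0, 0.9), "E": (3.0, 0.2)}

print(sorted(fuzzy_pareto_set(epoch_1, 0.0)))
print(sorted(fuzzy_pareto_set(epoch_1, 0.1)))
print(fuzzy_pareto_trace([epoch_1, epoch_2], 0.1))
```

In this toy data, design C is slightly off the Pareto front in the first epoch and only enters the set once some fuzziness is allowed, which is the kind of effect an interactive interface could let a user explore by varying K before trusting a computed recommendation.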
2.2.4 Multi-Era Analysis Submodule

Logically following from the previous two analysis types, Multi-Era Analysis allows evaluation of designs across all selected eras of interest. Similar to Single-Era Analysis, this submodule can be very valuable in allowing a user to explore time- and path-dependencies and identify strategies based on design changeability. The last process introduced in EEA, this type is the most complex of the analyses, relying on and synthesizing results gleaned from other analyses. In his Master’s Thesis, Schaffner (2014) explores the process of performing Multi-Era Analysis, stating that “the amount and variety of information that can be incorporated into [Multi-Era Analysis] is significant,” going on to describe how the process takes different forms depending on the inputs provided and the seven activities that can be performed as part of this process, including identifying metrics of interest, creating design change strategies, creating eras, and evaluating, or generating data for, each design-strategy-era combination²³.

22 Ross, A.M. and Hastings, D.E., "Assessing Changeability in Aerospace Systems Architecting and Design Using Dynamic Multi-Attribute Tradespace Exploration," AIAA Space 2006, San Jose, CA, September 2006.

For the purposes of this thesis, we assume that all of the aforementioned activities that are possible given the available inputs have been completed, and focus on the last of them: Results Analysis, or the exploration of system behavior and trajectory based on what analyses have been conducted so far.
• The user should be able to evaluate and compare system performance across selected eras by user’s choice of a) the same evaluation metric or b) different evaluation metrics for each composite epoch in the eras, and simultaneously (without having to switch screens).
• The user should understand that eras being analyzed are not necessarily sequential, but epochs within eras are.
• The user should be able to understand effects of designs’ potential changeability from epoch to epoch.
• The user should be able to understand and compare time- and path-dependent effects of epoch durations and shifts.
• If the user chooses for the computer to perform a calculation, the user must be able to explore how the computer arrived at results (and hopefully gain trust in results).

23 Schaffner, M.A., “Designing Systems for Many Possible Futures: The RSC-based Method for Affordable Concept Selection (RMACS), with Multi-Era Analysis,” Master of Science Thesis, Aeronautics and Astronautics, Massachusetts Institute of Technology, June 2014.

Chapter 3: Usability Criteria

Functionality is only one attribute of a system. Analogous to Ricci and Schaffner’s concepts of “trust” and “truthfulness” in a model (Ricci et al. 2014)²⁴, flawless functionality of an interface (“truthfulness”) does not guarantee usability (“trust”). This usability can only be earned with good interface design. Now that we have presented functionality criteria in the form of submodule goals in the previous chapter, in this chapter we examine the usability criteria for evaluating the interfaces that display these visualizations to users. Overall design principles for creating good data graphics are introduced first, followed by usability criteria based on principles for good user interface design.

3.1 Overall Design Considerations

Countless guidelines have been developed over the years that prescribe measures to be taken when creating data visualizations. These range from very general (e.g. important data should be easy to find and understand; tell the truth about the data) to very specific (e.g. colors should be chosen so that all users, including color-blind users, can distinguish them; avoid using gray scale to represent more than 2-4 values; words should be spelled out and run horizontally, left-to-right).
There should be internal consistency within the visualizations, as well as external consistency with common conventions the user may be familiar with. Visualizations should attract anyone viewing them to think about the substance rather than the methodology or any other distracting features. They should be clear and reveal the data at several levels of detail, attracting and encouraging the user to explore further. Above all, they should enable the user to be more productive, efficient, and/or gain more insight than they could have without the tool (Ware 2013; Tufte 1983)²⁵,²⁶.

Especially when dealing with quantitative data, it is important to take into account how different values are encoded to reflect their size or order. According to a 1984 study by Cleveland and McGill, humans are able to decode quantitative data most accurately when it is encoded in the following ways, in ranked order (Cleveland 1984)²⁷:
1) Position along a common scale (e.g. scatterplots)
2) Position along nonaligned scales (e.g. multiple scatterplots)
3) Length, direction, angle/slope (e.g. bar chart, pie chart)
4) Area (e.g. bubbles)
5) Volume, curvature (e.g. spheres)
6) Shading, color saturation (e.g. heatmap)

24 Ricci, N., Schaffner, M.A., Ross, A.M., Rhodes, D.H., Fitzgerald, M.E., "Exploring Stakeholder Value Models Via Interactive Visualization," 12th Conference on Systems Engineering Research, Redondo Beach, CA, March 2014.
25 Ware, Colin. Information Visualization: Perception for Design. Elsevier, 2013.
26 Tufte, Edward R. The Visual Display of Quantitative Information. Cheshire, Conn.: Graphics, 1983.
27 Cleveland, W.S. and R. McGill, “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods,” Journal of the American Statistical Association, 79-387, 1984.

Thus, representations of quantitative (or even some categorical) information should take this list into account.
While contrasting visual variables can add plenty of information, visualization designers must constantly be wary of the tradeoff between information displayed and simplicity. Selectivity and associativity are two properties that illustrate this tradeoff: Selectivity is “the degree to which a single level of variable can be selected from the entire visual field,” whereas associativity “refers to how easy it is to ignore the variable” (Miller 2015)²⁸. While important elements of a visualization should appropriately stand out, unnecessary encoding of unimportant information should be avoided (without, of course, sacrificing too much functionality).

While the more general guidelines are more obvious and widely accepted, more specific ones should be treated with caution, as not all users share the same visual preferences or cognitive processes. It is important to note here the differences between perception and cognition. Early stages in human visual processing are largely automatic and based on general human perception, independent of cognition: detecting basic figures and borders, and distinguishing a foreground from a background. As this information reaches later stages of processing, it is combined with an individual’s long-term visual memory to allow him or her to understand what he or she is seeing. Later stages are thus influenced by an individual’s knowledge, or cognition.²⁹ It is important to optimize designs and visualizations for both human perception (general) and cognition (individual). We assume all of these guidelines strive to make the best use of human cognitive processing power in understanding whatever is being displayed.
Tufte summarizes the concept of graphical excellence as “that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.” For all of the types of visualizations we are about to introduce, we assume that these baseline design criteria will be heeded and satisfied in the implementations.

28 Miller, R. 6.831 User Interface Design and Implementation, Spring 2015. (Massachusetts Institute of Technology: MIT Stellar <https://stellar.mit.edu/S/course/6/sp15/6.813/materials.html>)
29 Tacca, M. C. “Commonalities between Perception and Cognition,” Frontiers in Psychology, 2: 358. PMC. 2011.

3.2 Usability

For the purposes of this thesis, we will focus on three dimensions of usability: Learnability, Efficiency, and Error-Tolerance. For each of these dimensions, we list a few questions to help guide the evaluation thought process, followed by discussion about further concepts and metrics related to the usability dimension. When evaluating the usability of the Epoch Sampling submodule later on, we will pay special attention to the italicized concepts introduced in each of these sections.

3.2.1 Learnability

The learnability of an interface refers to how easy it is for a new user to learn the complete functionality of an interface without outside help. Some questions to consider when evaluating learnability are:
• Is the interface easy to learn at first?
• How helpful is the interface?
• Can tasks be completed and mastered without outside help?
• Does the interface have built-in instructions or guidance?

While many pieces of technology were developed with the assumption that users would read a manual or take a class first, that is increasingly not the case. More often than not, users are goal-oriented, and will learn to operate a system by way of exploring how to complete tasks (learning by doing) or by seeing others complete a task (learning by watching).
If users need help from the system along the way for whatever reason, the help must be searchable and goal-oriented in order to be most effective. Because remembering with the aid of visual cues (“recognition”, or knowledge in the world) is much easier than remembering without them (“recall”, or knowledge in the head), it is important that systems help the user rather than require the user to remember everything about their operation. As mentioned in the previous section, consistency is important within the interface as well as externally (so that users can transfer existing knowledge from other applications to this interface). Quick, visible system responses are also critical so that users can get immediate feedback on whether or not they have actually done something. If an interface has multiple states or modes, these (and their transitions, if applicable) should also be very apparent to the user. Finally, the interface should provide affordances: perceived properties of an object that suggest how it can be used. For example, a text box offers the affordance that a user can click into it and type. Ideally, an object’s perceived properties should match its actual properties, so the user knows exactly what s/he is to do with the object (Miller 2015).

3.2.2 Efficiency

The efficiency of an interface refers to how quickly returning users can navigate and perform tasks using the interface. Some questions to consider when evaluating efficiency are:
• Once learned, is the interface fast to use?
• How long does it take to complete common tasks?
• Does the interface feel efficient to users?
• Are there bottlenecks or shortcuts?

Once a user is familiar with a system, s/he tends to group parts of it in a unit of memory. This is called “chunking,” and good interfaces should present information in such chunks that are easily recognizable by the user. The interface should also be fast to navigate, in terms of pointing and steering.
Fitts’s Law, T = a + b*log(D/S+1) = RT + MT, represents the time T it takes to move your hand to a target of size S at distance D, that is, the reaction time RT plus the movement time MT. This law for pointing has many implications for interface design, such as the fact that targets at the edge of the screen are easy to hit, whereas unclickable margins around them require increased accuracy. To aid pointing efficiency, it is good to make frequently used targets bigger and put them near each other. There is a similar law for steering, T = a + b*D/S, representing the time T that it takes to move your hand through a tunnel of length D and width S. The index of difficulty, D/S, is now linear instead of logarithmic, showing that steering is much harder than pointing. Thus, requiring the user to steer through narrow tunnels on the screen will severely damage efficiency. Keyboard shortcuts and anticipation of the user’s next action (e.g. autocomplete) also help users perform tasks faster (Miller 2015).

3.2.3 Error-Tolerance

The error-tolerance, or safety, of an interface deals with how the interface prevents errors and helps users recover from any errors they make while using it. Some questions to consider when evaluating error-tolerance are:

• Are errors few and recoverable?
• Does the interface help to prevent errors?
• Does the interface help when errors occur?

Human error is unavoidable. Slips (failures of execution) and lapses (failures of memory) are fairly common simply due to inattention, but interfaces should also take measures to prevent outright mistakes (using the wrong procedure for a goal). Some ways of accomplishing this are avoiding actions with similar descriptions, avoiding habitual action sequences with identical prefixes, and adding confirmation dialogs, clearly marked exits, manual overrides, error messages, or the ability to undo (Miller 2015).
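To make the pointing and steering laws of Section 3.2.2 concrete, the following is a minimal sketch that evaluates both formulas for a few target sizes. It assumes a base-2 logarithm (the text writes only "log"), and the constants a and b are placeholder values chosen for illustration, not measurements; the function names are our own.

```python
import math


def fitts_time(distance, size, a=0.0, b=0.1):
    """Pointing time T = a + b * log2(D/S + 1).

    a and b depend on the device and the user; the defaults here are
    placeholders for illustration, not measured values.
    """
    if distance < 0 or size <= 0:
        raise ValueError("distance must be >= 0 and size must be > 0")
    return a + b * math.log2(distance / size + 1)


def steering_time(length, width, a=0.0, b=0.1):
    """Steering time T = a + b * (D/S) for a tunnel of length D and width S."""
    if length < 0 or width <= 0:
        raise ValueError("length must be >= 0 and width must be > 0")
    return a + b * (length / width)


if __name__ == "__main__":
    # Compare how each law responds as the target (or tunnel) gets wider.
    for size in (10, 20, 40):
        print(size, round(fitts_time(300, size), 3), round(steering_time(300, size), 3))
```

With identical constants, widening a target changes the logarithmic pointing term far less than it changes the linear steering term, which is the sense in which narrow tunnels are costly.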
Chapter 4: Visual Analytics

Leo Cherne has been credited with saying, “The computer is incredibly fast, accurate, and stupid. Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation.”30 The field of visual analytics aims to leverage such a marriage for vast potential knowledge gain. Visual analytics, according to Keim, et al. (2008), can be described as “an iterative process that involves information gathering, data preprocessing, knowledge representation, interaction and decision making.” To reach the ultimate goal of gaining insight into a problem described by a large amount of data, this field “combines the strengths of machines with those of humans”31. A graphic of the visual analytics process is shown in Figure 4-0. As shown, there are multiple ways one can go through this process, and more feedback loops can result in more informed analysis.

Figure 4-0: The visual analytics process, taken from (Keim, et al. 2010).

30 Chang, Remco. “Big Data Visual Analytics: A User-Centric Approach” [PowerPoint Slides].
31 Keim, D. A., Mansmann, F., Schneidewind, J., Thomas, J., & Ziegler, H. (2008). Visual Analytics : Scope and Challenges. In Visual Data Mining (pp. 76–90).

Well-defined problems with clear sets of rules are poor choices for visual analytics since computers can simply be programmed to arrive at an optimal answer without human input. Similarly, exploratory problems with small amounts of data that humans can sift through to arrive at an optimal answer are also poor choices for visual analytics, as they do not require the aid of a computer. Thus, good applications of visual analytics combine the computational power of a computer with the insight and intelligence of a human, making visual analytics a powerful tool for the type of exploratory data analysis needed in IEEA. Keim et al.
(2008) present a “visual analytics mantra” to describe the process: “Analyse First – Show the Important – Zoom, Filter and Analyse Further – Details on Demand”32

By definition, visual analytics requires the use of visualizations that humans can interact with. The rest of this chapter presents a survey of some existing data visualization techniques, focusing on those intended for multidimensional (referring to the dimensionality of independent variables) and multivariate (referring to that of dependent variables) datasets, as well as some techniques for interacting with data. It is important to note that this thesis does not present an exhaustive list of all possible visualizations or data manipulation techniques, but rather a selection of fairly basic and common visualizations that have the potential to be used in IEEA modules.

The visualizations are grouped into four types, discussed one per section: Geometric, Pixel-Based, Icon-Based, and Hierarchy-Based. These will mainly be static visualizations, but those that necessarily involve interaction will be marked as such. The last section will be devoted to discussing interaction schemes. As mentioned in Section 3.1 above, we assume that the previously discussed baseline design criteria will be satisfied in any implementation of the visualizations being presented. In the following chapters, we will pick a few different [interactive] visualizations for each IEEA submodule we present, and assess the extent to which they meet the functionality goals (as presented in Chapter 2) for each respective submodule.

4.1 Geometric Visualizations

Geometric visualizations are perhaps the most common and broad category of visualizations, mapping data attributes to a two- (or sometimes three-) dimensional surface (Chan, 2006)33. These include scatterplots, line graphs, parallel coordinate plots (including polar charts), force diagrams, and Sankey diagrams.
4.1.1 Scatterplots

As seen from Cleveland and McGill’s aforementioned study, people are able to decode information most accurately when it is represented by position along a common scale, making a scatterplot a good place to start. A scatterplot allows one variable to be mapped on each axis, so each point’s location encodes two dimensions of its characteristics in a form that is easy for new users to read. An example is shown in Figure 4-1.

32 Keim, D. A., Mansmann, F., Schneidewind, J., Thomas, J., & Ziegler, H. (2008). Visual Analytics : Scope and Challenges. In Visual Data Mining (pp. 76–90).
33 Chan, W.W. “A Survey on Multivariate Data Visualization.” Department of Computer Science and Engineering, Hong Kong University of Science and Technology, June 2006.

Figure 4-1: Example of scatterplot

4.1.1.1 Bubble Charts and Motion Charts

A bubble chart is a scatterplot with two additional values encoded with color and size. Additional visual variables such as shape and orientation can encode more variables, but as this may overwhelm a user with information, bubble charts are most commonly thought of as simply x/y/color/size charts. Figure 4-2 shows an example of such a bubble chart.

Motion charts are yet another extension of scatterplots/bubble charts, animating the trajectory of points as another variable changes (most commonly, time). If the dataset that produced Figure 4-2 had multiple years’ worth of similarly stored data, this visualization could easily be turned into a motion chart, where the x-position, y-position, color, and/or size vary as time t increases.

Figure 4-2: Example of bubble chart

4.1.1.2 Scatterplot Matrices

Scatterplot matrices are essentially just what they sound like: a grid of 2D scatterplots, usually used with high-dimensional data to view “cross sections,” or pairwise relationships between dimensional attributes.
Generally, all vertical axes in a row and all horizontal axes in a column represent the same variable on the same scale, to reduce confusion in what can already be a cluttered set of plots (Hoffman, 1999). An example scatterplot matrix is shown in Figure 4-3. Notice that the entire grid is symmetrical across the top-left to bottom-right diagonal, so only the lower or upper triangle is needed to convey the same amount of information.

There are many variants of scatterplot matrices. Some recognize that plotting a variable against itself (along the diagonal) conveys little, particularly when there are more points than can be distinguished. Figure 4-4 shows a scatterplot matrix in which histograms of the value distribution for each variable are plotted along the diagonal instead.

Figure 4-3: Scatterplot Matrix of a 6-dimensional car dataset, with variables plotted pairwise, from (Hoffman, 1999)34

Figure 4-4: Scatterplot Matrix with histograms plotted along diagonal, from (Grinstein, 2001)35

34 Hoffman, P.E. “Table Visualizations: A Formal Model and Its Applications”, Doctoral Dissertation, Computer Science Department, University of Massachusetts at Lowell, 1999.
35 Grinstein, G., Trutschl, M., and Cvek, U. “High-Dimensional Visualizations.” 7th Data Mining Conference-KDD 2001.

4.1.2 Line Graphs

Line graphs are very similar to scatterplots, but in line graphs, the data points for each value of the independent variable are connected together to form a line, highlighting the local change between pairs of adjacent points and the overall trend. For this reason, line graphs are especially useful in visualizing data with an ordinal independent variable, such as time series data. Multiple variables can be encoded, similar to scatterplots, using line color and shape of points, as long as adding these variables does not detract from the interpretation of the relationship between adjacent points.
For example, Figure 4-5 shows a line graph with two independent variables: month (on the x-axis) and year (represented by color and shape as seen in the key). The multiple lines on the same plot allow a viewer to easily spot trends within a single year as well as differences and common patterns across different years.

Figure 4-5: Line graph with multiple lines, from (Wallace 2004)36

4.1.3 Parallel Coordinate Plots

Parallel coordinate plots, as shown in Figure 4-6, display high-dimensional data by representing each variable on its own vertical axis (in Fig. 4-6, these variables are “Sepal Width,” “Sepal Length,” “Petal Width,” and “Petal Length”); the axes are not necessarily scaled the same. An individual line spanning the axes represents the data point that takes the value at which the line intersects each axis (for example, following the red line at the top of the “Sepal Width” axis, the corresponding entry seems to have the following approximate values: Sepal Width – 4.4, Sepal Length – 5.8, Petal Width – 0.4, Petal Length – 1.5). Additional characteristics can be encoded in color, as in Fig. 4-5, but are not necessary. Parallel coordinate plots are quite effective at representing and revealing patterns in high-dimensional data when each data point has slightly different values; however, the horizontal order of the axes may affect interpretation, as different patterns may emerge with different orders (Chan, 2006; Grinstein, 2001).

36 Wallace, Rosa. “Graphic Resources.” NC State University <https://www.ncsu.edu/labwrite/res/gh/ghlinegraph.html> 2004.

Figure 4-6: Example of Parallel Coordinate Plot, from Wikipedia37

4.1.3.1 Polar charts

Polar charts, also known as spider charts or kiviat diagrams, are a circular extension of parallel coordinates, essentially pinching the coordinate axes around into a circle, creating a wrap-around version of Figure 4-6 where each design is now represented as a circle (Hoffman 1999)38. An example of this kind of visualization is shown in Figure 4-7.
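Because the axes of a parallel coordinate plot (Section 4.1.3) are not necessarily scaled the same, an implementation typically rescales each variable to its own axis before drawing the line for each data point. The following is a minimal sketch of that step under our own assumptions: each variable is min-max scaled to [0, 1], axes are evenly spaced, and the function names are hypothetical.

```python
def normalize_axes(rows):
    """Rescale each column of `rows` to [0, 1] independently.

    rows: list of equal-length lists of numbers (one list per data point).
    Returns a new list of lists; a column with a single distinct value
    maps to 0.5 so that its line passes through the middle of the axis.
    """
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            0.5 if hi == lo else (value - lo) / (hi - lo)
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def polyline(scaled_row, axis_spacing=1.0):
    """Return the (x, y) vertices of one line: axis i sits at x = i * axis_spacing."""
    return [(i * axis_spacing, y) for i, y in enumerate(scaled_row)]
```

Reordering the columns of `rows` before calling `normalize_axes` changes which axes are adjacent, which is one way to explore the order-dependence of patterns noted above.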
37 en.wikipedia.org/wiki/Parallel_coordinates
38 Hoffman, P.E. “Table Visualizations: A Formal Model and Its Applications”, Doctoral Dissertation, Computer Science Department, University of Massachusetts at Lowell, 1999.

Figure 4-7: Polar chart showing Iris Flower dataset (left) and RadViz showing example car dataset (right). Both images taken from (Hoffman 1999).

4.1.3.2 RadViz

A similar idea is the Radial Coordinate Visualization (RadViz), proposed by (Hoffman 1999), where n-dimensional data can be plotted on n axes emanating outwards from the center of a circle and ending on the circle’s perimeter, also seen in Figure 4-7. Axes are normalized, and each data point is plotted as if attached by a separate spring to each of its axis intersection points from the polar chart. This kind of visualization minimizes clutter and allows for easy spotting of outliers, irregularities, or patterns, though the location of the points largely depends on the organization and order of the axes.

4.1.4 Force Diagrams (Interactive)

A force diagram is an interactive graph layout in which all data points form connected components that behave as though they were attached by springs. This visualization, supported by d3.js39, relies on physical simulation to allow users to explore the extent to which data points affect the other data points in the visualization. An example of two different positions of the same force-directed graph is shown in Figure 4-8, in which data points are characters in Victor Hugo’s Les Miserables, and connections between points indicate that the characters appear in a scene together. When a user drags points around the periphery, such as the pink points at the bottom of the left pane, the whole graph does not move much. However, when a more centrally connected point, such as the light blue point in the middle representing Valjean (the main character), is moved, the whole graph is affected by it, as seen in the right pane.
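Returning to the spring analogy used for RadViz in Section 4.1.3.2, the resting position of a point can be computed directly rather than simulated. The sketch below assumes the commonly described formulation in which each dimension has an anchor evenly spaced on the unit circle and pulls the point with a stiffness equal to the record's normalized value in that dimension, so the point rests at the value-weighted mean of the anchors. The function name and the even anchor spacing are our own assumptions.

```python
import math


def radviz_position(values):
    """Place one normalized record inside the unit circle.

    values: n numbers in [0, 1], one per dimension. Anchor i sits at angle
    2*pi*i/n on the unit circle and pulls the point with a spring whose
    stiffness is values[i]; the point rests at the weighted mean of anchors.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return (0.0, 0.0)  # no pull in any direction: stay at the center
    x = sum(v * math.cos(2 * math.pi * i / n) for i, v in enumerate(values)) / total
    y = sum(v * math.sin(2 * math.pi * i / n) for i, v in enumerate(values)) / total
    return (x, y)
```

A record that is high in only one dimension lands near that dimension's anchor, while a record with equal values in every dimension lands at the center, which illustrates why the position of points depends on the order of the axes.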
39 Data-Driven Documents, d3js.org

Figure 4-8: Force-Directed Graph depicting character co-occurrence in “Les Miserables,” from (Bostock, 2012)40

4.1.5 Sankey Diagrams

Sankey diagrams are widely used in fields such as chemical engineering that have an abundance of processes in which heat, energy, or other quantities flow between nodes. To represent the volume of flow between nodes, one-directional arrows have widths proportional to the flow quantity they represent41,42. Sankey diagrams do a very good job of representing this particular kind of data, but are close to nonexistent in fields that do not deal with flow data. An example of this kind of diagram is shown in Figure 4-9. This visualization is discussed for the sake of completeness, but also because different ways of encoding nodes may make this diagram useful for systems engineering decision-making purposes.

40 Bostock, M. “Force-Directed Graph” <http://bl.ocks.org/mbostock/4062045> Nov 2012.
41 Bostock, M. “Sankey Diagrams” <http://bost.ocks.org/mike/sankey/> May 2012.
42 Sankey Diagrams. “Sankey Definitions” <http://www.sankey-diagrams.com/sankey-definitions/>

Figure 4-9: Sankey diagram showing a possible scenario for UK energy production and consumption in 2050, with supply on the left and demands on the right, from (Bostock, 2012)

4.2 Pixel-Based Visualizations

Pixel-oriented visualizations map data attributes to pixels based on a color scale. This tends to reduce visual noise, as a large amount of dimensional information can be encoded in a small space (Chan, 2006). These include pixel bar charts and color mapping (heatmaps).

4.2.1 Pixel Bar Charts

A bar chart, like a scatterplot, is an extremely common type of geometric visualization in everyday graphics, displaying a rectangle for each type of data point, whose length is proportional to its (1D) value. There are many variants on bar charts (e.g.
plotted vertically or horizontally, cumulative, stacked, etc.), but a pixel-based bar chart can encode additional information (for additional variables) by representing each individual data point within a bar with a colored pixel, as seen in Figure 4-10. Thus, along with the total number of data items per type, pixel bar charts allow individual attribute values to be seen at a glance.

Figure 4-10: Equal-height pixel bar chart with color encoding different attributes, from (Chan, 2006)

4.2.2 Color Mapping (Heatmaps)

By now, encoding additional attribute values with color or hue is not a new idea. However, for completeness, the most straightforward single-pixel-encoding method is presented: heatmaps. For a given 2D area, whether part of another visualization (as in pixel bar charts above) or not, every pixel represents a data point, and the value of that pixel represents its data point’s value. Care must be taken if the value of a pixel is represented by color, as the color wheel does not actually have a “natural order,” due to the nonstandard and/or cyclic nature of many color palettes. Another concern with using color is user color-blindness, or a reduced ability to tell different colors apart. A more standard way to represent the value of a pixel is by the lightness or saturation of a single base color (referred to in this thesis as hue). No matter what the base color, there is a natural order from light to dark, and color-blindness is no longer a significant concern (Spears, 1999)43. A couple of examples of heatmaps are shown in Figure 4-11, one using hue (in this case, grayscale) and one using color (over an actual map).

43 Spears, W.M. “An Overview of Multidimensional Visualization Techniques.” Evolutionary Computation Visualization Workshop, 1999.

Figure 4-11: Heatmaps encoding data in every pixel.
Random data set encoded into a 10x10 pixel square (left) from (Grinstein, 2001), and local thermal power data encoded into a map of the whole US (right) from (ICM Consulting 2015)44

4.3 Icon-Based Visualizations

Much like encoding information in hue or color, it is possible to encode additional information in shape, which is the idea behind iconography, or icon-based visualization techniques. Data items are mapped to icons, or glyphs, whose shape and features differ depending on attribute values. While hue-encoding works well for quantitative data, iconography works better for categorical data, as shape features are also more categorical in that they do not necessarily have an order. While humans generally recognize graphical features more readily than simple geometric shapes/patterns, this only works up to a smaller volume of data than can be displayed with geometric techniques, so datasets for icon-based visualizations are generally on the smaller side. As a caveat, Chan points out that while geometric techniques “treat all the dimensions equally, some features in glyphs are more salient than others, adjacent elements are easier to be related and accuracy of perceiving different graphical attributes varies between humans tremendously. It thereby introduces biases in interpreting the result” (Chan, 2006). Icon-based visualizations include star plots, Chernoff faces, stick figures, and color/shape icons.

4.3.1 Star Plots

Introduced by Chambers, et al. in 1983, star plots are star-shaped figures that can display n dimensions using n rays emanating from the center of the glyph. All variables can be displayed in each figure, and the lengths of the rays are proportional to the values of the variables they represent.
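The geometry of a single star glyph follows directly from this description: n rays at evenly spaced angles, each with a length proportional to one variable. The sketch below computes the ray endpoints under our own assumptions, namely that each value is scaled by the largest value of that variable in the dataset and that ray i points at angle 2πi/n from the positive x-axis. The function name and these conventions are illustrative, not taken from Chambers, et al.

```python
import math


def star_plot_rays(values, max_values, radius=1.0):
    """Return the (x, y) endpoint of each ray of one star glyph.

    values: the n attribute values of one data point.
    max_values: the largest value of each attribute across the dataset,
        used so that ray length is proportional to the value.
    Ray i points at angle 2*pi*i/n measured from the positive x-axis.
    """
    n = len(values)
    if n != len(max_values):
        raise ValueError("values and max_values must have the same length")
    endpoints = []
    for i, (value, largest) in enumerate(zip(values, max_values)):
        # A variable whose maximum is 0 gets a zero-length ray.
        length = radius * value / largest if largest else 0.0
        angle = 2 * math.pi * i / n
        endpoints.append((length * math.cos(angle), length * math.sin(angle)))
    return endpoints
```

Connecting consecutive endpoints yields the outline of the star; drawing one such glyph per data point in a grid gives the kind of arrangement described next.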
There is no standard for the way the data are arranged on a page, but usually the figures are placed into a rectangular array with some sort of grouping or ordering based on variable values to make certain trends, groupings, or features apparent at a glance. Figure 4-12 shows an example of a 12-dimensional car dataset with 36 points. The ray emanating straight downward from the center represents “weight,” so it is noticeable that the data in this figure are arranged such that the lightest cars are at the top of the figure, while the heaviest are at the bottom (Friendly, 1991)45.

44 http://icmconsulting.com/media/uploads/Geothermal_heat_map_US.png

Figure 4-12: 36 twelve-dimensional data points represented as star plots, and organized by “weight” (bottom variable), from (Friendly, 1991)

4.3.2 Chernoff Faces

Another famous icon-based visualization is the set of Chernoff faces, named after their inventor Herman Chernoff (1973), where each data point is represented as a human face. This was originally proposed because of humans’ natural ability to differentiate between and recognize human faces. Data attributes are mapped to different facial features: head eccentricity, eye eccentricity, pupil size, eyebrow slant, nose size, mouth shape, eye spacing, eye size, mouth length, and degree of mouth opening. These faces may again be arranged in any way: randomly, on a rectangular grid, ordered to bring out a salient feature, or on a scatterplot (Liu 2014; Spears, 1999; Chan, 2006)46. Figure 4-13 shows a set of 12 different facial features as well as a scatterplot of Chernoff faces. It is important to note that the assignment of dimensions to facial attributes really matters in this visualization, as human facial recognition ability can introduce strong bias into the interpretation of the faces.

45 Friendly, M. “Statistical Graphics for Multivariate Data.” SAS SUGI 16 Conference, Apr 1991.
46 Liu, Y. “Visualization of Multivariate Data” Department of Biomedical, Industrial and Human Factors Engineering, Wright State University, Fall 2014. <http://www.stat.sc.edu/~hansont/stat730/MultivariateDataVisualization.pdf>

Figure 4-13: Different Chernoff facial features (left) and Chernoff faces plotted in various 2D positions on a scatterplot (right), taken from (Chan, 2006)

4.3.3 Stick Figures

Similar to Chernoff faces, stick figures (introduced by Pickett & Grinstein, 1988) encode attributes in the angle, length, thickness, or color of the 5 “limbs” in the body of a stick figure. Usually stick figures are plotted on a scatterplot, with the two most important attributes represented by the x- and y-position of the figure on the plot (Chan 2006; Liu 2014). Figure 4-14 shows an example family of 12 stick figures (with 10 features: angle and length for each limb), as well as a full scatterplot of properly positioned stick figures.

Figure 4-14: A family of 12 stick figures (left) and a scatterplot of stick figures (right), taken from (Liu 2014)

4.3.4 Color/Shape Icons

Color icons are essentially a hybrid between heatmaps and icons, assigning a pixel or region of the icon to each attribute, and encoding the value through color or texture (Chan, 2006). The idea of icon hybrids opens up a world of extensions, where attributes can be encoded by any visual variable (e.g. color, hue, orientation, shape, texture, size, etc.) on an icon of any inherent shape, size, etc. for additional dimensionality mapping.

4.4 Hierarchy-Based Visualizations

Hierarchical techniques all require data to be organized in a format such that each data point belongs to a certain “level” or has a parent and/or child nodes. In other words, data must be structured hierarchically. This allows the space to be subdivided recursively and to present logarithmically more information in the same space (Chan, 2006).
Hierarchy-based visualizations include hierarchical axes, trees, treemaps, and circle packing.

4.4.1 Hierarchical Axes

This technique partitions the display axis repeatedly, plotting (“stacking”) data elements within other data elements, with the most important variable being plotted first, then the next most important being plotted within that, etc. Color coding the final visualization helps distinguish layers, and can also perform double-duty by encoding additional information if necessary (Chan, 2006). Figure 4-15 shows both the splitting scheme of the axis as well as the final visualization.

Figure 4-15: Splitting scheme of hierarchical axes (left) next to the final histograms-within-histograms matrix visualization (right), from (Chan, 2006)

4.4.2 Trees

Trees make it fairly straightforward to visualize all of the options at each variable, making this option very useful for hierarchical data. (This visualization is also very well supported in d3.js.) To find the characteristics of a certain leaf node (at the bottom of a tree), one traverses up the path from the leaf to the root; similarly, to find a node with specified values for each variable, one traverses down the corresponding path from the root to reach that node. An example diagram of such a visualization is shown in Figure 4-16.

Figure 4-16: Example unlabeled tree visualization, from (BigML Blog 2012)47

4.4.3 Treemaps

Treemaps are another hierarchical layout, also supported by d3.js, in which tree nodes are represented by rectangles, and “parent” rectangles are recursively partitioned into smaller “children” rectangles, much like a 2D version of hierarchical axes. Again, this is very effective for hierarchical data, as well as for representing all of a tree’s leaf nodes compactly (Wang, 2006)48. There is also, of course, the option of encoding additional values in the shapes’ color and size to reveal more attributes of leaf nodes.
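The recursive partitioning of parent rectangles into child rectangles can be sketched with the simplest treemap scheme, usually called slice-and-dice, which alternates the split direction at each level of the tree. This is an illustrative sketch only: it is not necessarily the layout used by any particular library, and the node format (dictionaries with "name", "value", and "children" keys) is a hypothetical one chosen for the example.

```python
def subtree_value(node):
    """Sum of leaf values under `node` (a leaf carries its own "value")."""
    children = node.get("children")
    if not children:
        return node["value"]
    return sum(subtree_value(child) for child in children)


def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Assign each leaf a rectangle (x, y, width, height) inside its parent.

    The split direction alternates with depth: even depths split the
    parent left-to-right, odd depths top-to-bottom. Each child receives a
    share of the parent's side proportional to its subtree value.
    Leaf names are assumed to be unique.
    """
    if out is None:
        out = {}
    children = node.get("children")
    if not children:
        out[node["name"]] = (x, y, width, height)
        return out
    total = subtree_value(node)
    offset = 0.0
    for child in children:
        share = subtree_value(child) / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * width, y, share * width, height, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * height, width, share * height, depth + 1, out)
        offset += share
    return out
```

Because every leaf's area is proportional to its value, a second attribute can then be mapped to the fill color of each rectangle.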
Treemaps are generally good for representing trees when the distribution of children is non-uniform, so that they can display the variety at a glance. An example of such a treemap is shown in Figure 4-17, displaying the populations of all the countries in the world. The tree structure in this example stores the six continents as the children of the root, and each continent’s children are all the countries that belong to that continent. The divisions between continents in the figure are denoted with bold black lines, whereas those between countries are simply gray. In this example, each country’s population is encoded in area and its Gross National Income (GNI) is encoded in color.

47 BigML Blog. Jan 2012. <https://littleml.files.wordpress.com/2012/01/screen-shot-2012-01-23-at-10-00-17am1.png>
48 Wang, Y., Teoh, S.T., Ma, K. “Evaluating the Effectiveness of Tree Visualization Systems for Knowledge Discovery.” Eurographics/IEEE-VGTC Symposium on Visualization, 2006.

Figure 4-17: Example treemap of country population by continent, from (Veroy 2013)49

4.4.4 Circle Packing

Circle packing is yet another hierarchical layout supported by d3.js, in which tree nodes are represented by circles. In this visualization, child nodes are recursively packed into parent circles to fill the area as compactly as possible, again proving very effective for hierarchical data. Again, there is the option of encoding additional values in the shapes’ color and size. Figure 4-18 shows an example of a circle packing layout (the original circle packing tutorial on the D3 page, in fact), showing the Flare50 class hierarchy. The largest (outermost) circle represents the root node. Bigger circles encompass all their children, which in turn encompass all their children until the tree’s leaves are reached (the smallest circles).

49 Veroy, R. L. Feb 2013. <http://www.eecs.tufts.edu/~rveroy/stuff/GNI2010-treemap.png>
In this example, the leaves have been colored orange while all intermediate nodes are shades of blue.

Figure 4-18: Example circle packing layout from Mike Bostock’s website51

50 Flare Data Visualization library, http://flare.prefuse.org/
51 http://bl.ocks.org/mbostock/raw/4063530/

4.5 Data Interaction Techniques

As human interaction with data is a fundamental requirement of visual analytics, this chapter would not be complete without a discussion of techniques to use in interacting with data. Direct data manipulation allows a user to interact with data and change some aspect of how it is displayed. Though multimodal interfaces can allow many ways for users to interact with data (e.g. speech, gestures, touch), we now present two solely mouse-based interaction techniques, namely drag and drop and selection. These two fundamental techniques can act as building blocks for more complex interactions, such as sorting, resizing, toggling, filtering, and brushing, which will be discussed below.

4.5.1 Drag and Drop

A simple way to move objects around on screen is by clicking, dragging, and releasing the mouse where the object is intended to land. jQuery UI52 offers an API for easy implementation of this technique, allowing the creation of draggable elements and drop targets for draggable elements. Dragging and dropping mirrors the real-life metaphor of picking up objects and changing their locations, making it easy to learn and efficient to use. Drag and drop is also easy to undo, as a user can simply drag an item back to where it started if a move was unintended.

4.5.1.1 Sorting

Dragging and dropping additionally makes it possible to sort data, or reorder items in a list or grid, directly using the mouse. On top of simply changing the display of the interface, sorting can be linked to a backend to manipulate the state of a database based on the new order of the list or grid of items.
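As a rough illustration of linking a drag-and-drop reorder to a backend, the sketch below separates the two steps: computing the new order after one item is dropped at a new index, and deriving the position changes that a backend would need to store. Both function names are hypothetical, items are assumed to be unique and hashable, and no particular database or UI library is implied.

```python
def move_item(items, from_index, to_index):
    """Return a new list with the item at from_index moved to to_index."""
    if not (0 <= from_index < len(items)) or not (0 <= to_index < len(items)):
        raise IndexError("index out of range")
    reordered = list(items)  # leave the caller's list untouched
    item = reordered.pop(from_index)
    reordered.insert(to_index, item)
    return reordered


def position_updates(old_order, new_order):
    """List (item, new_position) pairs for items whose position changed.

    A backend could apply only these updates instead of rewriting every row.
    """
    old_position = {item: i for i, item in enumerate(old_order)}
    return [(item, i) for i, item in enumerate(new_order) if old_position[item] != i]
```

Keeping the original list unchanged also makes undo straightforward, since the previous order can simply be restored.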
4.5.1.2 Resizing

An important property of some interface elements is their size. To enable a user to directly change the size of an object, jQuery UI also offers a separate API for object resizing. Resizing allows a user to stretch or squeeze an object through drag and drop (which also makes the action easy to undo). By convention, the affordance for resizability is usually represented by a grooved border or corner (derived from the real-life metaphor of being able to “grip” a corner), as seen in Figure 4-19. Thus, implementations with the capability to resize objects should follow this convention for easy learnability.

52 www.jqueryui.com

Figure 4-19: An example resizable object, denoted by the grooved “grippable” corner

4.5.2 Selection

Selection is a fundamental capability that enables a user to choose items and shows that those items have been chosen. Selection can be implemented in many ways, depending on how it will be used in an interface. The most straightforward selection method is clicking on objects to select them, and having them change some aspect of their representation (size, color, etc.) to indicate that they have been selected. This can allow selection of either one or multiple items in an interface (though for easy learnability, there should be some indication of which is supported). A common implementation of this is through checkboxes (which allow multiple items to be selected) and radio buttons (which only allow one item to be selected), as illustrated in Figure 4-20.

Figure 4-20: An example of checkboxes vs. radio buttons, from (Lepofsky 2015)53

Selection can also be implemented by allowing a user to “draw” a box or a lasso over all elements s/he wishes to select. jQuery UI again offers an API for this method of selectability. The ability to deselect items (and recognize deselection) is an important feature of safe interfaces.
Once items are selected, they can be highlighted and can remain distinguished from the rest of the items until deselected. Selection is an instrumental building block for many other common kinds of interaction, including toggling, filtering, and brushing, described below. To make selection (or any application of selection) useful, it is generally linked to a backend (as with sorting) that will respond, passively or actively, based on which items are selected, making it an extremely useful technique for database manipulation.

53 Lepofsky, A. “In The Next Version” 2015. <http://www.alanlepofsky.net/alepofsky/alanblog.nsf/dx/lotus-notesbasics-checkboxes-and-radio-buttons/content/M2?OpenElement>

4.5.2.1 Toggling

Toggling is the action of switching between (usually two) properties or states based on which is selected. The switch is most cleanly activated using a marked toggle switch (examples shown in Figure 4-21), but can also be triggered by selecting or simply clicking an interface element that serves the same purpose as the toggle switch.

Figure 4-21: An example of toggle switches, from XOO.me design directory54

When designing for error-tolerance, it is important to note that a common type of error is the “mode error” (also called “state error”), which results when the same actions or displays mean different things in different states and a user confuses them. The best way to avoid mode errors is to eliminate modes completely; if that is not possible, the alternatives are to increase the visibility of modes or to design the interface so that no two modes share any actions.

4.5.2.2 Filtering

A very useful type of interaction that selection can aid with is filtering, or screening information by only displaying what is selected. Filters can be implemented in several different ways, but the goal is to allow users to easily choose what subset of the data they wish to see displayed.
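On the data side, a filter of this kind reduces to a predicate applied to every record. The sketch below assumes two filter types: a set of accepted values for discrete variables and an inclusive minimum/maximum range for continuous ones. The function name, the argument names, and the record format (dictionaries keyed by field name) are hypothetical.

```python
def apply_filters(records, allowed=None, ranges=None):
    """Keep the records that pass every active filter.

    allowed: {field: set of accepted values}   (discrete selection)
    ranges:  {field: (minimum, maximum)}       (continuous range, inclusive)
    A field that is absent from both mappings is left unfiltered.
    """
    allowed = allowed or {}
    ranges = ranges or {}
    kept = []
    for record in records:
        in_sets = all(record[field] in values for field, values in allowed.items())
        in_ranges = all(low <= record[field] <= high for field, (low, high) in ranges.items())
        if in_sets and in_ranges:
            kept.append(record)
    return kept
```

Calling the function with no filters returns every record, which corresponds to the unfiltered view; the visualization would then be redrawn from whatever list is returned.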
For data that can be described with any kind of variable (discrete, continuous, categorical, numerical), filtering could simply involve selecting the variables corresponding to the desired data, however they are represented (objects, checkboxes, dropdown menus, etc.), by any method above. Figure 4-22 shows some example filters used on an actual online clothing retail website. Sliders, allowing users to set a minimum and maximum variable value as demonstrated in Figure 4-22, are also an efficient filtering tool for primarily continuous numerical variables.

54 XOO.me <http://xoo.me/template/details/11917-6-web-ui-toggle-switches-set-psd>

Figure 4-22: Examples of filters for different types of variables: Size, Designer, and Color all allow discrete selection of respective values (numerical and categorical), and the slider (bottom right) allows selection for the continuous variable Price (as shown here, the selection allows values from $0-$750). All taken from an actual retail website, Rent the Runway55

55 www.renttherunway.com

4.5.2.3 Brushing

Data brushing enables the display of the same selected data in two or more visualizations simultaneously. The selected data is generally displayed with the same appearance so that it is easily apparent to users which data points correspond to one another across visualizations. An example of data brushing across a scatterplot matrix is shown in Figure 4-23. Brushing is an effective interaction technique to link or coordinate multiple static visualizations.

Figure 4-23: An example of data brushing, taken from Mike Bostock’s website56. The data was selected in the top-left box, and is colored the same across all scatterplots in the matrix.

The techniques mentioned in this chapter are by no means an exhaustive list of visualizations or interactions, but hopefully provide a starting point to think about how certain datasets may be represented for exploration and analysis by the visual analytics process.
Different levels of Keim, et al.’s visual analytics mantra may be best represented by multiple interactive visualizations, either in combination or in sequence, so it is important to keep in mind for the rest of the discussion in this thesis that often there are multiple techniques that may work for a certain purpose, rather than one “best” visualization. Table 4-1 summarizes the visualization techniques presented in this chapter, conveying at a glance their characteristics, strengths, and suitable applications.

56 http://bl.ocks.org/mbostock/4063663

Technique Name | Capabilities | #Dims Supported | Var Type Supported | Dataset Size Supported | Interactions Supported
Scatterplot | Viewing trends, slope | 2 | Any | Any | Sort, Select, Filter
Bubble Chart | Viewing patterns/trends | 4-5 | Any | Small-Med | Sort, Select, Filter
Scatterplot Matrix | Linking graphs | Multi | Any | Any | Sort, Select, Filter, Brush
Line Graph | Time series representation | 3-4 | Any (esp Ordinal) | Small-Med | Sort, Select, Filter
Parallel Coords | Multidimensional trend/pattern recog | Multi | Any | Any | Sort, Filter
Polar Charts/RadViz | Multidimensional trend/pattern recog | Multi | Any | Any | Sort, Filter
Force Diagram | Displays all points in dataset | N/A | Discrete | Any | Drag, Sort, Select
Sankey Diagram | Volume of flow between nodes | Multi | Discrete (needs flow amt.) | Any (best w/ Small-Med) | Drag, Sort, Select, Filter
Pixel-Based Methods | Encodes data with color | Multi | Any | Any | Sort, Filter
Icons/Glyphs | Encodes data with features | Multi (to an extent) | Discrete | Small-Med | Sort, Filter
Hierarchical Axes | Can view whole dataset in stacks | Multi (to an extent) | Any | Any | Sort, Filter
Trees | Can view whole dataset by path | Multi | Discrete | Any | Sort, Filter
Treemaps | View whole dataset by compartment | Multi | Discrete | Any | Sort, Filter
Circle Packing | View whole dataset by compartment | Multi | Discrete | Any | Sort, Filter

Table 4-1: Summary of visualizations presented in this chapter (Sections 4.1-4.4).
Includes visualization names, brief notes about their major strengths/capabilities, the number of dimensions supported (either a number or number range, multidimensional [meaning 2+], or in the case of Force Diagrams, not applicable), the variable types supported (Discrete, Continuous, or Any), the dataset size supported (Small, Med, Large, or Any), and the types of interactions supported from those presented in Section 4.5.

Chapter 5: Functionality and Usability Examination of IEEA Epoch Sampling Submodule

In this chapter, we evaluate five visualizations with respect to the functionality goals for Epoch Sampling as asserted in Chapter 2. After presenting the best visualization[s] to meet these goals, we describe an implementation of the submodule and discuss its usability.

5.1 Functionality

Recall from above the functionality goals for the Epoch Sampling submodule, summarized here:

Goal #1: help user understand specific epoch definitions
Goal #2: help user find and select epoch(s)
Goal #3: help user understand: a) epoch space size, b) fraction available to explore, c) fraction already explored

A selection of visualization techniques is now described. For each of these, we evaluate the strengths and weaknesses of the visualization with respect to the three listed goals. For this particular submodule, we will also evaluate the visualizations separately for their overall suitability for two dimensions and greater than two dimensions.

5.1.1 Scatterplots/Bubble Charts

Scatterplots are virtually the best way to represent two-dimensional data, so if there are only two epoch variables, a scatterplot may be the best way to visually represent the entire epoch space: They clearly show the combination of two epoch variables defining each epoch (Goal #1), and based on this, a user can easily locate and select points of interest (Goal #2).
All possible epoch points can be displayed, with different hues/saturation/shading representing the respective points that can be and have been explored, so the user can get a sense of the whole epoch space (Goal #3). Enabling dragging over several epochs to select them all would increase selection efficiency as well. A rudimentary scatterplot-based interface with 16 epochs is shown in Figure 5-1.

Figure 5-1: Example of IEEA Epoch Sampling implemented as a scatterplot. The epoch variables were “Tech Level,” with values “future” or “present,” and “User Preference,” with values 1-8.

If there are more than two variables to plot, however, epochs will not all have unique locations. Even if more dimensions are encoded by size and color as in a bubble chart, points representing epochs will still be on top of each other in x-y space, so users will not have a clear way of separating them spatially to find and select, or to know how deep the epoch space actually goes, impeding Goal #1 and failing Goals #2 and #3. The same problem arises if additional dimensions are encoded in a unique icon or glyph. The icons could instead be displayed as an unplotted group, which avoids shared locations on the axes. However, for higher-dimensional epoch datasets, the full enumeration of possible icons does not help users find specific epochs (Goal #2), because shape is not a visual variable that lends itself to selectivity.

5.1.2 Parallel Coordinate Plots

Parallel coordinate plots are quite effective at representing and revealing patterns in high-dimensional data when each data point has slightly different values. However, since epochs are generally generated as a full factorial of epoch variable combinations, many share more than one variable value, causing this representation to suffer from the same spatial ambiguity problem as greater-than-two-dimensional scatterplots.
In other words, the enumeration of epoch variables causes each segment between adjacent axes to be shared among many epochs, again hiding epochs that share the same segment, or location, from the user. An example sketch of this is shown in Figure 5-2, using the five epoch variables from the Next Generation Combat Ship (NGCS) database developed by Schofield: VUAV, SmallBoatSize, EngineEmissions, RangeIncrease, and IceRegionUse (Schofield 2010; Schaffner 2014)57. As an example, two epochs that share the same values for VUAV and SmallBoatSize will share their first segment, but users would not be able to tell that the segment encoded more than one entry, skewing analysis. For this reason, parallel coordinate plots, as with higher-dimension scatterplots, impede Goal #1 and fail Goals #2 and #3.

57 Schofield, D.M. “A framework and methodology for enhancing operational requirements development: United States Coast Guard cutter project case study.” Massachusetts Institute of Technology, 2010.

Figure 5-2: Example of IEEA Epoch Sampling sketched as a Parallel Coordinate Plot

5.1.3 Trees

For the remainder of this functionality analysis, visualization techniques presented are hierarchical and therefore require epoch variable data to be stored in a tree structure to pass into built-in D3.js layouts. In our tree implementation, we organize the data so that each epoch variable is a fixed depth into the tree, and the epochs are the leaves. One such implementation, using the five epoch variables from the NGCS database, is shown in Figure 5-3. At each node, the variable name and value are displayed. Clicking nodes adds all descendant epochs to the list of selected epochs (e.g. clicking Epoch #8’s parent node ‘IceRegionUse: high’ would only select Epoch #8, whereas clicking the root node ‘All Epochs’ would select all 108 enumerated epochs). The variables’ levels in the tree should be reorderable for easier mass selection (e.g. a user who only wants to select epochs with IceRegionUse: high can reorder IceRegionUse to be at the top of the tree, and only expand out that node).

Evaluating this interface with regard to our epoch sampling goals, it does present a clearly defined pathway to every epoch, so users should easily be able to tell how epochs are defined and how to find and select epochs based on epoch variable levels (meeting Goals #1 and #2). If all of the nodes were expanded, the user would be able to see the size of the full epoch space, and further hue/saturation/shading could indicate the epochs the user is able to explore and already has explored (conditionally meeting Goal #3).

Figure 5-3: Example of IEEA Epoch Sampling on NGCS data implemented as a Tree

5.1.4 Treemaps and Circle Packing

In the case of epoch sampling, since all sibling nodes contain a copy of the exact same descendants, the treemap can be very repetitive, as seen in Figure 5-4, which again uses the NGCS epochs. It is hard to distinguish because of the nodes’ shared boundaries, but the root node (the outermost rectangle) has first been divided into two parts (as the first epoch variable in the tree, VUAV, has two values), then each of those has been divided into two parts (the second variable also has two values), etc. down to the leaves of the tree, the epochs. All of the epochs (the smallest division of rectangles) are easily seen at a glance, but their hierarchy, or characteristics, are hard to distinguish. Thus, while our third goal can be easily accomplished by different hue/saturation/shading to compactly display the fraction of all possible epochs selected, static treemaps do not help to accomplish Goals #1 and #2 at all.
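The fixed-depth tree organization described in Section 5.1.3, which the treemap and circle packing views also rely on, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis implementation. The variable names, the two-valued VUAV and SmallBoatSize, and the total of 108 epochs come from the text; the value labels and the assumption that the remaining three variables each take three values (2 × 2 × 3 × 3 × 3 = 108) are hypothetical. The functions `buildEpochTree` and `descendantEpochs` are illustrative names, and the nested name/children shape is the form commonly passed to D3.js hierarchical layouts.

```javascript
// Epoch variables in tree order. Value labels and the three-valued levels
// are assumptions chosen so that the full factorial has 108 epochs.
const epochVariables = [
  { name: "VUAV", values: ["no", "yes"] },
  { name: "SmallBoatSize", values: ["small", "large"] },
  { name: "EngineEmissions", values: ["low", "medium", "high"] },
  { name: "RangeIncrease", values: ["low", "medium", "high"] },
  { name: "IceRegionUse", values: ["low", "medium", "high"] },
];

// Build a nested { name, children } tree: one epoch variable per level,
// with a single epoch leaf under each node of the last variable.
function buildEpochTree(variables) {
  let nextId = 1;
  function childrenAt(depth, assignment) {
    if (depth === variables.length) {
      const id = nextId++;
      return [{ name: "Epoch #" + id, epochId: id, assignment }];
    }
    const { name, values } = variables[depth];
    return values.map((value) => ({
      name: name + ": " + value,
      children: childrenAt(depth + 1, { ...assignment, [name]: value }),
    }));
  }
  return { name: "All Epochs", children: childrenAt(0, {}) };
}

// Collect the IDs of all epochs below a node (what a selecting click would add).
function descendantEpochs(node) {
  if (!node.children) return [node.epochId];
  return node.children.flatMap(descendantEpochs);
}

const tree = buildEpochTree(epochVariables);
console.log(descendantEpochs(tree).length); // 108

// Reordering: move IceRegionUse to the top level for easier mass selection.
const reordered = buildEpochTree([epochVariables[4], ...epochVariables.slice(0, 4)]);
const iceHigh = reordered.children.find((c) => c.name === "IceRegionUse: high");
console.log(descendantEpochs(iceHigh).length); // 36
```

Rebuilding the tree from a reordered variable list is the simplest way to support reordering; note that in this sketch epoch IDs are assigned in traversal order, so a real implementation would need stable IDs that do not change with the variable order.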
Figure 5-4: Treemap visualization of NGCS epochs

As with treemaps, circle packing easily lends itself to the accomplishment of Goal #3, allowing the user to get a sense of the whole epoch space (as well as how many variables go into each epoch, encoded by the number of circle layers), as seen in the left pane of Fig. 5-5. However, recalling that users may perceive quantitative data better in straight-line area than in circles’ area, the treemap may actually do a better job accomplishing this goal. Our particular implementation of the circle packing visualization also allows users to zoom to any portion by clicking on corresponding circles, as seen in the right pane of Fig. 5-5. Through this, a user can click through to any particular epoch based on the variable values from higher levels, helping with Goal #2 (though not as effectively as plain trees). Finally, if for some reason a user is very zoomed in to a particular epoch and wants to understand the variables that went into creating it, s/he can easily zoom out layer by layer to discover them (meeting Goal #1, though a little less efficiently than plain trees do).

Figure 5-5: Circle packing visualization of NGCS epochs as seen at different zoom levels

While both the treemap and the circle packing visualizations provide the opportunity to view the entire epoch space at a glance, it is easier to recognize levels in circle packing, as the boundaries of rectangles overlap whereas the boundaries of circles do not. Circle packing therefore accomplishes Goal #3 more effectively than treemaps do. It should be noted that the ability to zoom can also be implemented on treemaps, but for the fully enumerated epoch data, as the rectangles are all still the same size, their shared boundaries will not make this feature as useful as it is for circle packing.
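To make the repeated subdivision behind the treemap concrete, the sketch below lays out a full-factorial tree with a simple slice-and-dice rule: each node's rectangle is split evenly among its children, alternating between horizontal and vertical splits by depth. This is an illustration only. Slice-and-dice is chosen here for its simplicity and is not necessarily the tiling produced by the D3.js treemap layout used for Figure 5-4. The function names, the 960 × 540 canvas size, and the level sizes (2, 2, 3, 3, 3, an assumption consistent with 108 epochs) are hypothetical.

```javascript
// Build a full-factorial tree from the number of values at each level
// (illustrative helper; node names are just index paths).
function fullFactorialTree(levelSizes, depth = 0, label = "root") {
  if (depth === levelSizes.length) return { name: label };
  const children = [];
  for (let i = 0; i < levelSizes[depth]; i++) {
    children.push(fullFactorialTree(levelSizes, depth + 1, label + "/" + i));
  }
  return { name: label, children };
}

// Slice-and-dice treemap: split each rectangle evenly among the children,
// alternating the split direction at each depth. Returns the leaf rectangles.
function sliceAndDice(node, x, y, width, height, depth = 0, out = []) {
  if (!node.children || node.children.length === 0) {
    out.push({ name: node.name, x, y, width, height });
    return out;
  }
  const n = node.children.length;
  node.children.forEach((child, i) => {
    if (depth % 2 === 0) {
      // Even depth: divide the width into n equal columns.
      sliceAndDice(child, x + (i * width) / n, y, width / n, height, depth + 1, out);
    } else {
      // Odd depth: divide the height into n equal rows.
      sliceAndDice(child, x, y + (i * height) / n, width, height / n, depth + 1, out);
    }
  });
  return out;
}

const rects = sliceAndDice(fullFactorialTree([2, 2, 3, 3, 3]), 0, 0, 960, 540);
console.log(rects.length); // 108
```

Every leaf rectangle has the same area, which reflects the observation above that the fully enumerated epoch space produces a repetitive treemap in which the hierarchy is hard to read from the shared boundaries alone.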
5.1.5 Evaluation Summary

Table 5-1 below summarizes the relevant features of the proposed implementations from the discussion above in the context of our IEEA epoch sampling goals. To reiterate, the evaluative criteria are as follows:

1. Is the visualization good for two epoch variables?
2. Is the visualization good for more than two epoch variables?
3. Does the visualization help the user understand specific epoch definitions? (Goal #1)
4. Does the visualization help the user find and select epochs? (Goal #2)
5. Does the visualization help the user understand a) epoch space size, b) fraction available to explore, and c) fraction already explored? (Goal #3)

The three possible answers to these questions are:

● “Yes” – This visualization achieves the goal.
● “Fine” – This visualization is mediocre; it neither actively helps nor hurts achievement of the goal.
● “No” – This visualization hinders the achievement of or does not achieve the goal.

Criteria 2-5 are answered assuming there are greater than two epoch dimensions. The best alternative, as reviewed for epoch sampling, for each row is underlined and highlighted.

Goal / Vis. Type | Scatterplot | Parallel Coords | Tree | Treemap | Circle Packing
Good for two dims | Yes | Fine | Fine | Fine | Fine
Good for multi dims | No | Yes | Yes | Yes | Yes
Goal #1 (understanding) | Fine | Fine | Yes | No | Yes
Goal #2 (find & select epochs) | Fine | No | Yes | No | Yes
Goal #3 (view epoch space/fracs) | Fine | No | Yes | Yes | Yes

Table 5-1: Summary of characteristics for each visualization (Sec. 5.1.1-5.1.4), with best alternative for each row underlined. “Fine” indicates that a visualization is passable: not helpful, but not unhelpful.

As seen, for the case of two epoch variables, the scatterplot is the best available option (note that the scatterplot meets all goals in the two-dimensional case). For multiple dimensions, both the tree and circle packing visualizations meet all three of our defined goals, making them both promising alternatives on their own.
However, remembering that there need not be one “best” visualization for each purpose, as the tree visualization does a better job accomplishing Goals #1 and #2 and the treemap actually does a better job with Goal #3, the most promising representation for epoch sampling among these choices seemed to be a coordinated combination of the two, in order to optimize the abilities of both of these visualizations individually. Thus, this hybrid was the technique we chose to implement for the Epoch Sampling submodule. The following two sections will now describe the implementation and evaluate the usability of the resulting interface.

5.2 Implementation

The Epoch Sampling interface we implemented was built in JavaScript, making use of the JQuery, JQueryUI, and D3.js libraries. The start state of the interface is shown in Figure 5-6.

Figure 5-6: Start state of Epoch Sampling interface

The box on the top-left side contains instructions for the user to drag epoch variables up and down to “reorder them in the tree below,” allowing the user the option to change the hierarchy representation (data tree) in which the epoch data is stored. This box also contains a description of the box on the top-right: “the fraction of the epoch space selected.” The top-right box is subdivided into as many sub-boxes as there are possible epochs (assuming discrete variables), and when the user selects epochs, the sub-boxes corresponding to the selected epochs are highlighted, as seen in Figure 5-7, so that the user can visualize the fraction of the epoch space he or she has selected.

Figure 5-7: IDs of selected epochs (top) along with current state of top-right box of interface displaying fraction of epoch space selected (12 epochs out of 108 total; bottom)

The box taking up the bottom portion of the interface displays the actual epoch tree.
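Before turning to the tree itself, the bookkeeping behind the top-right box can be sketched in a few lines. This is an illustrative sketch, not the code of the implemented interface: the function name `selectionSummary` and the returned object shape are hypothetical, and epoch IDs are assumed to run from 1 to the total number of epochs. The example IDs are made up; only the totals (12 of 108) mirror Figure 5-7.

```javascript
// One cell per possible epoch, highlighted if that epoch is selected,
// plus the selected count and fraction (illustrative sketch).
function selectionSummary(totalEpochs, selectedIds) {
  const selected = new Set(selectedIds);
  const cells = [];
  for (let id = 1; id <= totalEpochs; id++) {
    cells.push({ id, highlighted: selected.has(id) });
  }
  const count = cells.filter((c) => c.highlighted).length;
  return { cells, count, fraction: count / totalEpochs };
}

// Example: 12 (made-up) epoch IDs selected out of 108.
const summary = selectionSummary(108, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
console.log(summary.count, summary.fraction.toFixed(3)); // 12 0.111
```

A rendering layer would then draw one sub-box per cell and apply the highlight style wherever `highlighted` is true.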
The implementation of the tree follows the description in Section 5.1.3: each of the nodes is labeled by an epoch variable and value, and following a path down any branch will lead down to a leaf that represents an epoch taking all the values of the epoch variables prescribed by its respective branch. Thus all leaves under a given node will represent epochs that have the value of the epoch variable specified by that node. An example of a partially expanded tree is shown in Figure 5-8.

Figure 5-8: Partially expanded tree (using NGCS database epoch variables/values) in implemented interface

In this implementation, clicking nodes toggles the visibility of their children/descendants. Filled-in blue circles signify that the node contains hidden children, and white circles signify that the node has been expanded (or is a leaf and has no children). To select epochs, the user must click into ‘SELECT mode,’ in which clicking nodes adds all descendant epochs to the list of selected epochs, and the variables’ levels in the tree can be reordered using the drag-and-drop functionality in the top-left box for easier mass selection, as prescribed in Section 5.1.3. Select mode is distinguished from the default mode by the background color of the bottom box; the default mode gives the box a gray background as seen in Figures 5-6 and 5-8, whereas the select mode gives the box a dark red background as seen in Figure 5-9.

Figure 5-9: The bottom box of the Epoch Sampling interface in “SELECT mode”

Finally, the “Reset Selections” button appears in both the default and select modes, and offers the ability to get rid of all previous epoch selections.

5.3 Usability

In evaluating the usability of the implementation described above, we will focus on strengths and weaknesses with regard to the metrics described in Section 3.2 (Learnability, Efficiency, and Error-Tolerance), paying special attention to the italicized concepts in each section as discussed above.
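The two click modes described in Section 5.2 can be summarized as a small state model, which is useful background for the usability discussion that follows. The sketch below is an illustration under assumptions, not the implemented interface: the names `createTreeController`, `clickNode`, `toggleMode`, and `resetSelections` are hypothetical, and each node is assumed to carry an `id` and a precomputed list `epochIds` of its descendant epochs.

```javascript
// State model for the epoch tree (illustrative sketch; names are hypothetical).
// In "default" mode a click expands or collapses a node;
// in "select" mode a click adds all of the node's descendant epochs.
function createTreeController() {
  const state = {
    mode: "default",
    expanded: new Set(),
    selectedEpochs: new Set(),
  };
  return {
    state,
    toggleMode() {
      state.mode = state.mode === "default" ? "select" : "default";
    },
    clickNode(node) {
      if (state.mode === "select") {
        node.epochIds.forEach((id) => state.selectedEpochs.add(id));
      } else if (state.expanded.has(node.id)) {
        state.expanded.delete(node.id);
      } else {
        state.expanded.add(node.id);
      }
    },
    // Clears every selection in either mode.
    resetSelections() {
      state.selectedEpochs.clear();
    },
  };
}

// Example with a made-up node covering epochs 1-3.
const controller = createTreeController();
const exampleNode = { id: "n1", epochIds: [1, 2, 3] };
controller.clickNode(exampleNode); // default mode: expands the node
controller.toggleMode();
controller.clickNode(exampleNode); // select mode: selects epochs 1, 2, 3
console.log(controller.state.selectedEpochs.size); // 3
```

Writing the behavior this way makes the source of potential mode error explicit: the same `clickNode` call has two different effects depending on `state.mode`.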
5.3.1 Learnability

The interface’s text immediately gives the user instructions on what the purpose of each of the three sections of the interface is and how to manipulate them (drag and drop, click on nodes, etc.), helping with immediate learnability. First-time users will most likely have to learn by doing (or playing around with the interface) rather than by watching someone else manipulate it, but as Interactive Epoch-Era Analysis is intended to be a tool for exploration, it is safe to assume users are comfortable with exploring the interface’s functionality. It is also fairly safe to assume that users will be goal-oriented (with the goal of selecting epochs for further IEEA purposes), and this directs their learning to perform the selection task, rather than exploring the interface aimlessly.

An important learnability feature of the drag-and-drop epoch variables is the allowance for recognition of variable names rather than recall. This type of manipulation allows the user to focus on the cognitive task of ordering epoch variables in the hierarchy rather than on conveying knowledge to the interface and ensuring the task is carried out properly. The tree visualization automatically updates and resets every time the variable order is changed, offering the user a quick and visible response to his or her reordering action.

The interface is consistent with other web applications in that all clickable items (buttons, drag-and-drop panels, tree nodes) offer the affordance of clickability by turning the mouse into a pointer. The interface’s color scheme is also consistent with SEAri’s, offering high-contrast differences between parts and distinguishable states.

5.3.2 Efficiency

As this particular interface is geared more to provide comprehensive learnability for new users, the efficiency for returning users is slightly compromised. Returning users have the same main goal as new users: select epochs for further analysis.
The action of selecting an epoch (or group of epochs) through this interface requires switching modes, or an extra click. If the user later wishes to expand another node, another click is required to switch back to the default mode where clicking nodes expands them rather than selects. While this extra click is slightly inefficient and can add up over time and interface usage, a returning user may also develop a strategy to minimize the number of extra clicks: click all nodes he may wish to expand before switching to select mode, so that the only clicks necessary afterward will be to select the epochs. This strategy encourages all epoch variable exploration to be done in one chunk before switching modes, separating exploration from the final selection in a user’s memory.

Common targets (switch modes button, epoch variables, nodes) all have a maximized area on this interface to minimize pointing time. As the cursor switches to a pointer on hovering over a clickable item, the user immediately knows when he is able to click. For the nodes, the clickable area includes the text labeling the epoch variable and value, which is additionally bolded on hover to slightly increase target area. The drag-and-drop feature helps with steering as it does not require the user to place the panel in the exact spot in which he wishes to drop it; the ordered list will show a blank panel-sized placeholder where the current drop target is, and on release, the panel will snap to the placeholder position no matter where the user’s cursor is. This allows the user to overshoot the top or bottom of the list, so he does not have to aim for these positions, and the panel will snap to the closest position in the list to the cursor.

The drag-and-drop capability also helps with efficiency in terms of a slight shortcut.
Previous prototypes of this interface included drop-down menus for each level of the tree, requiring the user to click and select an epoch variable for each level (in addition, making sure not to assign the same variable to two different levels). This capability cleanly removes these inefficient extra clicks (as well as the cognitive burden on the user to keep track of which variables have already been assigned).

5.3.3 Error-Tolerance

The most common error in this interface would most likely result from the fact that clicking nodes does two different things in the two different states: in the default mode, it expands the node; in select mode, it selects all descendant epochs. The interface aims to mitigate this with the high-contrast deep red (a variant of a color used for warnings) background in select mode, but as noted above, the fairly common problem of human inattention, and therefore mode error, remains.

Besides preventing errors, a key characteristic of an interface’s error-tolerance is its ability to help a user recover when errors do occur. Clicking a node while wishing to select in the default mode is fairly harmless: the price is just one extra click, and a user should be able to immediately realize the mistake (due to the interface’s responsiveness in immediately expanding the node) and easily switch to select mode to perform the task correctly. However, clicking a node while wishing to expand in select mode is a little more problematic, as it will add epoch IDs to the list of selected epochs and to the shaded fraction of the epoch space, though the user might not have wished to select them. The lack of node expansion is also immediately noticeable to the user, so the user can choose to rectify this mistake by clicking the available “Reset Selections” button.
This button currently removes all selected epochs, introducing inefficiency if the user has to reselect all correct previous selections, so this feature could be enhanced with the ability to individually undo any selection or group of selections.

The only other main opportunity for human error is the drag-and-drop placeholder falling somewhere other than where the user intended for it to drop. This is a very easy fix, however, only costing one extra mousedown and mouseup, as the user can repeat the action to drop the panel in the correct place in the list without damaging the tree expansion (any drag-and-drop action on the variable panels will shrink the tree back to the start state of one level either way) or already selected epochs.

Chapter 6: Functionality Examination for Other IEEA Submodules

In this chapter, the same kind of functionality analysis as was performed in Section 5.1 will be performed on the remaining four of the aforementioned submodules: Era Sampling, Multi-Epoch Analysis, Single-Era Analysis, and Multi-Era Analysis.

6.1 Era Sampling

Recall from above the functionality goals for the computational and narrative-based Era Sampling submodule, summarized here:

Automatic generation:
Goal #1: help user understand specific era definitions
Goal #2: help user find and select era(s)
Goal #3: help user understand: a) fraction of era space available to explore, b) fraction already explored

Manual generation:
Goal #1: help user create era by choosing constituent epochs
Goal #2: help user identify eras at a glance
Goal #3: help user understand: a) fraction of era space available to explore, b) fraction already explored

A selection of visualization techniques is now described for each of the era generation methods. For each visualization presented, we evaluate its strengths and weaknesses with respect to all of the listed goals above.
6.1.1 Parallel Coordinates

As the goals for the Automatic Era Generation Sampling method are very similar to those of Epoch Sampling, there is some overlap in visualizations presented; however, the meanings of the structures can be interpreted very differently for epoch and era representation. The parallel coordinate axes in this context can represent either different points in time (though convention warns against using the x-axis as time) or different epochs that constitute an era. For the former, the eras themselves would be represented as the lines between the axes. The distance between the axes would be the minimum amount of time between epoch shifts for any of the eras being displayed, and the lines would plot the state of each era at each specified point in time, bringing out the trajectory of epochs over time. For the latter, there would be as many axes as there are unique epochs, and the lines between the axes bring out transitions between epochs (and the times at which they occur).

Both of these representations, however, suffer from the same spatial ambiguity problem that the parallel coordinates representation of epochs faced in that multiple eras could share the same line segment, especially as automatic generation assumes all of the possible combinations of epochs were enumerated into these eras, so line segments show more redundancy than information. With this representation it is difficult to tell the volume of eras at a glance, impeding all three of the automatic generation goals (and not even addressing the manual generation goals), making parallel coordinates a poor choice with which to represent eras.

6.1.2 Sankey Diagrams

Sankey diagrams, like parallel coordinates, offer the ability to show transitions between different epochs, but have the added benefit of showing volume, alleviating some of the issues with spatial ambiguity.
With time implied on the x-axis, Sankey diagrams make it possible to get a sense of the size of the era space and distribution of epochs and transitions, aiding with Automatic Generation Goal #3. They illustrate the idea that eras are characterized by their constituent epochs and transitions between them, helping to guide the user with Goal #1. Figure 6-1 shows an example Sankey diagram representation of a set of eras. Through labeling and semitransparent flows, it is clear how many total eras there are, and how many are at which epoch during what time. Though mass transitions are easy to spot at a glance, it is difficult to pick out individual eras, impeding Goal #2.

Figure 6-1: Automatically enumerated era set represented as a Sankey diagram of epoch flows.

6.1.3 Tree Structures

The sequential nature of era data appears to make it a good candidate for hierarchical visualizations. In such an implementation, data could be organized such that all of the possible starting epochs are children of a common root node, and a whole branch from the root down to the leaf would represent one era, similar to an individual complete left-to-right path in a Sankey diagram. Levels of the tree could be spaced by the minimum time between epoch shifts. By itself, a tree representation of an era set would present a clearly defined pathway to every era, so users can easily tell how individual eras were formed, aiding with Automatic Generation Goals #1 and #2. An implementation utilizing expandable nodes, such as in Figure 5-3, could help show the complete era space and the eras the user is able to explore and already has explored (Goal #3), though the volume of eras is again not immediately obvious because of spatial ambiguity. Drawing from the success of the combined hierarchical Epoch Sampling visualization, an analogous interface for Era Sampling could overlay a tree visualization with circle packing or even a Sankey diagram to help solidify Goal #3.
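Both the Sankey view of Section 6.1.2 and the tree view of Section 6.1.3 can be derived from the same underlying era data. The sketch below is a minimal illustration under assumptions, not an implementation from this thesis: each era is assumed to be an array of epoch labels sampled at equally spaced time steps, all eras are assumed to have the same length, and the names `sankeyLinkVolumes`, `buildEraTree`, and `countLeaves`, along with the sample eras, are hypothetical.

```javascript
// Count how many eras make each epoch-to-epoch transition at each time step.
// These counts are the flow volumes a Sankey diagram would draw.
function sankeyLinkVolumes(eras) {
  const volumes = new Map();
  for (const era of eras) {
    for (let t = 0; t + 1 < era.length; t++) {
      const key = t + ":" + era[t] + "->" + era[t + 1];
      volumes.set(key, (volumes.get(key) || 0) + 1);
    }
  }
  return volumes;
}

// Build a prefix tree: starting epochs are children of the root, and each
// root-to-leaf branch spells out one era (assumes eras of equal length).
function buildEraTree(eras) {
  const root = { name: "All Eras", children: [] };
  for (const era of eras) {
    let node = root;
    for (const epoch of era) {
      let child = node.children.find((c) => c.name === epoch);
      if (!child) {
        child = { name: epoch, children: [] };
        node.children.push(child);
      }
      node = child;
    }
  }
  return root;
}

function countLeaves(node) {
  if (node.children.length === 0) return 1;
  return node.children.reduce((sum, c) => sum + countLeaves(c), 0);
}

// Made-up era set over three time steps and two epochs, A and B.
const eras = [
  ["A", "A", "B"],
  ["A", "B", "B"],
  ["B", "B", "A"],
  ["A", "A", "A"],
];
console.log(sankeyLinkVolumes(eras).get("0:A->A")); // 2
console.log(countLeaves(buildEraTree(eras))); // 4
```

The link volumes convey how many eras share a transition, while the tree keeps each era as a distinct path, which is consistent with the suggestion above that the two views could complement each other.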
6.1.4 Bar Chart Icons

The last three visualization techniques have focused more on the Automatic Generation goals, but the next two will now aim to focus more on Manual Generation. The most straightforward way for a new user to think about forming an era is through the story it tells about the changing contexts and/or needs related to a system. Once the user understands that each time one of these contexts or needs changes signifies the start of a new epoch, it is clear that an era simply consists of an ordered string of epochs, each with its own duration. In visualization terms, a single era can be represented by ordered objects, each with their own value. Position along a scale is the most distinguishable visual variable on Cleveland et al.’s list, so by ordering the objects along a single scale, taking up lengths proportional to their durations, a single era can be represented by a sort of bar graph. Different epochs or types of epochs can additionally be distinguished by color or hue, as seen in Figure 6-2 (while taking care not to overuse visual variables as always). This representation tells the user what epochs and durations make up an era at a glance, helping fulfill Manual Generation Goal #2. An implementation that allows users to set the epochs and their lengths will then easily accomplish Goal #1 as well.

Figure 6-2: A single era represented as a series of epochs along a single axis

Figure 6-2 simply represents one era, but if a user has to pick and distinguish between multiple eras, the designer is faced with the task of arranging the eras in a spatial display. Earlier we repeatedly saw that plotting on a 2D graph was prone to problems stemming from multiple points sharing the same characteristics that determined location on the graph. Thus a display with no such arrangement is shown in Figure 6-3, where information about an era’s “type” or constituent epochs can be encoded in orientation and color/hue.
Additional visual variables such as shape (e.g. rounded edges, border line thickness) can encode more information about the era, depending on how much the user feels the need to distinguish between eras. Especially with manual generation, letting users attach their own mnemonics to epochs and eras through these additional variables can help them recognize the eras later on (Goal #2). This kind of representation is also helpful in accomplishing Goals #1-2 for automatic generation, though the number of eras to display is almost certainly greater than it is in manual generation, so human cognitive processing power may limit this technique’s effectiveness. A bar chart icon represents only a single era, so it is difficult for the user to understand the era space, and the fraction that is available to be explored and has already been explored (Automatic and Manual Goal #3), without further encoding. For example, as mentioned above, a different hue or shape could represent the eras already explored, and if a handful of manually constructed eras were arranged as in Figure 6-3, the proportion of eras already explored would be fairly evident at a glance, helping to achieve Goal #3.

Figure 6-3: An unorganized set of 7 eras (represented by bar chart icons) with hue, color, and orientation as additional encoding.

6.1.5 Drag and Drop

As introduced in Section 4.5.1, one very useful interaction technique is drag-and-drop, which allows users to easily move objects onto specified locations. In manual era generation, this technique can be used to drag epochs (however they are represented) onto a panel that represents a timeline, placing them in order. As jQuery also supports element resizing, once epoch elements are displayed on the timeline, they can be stretched or squeezed to represent their respective durations as in Figure 6-2, directly achieving Manual Generation Goal #1.
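One way to sketch the bar chart icon as data, assuming an era is an ordered list of (epoch, duration) pairs: each epoch becomes a segment whose start position and width are proportional to its duration. The `bar_segments` helper, the epoch labels, and the 200-unit icon width are illustrative assumptions, not part of any existing interface.

```python
def bar_segments(era, icon_width=200.0):
    """Map an era, given as ordered (epoch, duration) pairs, to bar segments.

    Returns (epoch, x_start, width) tuples along a single axis, with widths
    proportional to epoch durations.
    """
    total = sum(duration for _, duration in era)
    if total <= 0:
        raise ValueError("era must have a positive total duration")
    segments = []
    x = 0.0
    for epoch, duration in era:
        width = icon_width * duration / total
        segments.append((epoch, x, width))
        x += width
    return segments


# Hypothetical four-epoch era with durations in years.
era = [("Epoch 1", 3), ("Epoch 2", 3), ("Epoch 3", 2), ("Epoch 4", 2)]
segments = bar_segments(era)
```

In a drag-and-drop interface, resizing a rectangle would amount to changing one duration in the list and recomputing the segments; reordering rectangles would amount to reordering the list.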
An example schematic of this idea is shown in Figure 6-4, where epochs are represented as circles with epoch variable values explicitly stated and encoded in color. Upon release over the era grid, the epoch circle turns into a resizable rectangle as part of the bar icon. Ideally the user would also be able to switch around or remove these rectangles, hover to view full epoch information, and change visual variables (e.g. hue, color, shape, orientation) of individual epochs or the entire era after completing creation.

Figure 6-4: Sketch of leveraging drag-and-drop and resizing functionalities for manual era creation.

6.1.6 Evaluation Summary

Recall the four methods of era creation proposed by Schaffner (2014): Human-in-the-loop, Breadth-First Search through clips, Sampling, and a Combination of these. This section has attempted to address potential interfaces for creating eras (not the methods themselves!) with human-in-the-loop (manual generation), as well as how to manually select eras after they have been automatically created through any of the other methods. Table 6-1 below summarizes the relevant features of the proposed implementations from the discussion above in the context of our IEEA era sampling goals. To reiterate, the evaluative criteria are as follows:

1. Does the visualization help the user understand specific era definitions? (Automatic Goal #1)
2. Does the visualization help the user find and select eras? (Automatic Goal #2)
3. Does the visualization help the user understand a) fraction available to explore, and b) fraction already explored? (Automatic Goal #3)
4. Does the visualization help the user create an era by choosing constituent epochs and their durations? (Manual Goal #1)
5. Does the visualization help the user identify eras he previously constructed at a glance? (Manual Goal #2)
6. Does the visualization help the user understand a) fraction available to explore, and b) fraction already explored? (Manual Goal #3)

The four possible answers to these questions are:

● “Yes” – This visualization achieves the goal.
● “Fine” – This visualization is mediocre; it neither actively helps nor hurts in achieving the goal.
● “No” – This visualization hinders the achievement of or does not achieve the goal.
● “N/A” – This visualization only addressed the automatic generation goals, so it is not applicable to evaluate on manual generation goals.

The best alternative, as reviewed for era sampling, for each row is underlined and highlighted.

Goal     Parallel Coords   Sankey Diagram   Tree   Bar Icons
AG #1    No                Yes              Yes    Yes
AG #2    No                No               Yes    Yes
AG #3    No                Yes              Fine   Fine
MG #1    N/A               N/A              N/A    Yes
MG #2    N/A               N/A              N/A    Yes
MG #3    N/A               N/A              N/A    Fine

Table 6-1: Summary of characteristics for each visualization (Sec. 6.1.1-6.1.4), with best alternative for each row underlined. “Fine” indicates that a visualization is passable: not helpful, but not unhelpful.

Bar icons outperform the other visualizations in being more generally useful for achieving both manual and automatic generation goals; however, as the number of eras from which to select increases, the effectiveness of the bar icon representation decreases. The size and explorable fraction of the era space are also hard to glean from the bar icons alone. It seems that for manual generation, or a small number of eras, bar icons are the best option for representing eras, but for the general automatic generation case, or a large number of eras, the overlaid Sankey/tree diagram seems to be the best available option of the techniques discussed. Additionally, as the drag-and-drop and resizing techniques work very nicely to create the bar icon representation, this functionality can serve as the primary tool to manipulate epochs (however they are represented) to perform manual era creation.
As bar icons seem to be the optimal representation for manual era construction from the techniques presented, the “best” alternative for MG #3 is given to bar icons with the stipulation that there must be some further visual encoding to display which eras have data and which have already been explored. It may even be in the user’s interest to draw from the Epoch Sampling submodule and borrow the shaded-area-visualization capabilities of a treemap to help clarify the era space fractions.

6.2 Multi-Epoch Analysis

We now start examining submodules within the Analysis module. For the remainder of this chapter, the discussions of visualizations will incorporate considerations for interactive techniques that would be helpful to augment the corresponding visualization. It is important to remember, as the processes in the submodules get more complex, that there need not be a single visualization that accomplishes all of the respective submodule goals (i.e. a coordinated view of linked visualizations may be better suited to meeting goals). In his thesis, Schaffner actually recommends a “widely-varying visualization approach… since so many aspects of the data can be presented and studied” (Schaffner, 146)[58]. With this in mind, visualizations and corresponding interaction techniques will be presented with the intention of offering specific capabilities to the analysis as a whole.

Recall from above the functionality goals for the Multi-Epoch Analysis submodule, summarized here:

Goal #1: help user compare designs across all epochs simultaneously
Goal #2: enforce that epochs have no order
Goal #3: allow user to explore further after performing computational analysis

A selection of visualization techniques is now described for this analysis method, focusing on manual user interaction rather than specifying cost/utility models or performing any sort of computation.
For each visualization presented, we evaluate its strengths and weaknesses with respect to the goals listed above.

6.2.1 Scatterplot Variants

Now that the points being plotted are designs, rather than enumerated epochs or eras, spatial ambiguity becomes less of a concern (i.e. two designs with exactly the same utility and cost values are far less common than epochs that share two context variable values), so scatterplots become a convenient and useful visualization again. To analyze designs in a single epoch, a simple scatterplot (or bubble chart) can easily display 2 (or 4) attributes for each design. Choosing attributes can be accomplished in a variety of ways, from selecting from a dropdown to coordinating the plot with another visualization tool. Rhodes and Ross (2015)[59] demonstrated the latter by pairing an up-to-4D bubble chart with a parallel coordinate plot, shown in Figure 6-5, which has the added benefit of bringing out patterns in design attributes if they exist.

[58] Schaffner, M.A., Designing Systems for Many Possible Futures: The RSC-based Method for Affordable Concept Selection (RMACS), with Multi-Era Analysis, Master of Science Thesis, Aeronautics and Astronautics, MIT, June 2014. p. 146

Figure 6-5: Bubble chart paired with parallel coordinates that allow user to choose which attributes to plot (taken from Rhodes and Ross, 2015).

One way of modifying this visualization for many epochs is to include options for metrics that encompass design performance across all epochs (e.g. percentage of epochs in which design FPN < 10%). Another straightforward way of extending this visualization to show the same designs in many different epochs is to replace the bubble chart with a bubble chart matrix, with each plot showing designs in a different epoch, keeping the x/y/size/color axis mappings the same for all epochs.
Highlighting a design point in all epochs (as well as its trajectory in the parallel coordinate plot) upon hover would easily help achieve Goal #1. If, after computing the most fuzzy Pareto optimal designs, a user wishes to confirm the results through exploration, this visualization would allow the user to narrow down the point by design variables and eyeball it across all of the plots to ensure it is close enough to the Pareto front in each one, helping to achieve Goal #3. Because grids are associated with having some order, human cognitive biases might impede the success of Goal #2, but allowing the user to switch around the order of epochs (perhaps through drag-and-drop or with a built-in ‘sort’ functionality) may alleviate this issue.

[59] Rhodes D.H. and Ross A.M., Interactive Model-Centric Systems Engineering (IMCSE) Phase Two Technical Report SERC-2015-TR-048-2; February 2015.

6.2.2 Epochs as Parallel Coordinates

Straightforward as scatterplot matrices are, rendering so many scatterplots in a grid may slow down response time and increase the cognitive burden of sorting through all of the displayed information. There is still another use for parallel coordinates in Multi-Epoch Analysis, however: assigning epochs to the parallel axes. Each design is then represented as a line across the axes, intersecting each axis at the value of the user-selected attribute it takes in that epoch. A simple sketch of this concept is shown in Figure 6-6, where the user is analyzing 7 designs across 6 epochs. The intersections show each design’s Fuzzy Pareto Number during each of the epochs, providing an easy way to eyeball the success metric and compare design performance across epochs, helping Goal #1. This visualization also provides an easy way to achieve Goal #3.
For example, if the second-to-bottom design in Figure 6-6 is computationally recommended as “best” because it has the lowest total FPN, the user can then explore and notice that the bottommost design actually has a lower FPN in all epochs except for Epoch 5, which may be an epoch the user is not too concerned about in the first place; the user is thus able to revise and confirm the decision about which design is optimal. However, because parallel coordinates appear to read left-to-right, the success of Goal #2 is still impeded, though this again may be alleviated if interactive functionality is added to enable the user to switch around the order of the epoch axes. As parallel coordinates help bring out patterns, this visualization could additionally help the user spot particularly good or bad epochs, depending on the metric used on the y-axis.

Figure 6-6: Parallel coordinate plot showing Fuzzy Pareto Number of 7 designs (horizontal lines) being plotted over 6 epochs (vertical axes).

6.2.3 Circular Extensions

Both polar charts and RadViz, introduced earlier as circular extensions to parallel coordinates, would allow users to explore designs further after computational analysis (Goal #3), and the polar chart allows users to compare designs over all epochs in a more compact manner than parallel coordinates (Goal #1). The wrap-around to circular axes is a useful transformation if there are many epochs being analyzed, as it lessens the cognitive burden of processing all the information. Some argue that circles convey a sense of a quantity being unordered (e.g. the color wheel), but the representation of data points, especially in RadViz, depends on which axes are next to one another. Increasing interactivity and enabling the user to readily switch around axes (assuming they still represent different epochs) in these two plots would not only help enforce Goal #2, but also potentially show relationships or intricacies between one or more epochs being analyzed.
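The cross-epoch metrics mentioned in Sections 6.2.1 and 6.2.2 can be sketched as follows, assuming a table of Fuzzy Pareto Numbers (FPN, in percent) per design per epoch. The FPN values are invented for illustration and the helper names are hypothetical. The example mirrors the situation described for Figure 6-6: the design with the lowest total FPN is not the design with the lower FPN in most epochs.

```python
def fraction_of_epochs_below(fpn_by_epoch, threshold):
    """Fraction of epochs in which a design's FPN is strictly below `threshold`."""
    values = list(fpn_by_epoch.values())
    return sum(v < threshold for v in values) / len(values)


def total_fpn(fpn_by_epoch):
    """Sum of a design's FPN over all epochs (one possible aggregate metric)."""
    return sum(fpn_by_epoch.values())


def epochs_where_better(a, b):
    """Epochs in which design `a` has a strictly lower FPN than design `b`."""
    return [epoch for epoch in a if a[epoch] < b[epoch]]


# Invented FPN values (percent) for two designs over six epochs.
fpn = {
    "Design A": {1: 4, 2: 5, 3: 6, 4: 5, 5: 6, 6: 4},
    "Design B": {1: 2, 2: 3, 3: 4, 4: 3, 5: 30, 6: 2},
}

best_by_total = min(fpn, key=lambda d: total_fpn(fpn[d]))
b_wins = epochs_where_better(fpn["Design B"], fpn["Design A"])

# Epochs carry no inherent order, so the axis order is just a user-chosen list.
axis_order = [5, 1, 2, 3, 4, 6]
reordered = [fpn["Design B"][epoch] for epoch in axis_order]
```

Treating the axis order as a plain list that the user can permute is one simple way to back the drag-and-drop or sort interactions suggested for Goal #2.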
6.2.4 Evaluation Summary

Table 6-2 below summarizes the relevant features of the proposed implementations from the discussion above in the context of our IEEA Multi-Epoch Analysis goals. To reiterate, the evaluative criteria are as follows:

1. Does the visualization help the user simultaneously compare designs in selected epochs (and evaluate designs by the user’s choice of metric)?
2. Does the visualization help the user understand that epochs being analyzed are not necessarily sequential?
3. Does the visualization allow the user to explore and confirm trust in computationally recommended optimal designs post-calculation?

The three possible answers to these questions are:

● “Yes” – This visualization achieves the goal.
● “Fine” – This visualization is mediocre; it neither actively helps nor hurts in achieving the goal.
● “No” – This visualization hinders the achievement of or does not achieve the goal.

The best alternative, as reviewed for multi-epoch analysis, for each row is underlined and highlighted.

Goal                                            Scatterplot Matrix   Parallel [Epoch] Coords   Polar Chart   RadViz
Compare designs across epochs? (Goal #1)        Yes                  Yes                       Yes           No
Understand epochs are non-ordered? (Goal #2)    Fine                 Fine                      Fine          Fine
Post-calculation exploration? (Goal #3)         Yes                  Yes                       Yes           Yes

Table 6-2: Summary of characteristics for each visualization (Sec. 6.2.1-6.2.3).

Though a single best alternative was selected for Goals #1 and #3, the margin of difference between all non-“no” alternatives is really not all that great. Because the scatterplot matrix is able to show the most information, it is a bit more comprehensive in how much it allows a user to explore it. For the rest of the proposed visualizations, only one value per design is able to be plotted (e.g. FPN), whereas the scatterplot (or bubble chart) matrix allows for up to four.
Though this can be an advantage for depth of analysis, the scatterplot matrix is often much more computationally expensive than the other alternatives, so those alternatives might be preferred for quick system response or for large numbers of design or epoch choices. A best alternative was not picked for Goal #2 because none of the proposed visualizations really enforced the fact that epochs being analyzed are not sequential. However, as stated at the end of Sections 6.2.1-6.2.3, all were deemed fine if interactivity was enabled and encouraged to let the user readily switch around the order of epochs as they were represented, through a method such as drag and drop or sorting.

6.3 Single-Era Analysis

Recall from above the functionality goals for the Single-Era Analysis submodule, summarized here:

Goal #1: help user compare designs over the whole era through any choice of a) the same or b) different metrics
Goal #2: enforce epochs’ specified order and help understand path-dependence
Goal #3: help user see design changeability
Goal #4: allow user to explore further after performing computational analysis

A selection of visualization techniques is now described for this analysis method, again focusing on manual user interaction rather than specifying cost/utility models or performing any sort of computation. For each visualization presented, we evaluate its strengths and weaknesses with respect to the goals listed above.

6.3.1 Designs as Trees

Before discussing visualizations for the entirety of Single-Era Analysis, it may be useful to introduce the concept of visualizing a single design as a horizontal tree, branching out or changing slope at whatever point in time the design is able to change to another design. Thus the root node would represent the starting design, subsequent nodes would represent design IDs that could be reached through all possible successive changes, and the levels would represent the point in time (measured in an actual unit of time such as month or year, or simply in epochs).
Given a changeability matrix (or any similar indication of what design transitions are possible), it should be fairly straightforward to construct such a tree for any design in the matrix up to as many levels as desired. More information in the matrix (transition cost, rules concerning in which epochs changes are valid, etc.) can obviously lead to more accurate and informative visualizations, but even a simple tree can give the user a sense of a design’s potential for change, aiding with completing Goal #3.

6.3.1.1 Sankey Diagrams

The above suggestion only really lets users view one design-tree changeability representation at a time, but Sankey Diagrams offer a nice way to visualize the aggregation of all of these changes. Recall that these diagrams are particularly good at representing flow between nodes. The “flow” of designs changing into one another thus can be represented, highlighting the volume of change between designs, the most common designs to end up on, and all the possible designs that could be reached within n transitions. In his thesis, Schaffner uses D3.js to illustrate this type of concept, as seen in Figure 6-7. In this implementation, the ribbons are color-coded according to which design ID they started at (12, 14, or 128). From frame to frame, the transitions they make are clearly visible, as well as the proportion of the time (out of the generated eras) that these changes occur. This visualization can be used in conjunction with other visualizations while conducting Single-Era Analysis to identify changeability strategies before assuming them in other parts of the analysis.

Figure 6-7: Parallel sets (Sankey) visualization of designs following a changeability strategy from frame to frame (i.e. every transition), from (Schaffner 2014). Color coded by start design, horizontal line size “reflects the proportion of clips in which the corresponding design number appears in that frame”[60]

[60] Schaffner, M.A., Designing Systems for Many Possible Futures: The RSC-based Method for Affordable Concept Selection (RMACS), with Multi-Era Analysis, Master of Science Thesis, Aeronautics and Astronautics, MIT, June 2014.

6.3.2 Line Graphs

Recall that while Single-Era and Multi-Epoch Analyses both involve viewing designs in a number of different epochs, in this submodule, the epochs have a fixed order within an era. A straightforward way of conveying this order is through a time-series visualization, where the x-axis represents time, the y-axis represents the dependent variable, and more independent variables can be represented through color, hue, size, shape, etc. This describes the line graphs discussed in Section 4.1.2. In his thesis, Schaffner makes use of line graphs to show the trajectory of the Multi-Attribute Utility and Multi-Attribute Expense of six designs over the course of a 10-year era, shown in Figure 6-8.

Figure 6-8: Two line graphs showing MAU (left) and MAE (right) of 6 designs over the course of a 4-epoch era (Epoch 1: 3 yrs, Epoch 2: 3 yrs, Epoch 3: 2 yrs, Epoch 4: 2 yrs), from (Schaffner 2014).

The fact that time is very clearly an independent variable enforces the ordered nature of the data, accomplishing Goal #2. As demonstrated in Figure 6-8, it is possible for a user to specify any choice of metric on the y-axis, though using different metrics for different epochs would be hard with this type of graph, thus accomplishing Goal #1a but not #1b. While this visualization allows the user to see the trajectory of different designs over time, it does not allow for easy visualization of design changeability, so does not meet Goal #3, at least in this form where the designs are simply represented by a single line.
If designs were represented as trees, as described in the previous section, and the branches were also plotted according to the era-level evaluation metric, this would give the user a better sense of design changeability in addition to the possible trajectory over time, aiding with Goal #3 while preserving accomplishment of the other goals listed. Adding functionality to change the evaluation metric or designs being evaluated lets this type of tool allow the user to explore data further to validate any sort of computational analysis, thus possessing the ability to complete Goal #4.

6.3.3 Parallel Coordinates

The concept behind visualizing this submodule as parallel coordinates is very similar to that of line graphs: different designs are represented as horizontal lines, and the parallel axes can serve to represent units of time (anything from month to year to epoch, though if using the latter, one must be careful not to interpret the epochs all as necessarily having the same duration; perhaps spacing axes a distance proportional to epoch duration apart would help in this case). The major difference is that different scales can be used at different axes, so the user can choose different metrics for different epochs if he wishes (though caution must clearly be taken to accommodate this during interpretation), helping accomplish all of the goals that line graphs did (Goals #1a, 2, 4; if designs are represented as trees, also Goal #3), as well as Goal #1b.

6.3.4 Scatterplot Matrices

Scatterplot matrices are discussed again for this submodule as they do allow for simultaneous comparison of designs across different epochs, either with the same or different evaluation metrics for each epoch (achieving Goal #1a and #1b). However, based on the way the matrix is arranged, it could be hard to convey the fixedness and importance of the epoch order. For example, Figure 6-9 depicts sketches of two potential layouts of a scatterplot matrix.
The sketch on the left does not convey as clear a sense of order as the sketch on the right (which still makes the left layout perfectly suitable for Multi-Epoch Analysis); the right layout could ostensibly help achieve Goal #2.

Figure 6-9: Two layouts of a scatterplot matrix showing a four-epoch era.

If the matrix layout on the right of Figure 6-9 were used for Single-Era Analysis, as with Multi-Epoch Analysis, the representation of designs would need to be clear enough that a user could identify the same design across epochs. This visualization does not really help a user see changeability within designs (Goal #3), but similarly to Multi-Epoch Analysis, if after computing the optimal designs across the era a user wishes to confirm the results through exploration, this visualization would allow the user to narrow down the point by design variables and eyeball it across all of the plots to ensure it is close enough to the optimal design in each one, helping to achieve Goal #4.

6.3.5 Evaluation Summary

Table 6-3 below summarizes the relevant features of the proposed implementations from the discussion above in the context of our IEEA Single-Era Analysis goals. To reiterate, the evaluative criteria are as follows:

1. Does the visualization help the user simultaneously compare designs in selected epochs across the era (and evaluate designs by the user’s choice of metric, either a) the same or b) different across epochs)?
2. Does the visualization help the user understand that epochs being analyzed are necessarily sequential and understand the path-dependent effects of this order?
3. Does the visualization allow the user to see designs’ potential changeability?
4. Does the visualization allow the user to explore and confirm trust in computationally recommended optimal designs post-calculation?

The four possible answers to these questions are:

● “Yes” – This visualization achieves the goal as-is.
● “Yes with Trees” – This visualization achieves the goal (only applicable for Goal #3) if designs are represented as trees.
● “Fine” – This visualization is mediocre; it neither actively helps nor hurts in achieving the goal.
● “No” – This visualization hinders the achievement of or does not achieve the goal.

The best alternative, as reviewed for single-era analysis, for each row is underlined and highlighted.

Goal                                                           Line Graph       Parallel Coordinates   Scatterplot Matrix
Compare designs across era (same epoch metric)? (Goal #1a)     Yes              Yes                    Yes
Compare designs across era (diff epoch metrics)? (Goal #1b)    No               Yes                    Yes
Understand epochs ordered/path dependencies? (Goal #2)         Yes              Yes                    Fine
See designs’ potential changeability? (Goal #3)                Yes with Trees   Yes with Trees         No
Post-calculation exploration? (Goal #4)                        Yes              Yes                    Fine

Table 6-3: Summary of characteristics for each visualization (Sec. 6.3.2-6.3.4).

From the techniques discussed here, it is evident that all three of them have different strengths, so, in line with Schaffner’s above recommendation, it would be impractical to choose a “best” technique. At this point in Interactive Epoch-Era Analysis, it is less useful for the user to try to pick a “best” design over the system lifecycle than to explore the nuances and properties of a handful of designs over eras of interest. Thus we conclude this section by inviting the user to use a combination of any of the techniques (including the representation of design changeability in a tree or Sankey diagram) to explore the data and get the additional knowledge he or she needs out of this analysis technique.
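The design tree of Section 6.3.1 can be sketched from a changeability matrix, assumed here to be a mapping from each design ID to the design IDs it can change into in one transition. The matrix contents and the helpers `reachable_by_level` and `design_tree` are hypothetical; transition costs and epoch-specific rules are ignored, and a design is always allowed to remain unchanged.

```python
def reachable_by_level(changeability, start, levels):
    """Sets of designs reachable from `start` after 0, 1, ..., `levels` transitions."""
    frontier = {start}
    result = [set(frontier)]
    for _ in range(levels):
        nxt = set(frontier)  # a design may always remain unchanged
        for design in frontier:
            nxt.update(changeability.get(design, ()))
        frontier = nxt
        result.append(set(frontier))
    return result


def design_tree(changeability, start, levels):
    """Nested-dict tree: the keys at each level are the options one change later."""
    if levels == 0:
        return {}
    options = [start] + [d for d in changeability.get(start, ()) if d != start]
    return {d: design_tree(changeability, d, levels - 1) for d in options}


# Hypothetical changeability matrix keyed by design ID.
changeability = {12: [14], 14: [128], 128: []}
levels = reachable_by_level(changeability, 12, 2)
tree = design_tree(changeability, 12, 2)
```

The explicit tree grows quickly with the number of levels, so an interface might compute only the per-level reachable sets up front and expand branches on demand.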
6.4 Multi-Era Analysis

Recall from above the functionality goals for the Multi-Era Analysis submodule, summarized here:

Goal #1: help user compare designs across all of the eras (through any choice of metric) simultaneously
Goal #2: enforce that epochs in an era do have a specified order, but eras do not
Goal #3: help user see design changeability
Goal #4: help user understand and compare effects of path-dependence
Goal #5: allow user to explore further after performing computational analysis

A selection of visualization techniques is now described for this analysis method, again focusing on manual user interaction rather than specifying cost/utility models or performing any sort of computation. For each visualization presented, we evaluate its strengths and weaknesses with respect to the goals listed above.

6.4.1 Line Graph Matrix

As Multi-Era Analysis logically follows from Multi-Epoch and Single-Era Analyses, the first visualization presented is a hybrid of scatterplot matrices and line graphs: a grid of line graphs. This visualization is exactly what it sounds like, and its analysis subsequently mirrors the previous analysis done on both of its components: line graphs, as seen in Section 6.3.2, are great multidimensional time-series plots that clearly enforce the ordered nature of era data (Goal #2, part 1). Putting multiple line graphs in a grid together allows the user to compare designs across all eras, through either the same or different metrics for each era (though always the same within an era), as well as compare the effects of path-dependence across the ordered eras (Goals #1 and #4). An example of such a visualization is shown in Figure 6-10.

Figure 6-10: A selection of four eras displayed in a line graph matrix.

Again, because grids are associated with having some order, human cognitive biases might impede the success of Goal #2, part 2, but allowing the user to switch around the order of eras may help alleviate this issue.
Goal #3 can be accomplished again by representing designs in each era as trees instead of mere lines, and adding functionality to change aspects of the visualization in real-time lets this type of tool allow the user to explore data further to validate any sort of computational analysis, thus possessing the ability to complete Goal #5.

6.4.2 Line Graphs to Represent One Design

Nuances in path-dependence can really be brought out by comparing eras with small differences, whether in epoch durations, epoch order, or a few of the epochs themselves. The above suggestion for a line graph matrix offers at-a-glance comparison of multiple designs in multiple epochs. In order to compare these effects of path-dependence more deeply, it may be useful to compare one design (and the subsequent designs it can change into) across many eras on one graph. As we have previously seen that line graphs are a good candidate for viewing time-series data, we introduce the possibility of viewing the same design on one line graph, as seen in Figure 6-11. Note that this representation only makes sense if all eras (and epochs within eras) can be evaluated with the same metric on the y-axis (e.g. same stakeholder preferences, or same definition of multi-attribute utility).

Figure 6-11: Line graph showing trajectories (in terms of MAU) for one design (and subsequent changes/options) across 3 eras.

As this visualization was proposed with the intention of making path-dependence effects clearer for each individual design, it helps achieve Goal #4. Displaying time on the x-axis reiterates that the progression of epochs within each era is very much ordered, while overlaying the eras on a single graph, with no arrangement imposed on them, conveys better than the line graph matrix that the eras themselves do not have an order, achieving Goal #2.
When observing just a single design, this graph makes it very clear where the design has change options, potentially helping a user develop a strategy for design changeability and helping achieve Goal #3. Goal #1 is fulfilled for one design, though it becomes inefficient to use this visualization if the user wishes to analyze several such designs at this point. For this visualization to be truly interactive and exploratory (and help fulfill Goal #5), the user should have the option of changing the evaluation metric and eras being compared.

6.4.3 Sankey Diagrams

Sankey Diagrams can again prove useful in charting the volume and nature of changes between designs, especially after developing a changeability strategy. Similar to the use case described in Section 6.3.1.1, a Sankey Diagram can be used to represent an aggregation of all the change paths that a design or group of designs could take over the course of an era. Ribbons could be color-coded in a couple of different ways to bring out different features of the data. As in Single-Era Analysis, they could be colored based on start-state design, to highlight the flow of multiple designs through one or more eras. For truly examining design flow across multiple eras, the levels could represent units of time and ribbons could also be colored/hued based on era, though this limits the number of designs that can be displayed meaningfully through such a diagram. Both of these schemes highlight design changeability, helping achieve Goal #3, but since they show only changes and not performance by any evaluation metric, they do not help with Goal #1. Because the presented diagrams do not show designs’ performance, it is harder to understand the effects path-dependence has on performance (Goal #4), even though the effects on changeability are very apparent.
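A minimal sketch of the aggregation behind such a Sankey diagram, assuming each era yields one design trajectory (the design ID held in each frame). It counts ribbons keyed by frame, source design, target design, and start-state design, which is the grouping needed for color-coding by start design. The trajectories are invented and `sankey_links` is a hypothetical helper; the output is plain data, not tied to any particular plotting library.

```python
from collections import Counter


def sankey_links(trajectories):
    """Count design-to-design flows between consecutive frames.

    Keys are (frame, source_design, target_design, start_design); values are
    the number of trajectories that follow that ribbon.
    """
    links = Counter()
    for path in trajectories:
        start = path[0]
        for frame, (src, dst) in enumerate(zip(path, path[1:])):
            links[(frame, src, dst, start)] += 1
    return links


# Invented trajectories: one tuple of design IDs per era, one entry per frame.
trajectories = [
    (12, 12, 14),
    (12, 14, 14),
    (14, 14, 128),
    (128, 128, 128),
]
links = sankey_links(trajectories)
```

Replacing the start-design component of the key with an era identifier would give the alternative coloring scheme, in which ribbons are hued by era.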
Similar to line graphs, Sankey Diagrams ordered by time show a definite order to epochs within an era, while color-coding by era does not suggest that the eras themselves are ordered, which achieves Goal #2. Sankey Diagrams also have considerable potential for further exploration, depending on the interactivity scheme and the options provided to filter or change color and level assignments, so they could accomplish Goal #5 adequately.

6.4.4 Evaluation Summary

Table 6-4 below summarizes the relevant features of the proposed implementations from the discussion above in the context of our IEEA Multi-Era Analysis goals. To reiterate, the evaluative criteria are as follows:

1. Does the visualization help the user simultaneously compare designs in selected eras (and evaluate designs by the user’s choice of metric)?
2. Does the visualization help the user understand that epochs within eras being analyzed are necessarily sequential but eras themselves are not?
3. Does the visualization allow the user to see designs’ potential changeability?
4. Does the visualization allow the user to understand the effects of path-dependence?
5. Does the visualization allow the user to explore and confirm trust in computationally recommended optimal designs post-calculation?

The four possible answers to these questions are:

● “Yes” – This visualization achieves the goal as-is.
● “Yes with Trees” – This visualization achieves the goal (only applicable for Goal #3) if designs are represented as trees.
● “Fine” – This visualization is mediocre; it neither actively helps nor hurts in achieving the goal.
● “No” – This visualization hinders the achievement of or does not achieve the goal.

The best alternative, as reviewed for multi-era analysis, for each row is underlined and highlighted.

Goal                                              Line Graph Matrix    Line Graphs for One Design    Sankey Diagrams
Compare designs across eras? (Goal #1)            Yes                  Fine                          No
Understand epochs ordered, eras not? (Goal #2)    Fine                 Yes                           Yes
See designs’ potential changeability? (Goal #3)   Yes with Trees       Yes                           Yes
Understand path-dependence? (Goal #4)             Yes                  Yes                           Fine
Post-calculation exploration? (Goal #5)           Yes                  Fine                          Fine

Table 6-4: Summary of characteristics for each visualization (Sec. 6.4.1-6.4.3).

For this most complex IEEA submodule, it becomes clear that it is extremely difficult to come up with one visualization that accomplishes all goals well by itself, reinforcing the point that a combination of different visualizations is optimal for complex processes like Single- and Multi-Era Analyses. Each of the three visualizations was presented with a certain goal in mind, so naturally they offer different strengths, and each would be worth looking into depending on which aspects of the data a user wishes to explore further. As with Single-Era Analysis, at this point in IEEA it is more useful for a user to be able to explore the properties of a few designs than to pick a best design from all of the potential options, which also lessens the computational load and cognitive burden of processing all designs over all eras being analyzed.

Chapter 7: Discussions and Conclusion

Because such a vast amount of data can go into IEEA, there is an even greater number of insights that a user can gain from any of these submodules individually and, more importantly, in combination. Thus, the most thorough analysis almost necessarily involves an iterative process through the modules and submodules, refining analysis based on new results, as suggested at the very beginning of this thesis. Consistent with this idea as well as Schaffner’s recommendation for a “widely-varying visualization approach,” the above discussions support the conclusion that different interactive visualizations have different strengths, so using several in combination and/or in sequence yields the most insight.
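To make the point about combining visualizations concrete, the sketch below encodes the ratings from Table 6-4 and looks for a small set of visualizations that together reach “Yes” on every Multi-Era Analysis goal. The thesis does not prescribe any selection procedure; the greedy set-cover heuristic, the function names, and the choice to count “Yes with Trees” as “Yes” are assumptions made purely for illustration.

```javascript
// Ratings transcribed from Table 6-4, keyed by goal number.
const ratings = {
  "Line Graph Matrix":          { 1: "Yes",  2: "Fine", 3: "Yes with Trees", 4: "Yes",  5: "Yes"  },
  "Line Graphs for One Design": { 1: "Fine", 2: "Yes",  3: "Yes",            4: "Yes",  5: "Fine" },
  "Sankey Diagrams":            { 1: "No",   2: "Yes",  3: "Yes",            4: "Fine", 5: "Fine" }
};

// Assumption: "Yes with Trees" counts as achieving the goal.
function achieves(rating) {
  return rating.indexOf("Yes") === 0;
}

// Greedy set cover: repeatedly pick the visualization that achieves the most
// goals that are still uncovered, until every goal is covered or no progress is possible.
function chooseCombination(table, goals) {
  const remaining = new Set(goals);
  const chosen = [];
  while (remaining.size > 0) {
    let best = null;
    let bestCovered = [];
    Object.keys(table).forEach(function (vis) {
      if (chosen.indexOf(vis) !== -1) { return; }
      const covered = Array.from(remaining).filter(function (g) {
        return achieves(table[vis][g]);
      });
      if (covered.length > bestCovered.length) {
        best = vis;
        bestCovered = covered;
      }
    });
    if (best === null) { break; } // no remaining visualization achieves the leftover goals
    chosen.push(best);
    bestCovered.forEach(function (g) { remaining.delete(g); });
  }
  return { chosen: chosen, uncovered: Array.from(remaining) };
}

const result = chooseCombination(ratings, [1, 2, 3, 4, 5]);
console.log(result.chosen, result.uncovered);
```

Under these assumptions no single visualization covers all five goals, while the Line Graph Matrix paired with Line Graphs for One Design does, which is consistent with the conclusion that a combination is needed.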
As noted repeatedly, the visualizations presented in this thesis are not meant to be an exhaustive list, nor are they presented exactly as they must be implemented. Instead, the author hopes they will provide a foundation and spark ideas for visualizations that are more useful and better tailored to a user’s specific Interactive Epoch-Era Analysis goals. When considering any new type of visualization or interface, evaluating functionality and usability against these specific goals, as demonstrated in this thesis, can inform how helpful the software will be for a user’s specific needs.

User-centric design requires input from real users to inform the goals of the design. Though the user goals for each submodule in this thesis were confirmed by student researchers who ranged from novice to expert EEA users, they could be further refined by interviewing individuals who might actually use IEEA in a non-academic setting for real large-scale decision-making tasks.

In addition to more authentic user studies (and subsequent implementation/evaluation of interfaces based on refined goals), there is plenty of room to expand on this work, both in theory and in practice. This thesis has only discussed mouse-based interaction techniques for on-screen visualizations, but there are many more ways to interact with and perceive data. Multimodal interfaces make it possible to command and explore on-screen data by speech, touch, gesture, etc., both directly and through external technology. Visualizations do not even have to be constrained to a screen, as it is possible to project data onto a 3D space for an immersive perceptive experience (through holographic technology, for example). Lastly, data can even be perceived through other senses, most notably hearing: data sonification (conveying information through non-speech audio) can be used instead of or in addition to visualization to enrich the perceptive experience.
All of these methods of interacting with and conveying data could have enormous potential for simplifying or further breaking down processes as complex as those in IEEA, and thus could merit further exploration.

As stated in the thesis overview, the work presented in this thesis aims to contribute to an ongoing effort (as put forth in Curry 2015) to demonstrate that adding interactivity to interfaces increases user satisfaction. Thus, regardless of the method, it is hopefully clear in the context of the visual analytics process that adding interactivity to interfaces has the potential to help users gain much better insight from analyses than static data displays. By combining the computational and visual display capabilities of computers with the cognitive, decision-making capabilities of humans in the loop, the IEEA process itself has much to gain from the power of visual analytics.

Bibliography

Bostock, M. Mike Bostock. Feb 2013. <bost.ocks.org/mike>.

Cao, N. “A Survey on Multidimensional Visual Analysis Techniques.” Hong Kong University of Science and Technology, Sept 2011.

Chan, W.W. “A Survey on Multivariate Data Visualization.” Department of Computer Science and Engineering, Hong Kong University of Science and Technology, June 2006.

Chang, Remco. “Big Data Visual Analytics: A User-Centric Approach” [PowerPoint Slides].

Cleveland, W.S. and R. McGill, “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods,” Journal of the American Statistical Association, 79-387, 1984.

Curry, M.D. and Ross, A.M., "Considerations for an Extended Framework for Interactive Epoch-Era Analysis," CSER 2015.

Diller, N. P. “Utilizing Multiple Attribute Tradespace Exploration with Concurrent Design for Creating Aerospace Systems Requirements,” Master of Science Thesis, Aeronautics and Astronautics, Massachusetts Institute of Technology, June 2002.

Few, Stephen. Information Dashboard Design: The Effective Visual Communication of Data.
Beijing: O'Reilly, 2006.

Fitzgerald, M.E. and Ross, A.M., "Mitigating Contextual Uncertainties with Valuable Changeability Analysis in the Multi-Epoch Domain," 6th Annual IEEE Systems Conference, Vancouver, Canada, March 2012.

Fitzgerald, M.E. and Ross, A.M., "Sustaining Lifecycle Value: Valuable Changeability Analysis with Era Simulation," 6th Annual IEEE Systems Conference, Vancouver, Canada, March 2012.

Fitzgerald, M.E., Ross, A.M., and Rhodes, D.H., "Assessing Uncertain Benefits: a Valuation Approach for Strategic Changeability (VASC)," INCOSE International Symposium 2012, Rome, Italy, July 2012.

Friendly, M. “Statistical Graphics for Multivariate Data.” SAS SUGI 16 Conference, Apr 1991.

Fulcoly, D.O., Ross, A.M., and Rhodes, D.H., "Evaluating System Change Options and Timing Using the Epoch Syncopation Framework," 10th Conference on Systems Engineering Research, St. Louis, MO, March 2012.

Grinstein, G., Trutschl, M., and Cvek, U. “High-Dimensional Visualizations.” 7th Data Mining Conference-KDD 2001.

Hoffman, P.E. “Table Visualizations: A Formal Model and Its Applications,” Doctoral Dissertation, Computer Science Department, University of Massachusetts at Lowell, 1999.

Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives. Wiley, New York.

Keim, D. A., Mansmann, F., Schneidewind, J., Thomas, J., & Ziegler, H. (2008). Visual Analytics: Scope and Challenges. In Visual Data Mining (pp. 76–90).

Liu, Y. “Visualization of Multivariate Data.” Department of Biomedical, Industrial and Human Factors Engineering, Wright State University, Fall 2014. <http://www.stat.sc.edu/~hansont/stat730/MultivariateDataVisualization.pdf>

Miller, R. 6.831 User Interface Design and Implementation, Spring 2015. (Massachusetts Institute of Technology: MIT Stellar <https://stellar.mit.edu/S/course/6/sp15/6.813/materials.html>).

Pina, A.L.
“Applying Epoch-Era Analysis for Homeowner Selection of Distributed Generation Power Systems,” Master of Science Thesis, Engineering and Management, Massachusetts Institute of Technology, June 2014.

Quesenbery, W. “Balancing the 5Es: Usability,” Cutter IT Journal, 17-2, 2004.

Rader, A.A., Ross, A.M., and Rhodes, D.H., "A Methodological Comparison of Monte Carlo Methods and Epoch-Era Analysis for System Assessment in Uncertain Environments," 4th Annual IEEE Systems Conference, San Diego, CA, April 2010.

Rader, A.A., Ross, A.M., and Fitzgerald, M.E., "Multi-Epoch Analysis of a Satellite Constellation to Identify Value Robust Deployment across Uncertain Futures," AIAA Space 2014, San Diego, CA, August 2014.

Rhodes D.H. and Ross A.M., Interactive Model-Centric Systems Engineering (IMCSE) Phase One Technical Report. SERC-2014-TR-048-1; September 2014.

Rhodes D.H. and Ross A.M., Interactive Model-Centric Systems Engineering (IMCSE) Phase Two Technical Report. SERC-2015-TR-048-2; February 2015.

Ricci, N., Schaffner, M.A., Ross, A.M., Rhodes, D.H., Fitzgerald, M.E., "Exploring Stakeholder Value Models Via Interactive Visualization," 12th Conference on Systems Engineering Research, Redondo Beach, CA, March 2014.

Roberts, C.J., Richards, M.G., Ross, A.M., Rhodes, D.H., and Hastings, D.E., "Scenario Planning in Dynamic Multi-Attribute Tradespace Exploration," 3rd Annual IEEE Systems Conference, Vancouver, Canada, March 2009.

Ross A.M. Interactive Model-Centric Systems Engineering. 5th Annual SERC Sponsor Research Review. Washington, DC: Georgetown University, February 2014.

Ross, A.M., and Rhodes, D.H., "Using Natural Value-centric Time Scales for Conceptualizing System Timelines through Epoch-Era Analysis," INCOSE International Symposium 2008, Utrecht, the Netherlands, June 2008.

Ross, A.M.
and Hastings, D.E., "Assessing Changeability in Aerospace Systems Architecting and Design Using Dynamic Multi-Attribute Tradespace Exploration," AIAA Space 2006, San Jose, CA, September 2006.

Ross, A.M., “Managing Unarticulated Value: Changeability in Multi-Attribute Tradespace Exploration,” PhD thesis, Engineering Systems Division, Massachusetts Institute of Technology, June 2006.

Sankey Diagrams. “Sankey Definitions.” <http://www.sankey-diagrams.com/sankey-definitions/>

Schaffner, M.A., “Designing Systems for Many Possible Futures: The RSC-based Method for Affordable Concept Selection (RMACS), with Multi-Era Analysis,” Master of Science Thesis, Aeronautics and Astronautics, Massachusetts Institute of Technology, June 2014.

Schaffner, M.A., Ross, A.M., and Rhodes, D.H., "A Method for Selecting Affordable System Concepts: A Case Application to Naval Ship Design," 12th Conference on Systems Engineering Research, Redondo Beach, CA, March 2014.

Schofield, D.M. “A Framework and Methodology for Enhancing Operational Requirements Development: United States Coast Guard Cutter Project Case Study.” Massachusetts Institute of Technology, 2010.

Smaling, R.M. “System Architecture Analysis and Selection Under Uncertainty.” PhD thesis, Engineering Systems Division, Massachusetts Institute of Technology, June 2005.

Spears, W.M. “An Overview of Multidimensional Visualization Techniques.” Evolutionary Computation Visualization Workshop, 1999.

Tacca, M. C. “Commonalities between Perception and Cognition.” Frontiers in Psychology, 2: 358. PMC. 2011.

Tufte, Edward R. The Visual Display of Quantitative Information. Cheshire, Conn.: Graphics, 1983.

Various. (2010). Mastering the Information Age: Solving Problems with Visual Analytics. (D. Keim, J. Kohlhammer, G. Ellis, & F. Mansmann, Eds.). Eurographics.

Wallace, Rosa. “Graphing Resource.” NC State University. <https://www.ncsu.edu/labwrite/res/gh/gh-linegraph.html> 2004.

Wang, Y., Teoh, S.T., and Ma, K.
“Evaluating the Effectiveness of Tree Visualization Systems for Knowledge Discovery.” Eurographics/IEEE-VGTC Symposium on Visualization, 2006.

Ware, Colin. Information Visualization: Perception for Design. Elsevier, 2013.

"What Is Systems Engineering?" INCOSE. International Council on Systems Engineering, 14 June 2004. <http://www.incose.org/practice/whatissystemseng.aspx>.

Wong, P.C. and Bergeron, R.D. “30 Years of Multidimensional Multivariate Visualization.” Department of Computer Science, University of New Hampshire, 1997.

Appendix

Selected parts of code for implementation described in Section 5.2 (generated and modified from template on Mike Bostock’s website):

    // width, height, margin, duration, root, and i, the selection state
    // (SELECT_MODE, selections, uniqueEpochs, SELECTED_EPOCHS), and the helpers
    // listleaves() and resetselections() are defined in parts of the code not shown here.

    var tree = d3.layout.tree()
        .size([height, width]);

    var diagonal = d3.svg.diagonal()
        .projection(function(d) { return [d.y, d.x]; });

    // Tooltip showing the node name (d3-tip plugin).
    var tip = d3.tip()
        .attr("class", "d3-tip")
        .offset([-10, 0])
        .html(function(d) { return d.name; });

    var svg = d3.select("#treeablediv").append("svg")
        .attr("width", width + margin)
        .attr("height", height + margin)
      .append("g")
        .attr("transform", "translate(" + margin + "," + margin + ")");

    svg.call(tip);

    function update(source) {
      // Compute the new tree layout.
      var nodes = tree.nodes(root).reverse(),
          links = tree.links(nodes);

      // Normalize for fixed-depth.
      nodes.forEach(function(d) { d.y = d.depth * 180; });

      // Update the nodes…
      var node = svg.selectAll("g.node")
          .data(nodes, function(d) { return d.id || (d.id = ++i); });

      // Enter any new nodes at the parent's previous position.
      var nodeEnter = node.enter().append("g")
          .attr("class", "node")
          .attr("transform", function(d) { return "translate(" + source.y0 + "," + source.x0 + ")"; })
          .on("click", click); ////////// MODE DEPENDENT
          //.on("dblclick", function(d) {if (!(d in selections)) {selections.push(d);}
          //  newclick();});

      nodeEnter.append("circle")
          .attr("r", 1e-6)
          .style("fill", function(d) { return d._children ? "lightsteelblue" : "#fff"; }); /////// FILL

      nodeEnter.append("text")
          .attr("x", function(d) { return d.children || d._children ? -10 : 10; })
          .attr("dy", ".35em")
          .attr("text-anchor", function(d) { return d.children || d._children ? "end" : "start"; })
          .text(function(d) { return d.name; })
          .style("fill-opacity", 1e-6);

      // Transition nodes to their new position.
      var nodeUpdate = node.transition()
          .duration(duration)
          .attr("transform", function(d) { return "translate(" + d.y + "," + d.x + ")"; });

      nodeUpdate.select("circle")
          .attr("r", 4.5)
          .style("fill", function(d) { return d._children ? "lightsteelblue" : "#fff"; });

      nodeUpdate.select("text")
          .style("fill-opacity", 1);

      // Transition exiting nodes to the parent's new position.
      var nodeExit = node.exit().transition()
          .duration(duration)
          .attr("transform", function(d) { return "translate(" + source.y + "," + source.x + ")"; })
          .remove();

      nodeExit.select("circle")
          .attr("r", 1e-6);

      nodeExit.select("text")
          .style("fill-opacity", 1e-6);

      // Update the links…
      var link = svg.selectAll("path.link")
          .data(links, function(d) { return d.target.id; });

      // Enter any new links at the parent's previous position.
      link.enter().insert("path", "g")
          .attr("class", "link")
          .attr("d", function(d) {
            var o = {x: source.x0, y: source.y0};
            return diagonal({source: o, target: o});
          });

      // Transition links to their new position.
      link.transition()
          .duration(duration)
          .attr("d", diagonal);

      // Transition exiting links to the parent's new position.
      link.exit().transition()
          .duration(duration)
          .attr("d", function(d) {
            var o = {x: source.x, y: source.y};
            return diagonal({source: o, target: o});
          })
          .remove();

      // Stash the old positions for transition.
      nodes.forEach(function(d) {
        d.x0 = d.x;
        d.y0 = d.y;
      });
    }

    // Toggle between NORMAL mode (expand/collapse nodes) and SELECT mode (select epochs).
    function switchmode() {
      SELECT_MODE = !SELECT_MODE;
      if (!SELECT_MODE) { // in NORMAL mode
        document.getElementById("selectbutton").innerHTML = "Switch to SELECT mode";
        document.getElementById("treeablediv").style.backgroundColor = "#EAEAEA";
        document.getElementById("treeablediv").style.color = "black";
        svg.selectAll("g.node").style("fill", "black");
        svg.selectAll(".link").style("stroke", "#444");
        document.getElementById("intro").innerHTML = "<br><br>Click on the nodes to expand them.";
      } else { // in SELECT mode
        document.getElementById("selectbutton").innerHTML = "IN SELECT MODE <br> Click to switch back";
        document.getElementById("treeablediv").style.backgroundColor = "#800007";
        document.getElementById("treeablediv").style.color = "white";
        svg.selectAll("g.node").style("fill", "white");
        svg.selectAll(".link").style("stroke", "#DDD");
        document.getElementById("intro").innerHTML = "<br><br>Click on the nodes to select all descendant epochs.";
      }
    }

    function click(d) {
      if (!SELECT_MODE) { // in NORMAL mode: toggle the node's children
        if (d.children) {
          d._children = d.children;
          d.children = null;
        } else {
          d.children = d._children;
          d._children = null;
        }
        update(d);
      } else { // in SELECT mode: listleaves() (not shown) populates selections
        selections = [];
        listleaves(d);
        for (var a in selections) {
          if (typeof selections[a] == "object") {
            // Arrays have no standard contains() method, so indexOf() is used instead.
            if (uniqueEpochs.indexOf(selections[a].name) === -1) {
              console.log(selections[a].name);
              $("#" + selections[a].name).css({"background-color": "#800007", "color": "white"});
              uniqueEpochs.push(selections[a].name);
              SELECTED_EPOCHS.push(selections[a]);
            }
          }
        }
        // Sort epoch names by the number that follows their one-character prefix.
        uniqueEpochs.sort(function(c, b) { return parseInt(c.slice(1), 10) - parseInt(b.slice(1), 10); });
        document.getElementById("selectedepochs").innerHTML = "Selected Epochs: " + uniqueEpochs +
            ' <button type="button" onclick="resetselections()">Reset Selections</button>';
      }
    }
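The click handler above calls `listleaves`, whose definition is not among the selected parts shown. As a minimal sketch of what such a helper might do, the function below gathers every leaf beneath a node, following either `children` (expanded) or `_children` (collapsed), the two fields the code above uses to toggle a subtree. The name `collectLeaves`, its return-value style, and the example tree are hypothetical; the original helper appears to write into the shared `selections` array instead.

```javascript
// Hypothetical helper: collect all leaf nodes under `node`, whether the
// subtree is currently expanded (children) or collapsed (_children).
function collectLeaves(node, leaves) {
  leaves = leaves || [];
  const kids = node.children || node._children;
  if (!kids || kids.length === 0) {
    leaves.push(node); // a node without children is a leaf
  } else {
    kids.forEach(function (child) { collectLeaves(child, leaves); });
  }
  return leaves;
}

// Made-up tree: node "A" is collapsed, so its leaves sit under _children.
const exampleTree = {
  name: "root",
  children: [
    { name: "A", _children: [{ name: "E1" }, { name: "E2" }] },
    { name: "E3" }
  ]
};
const leafNames = collectLeaves(exampleTree).map(function (d) { return d.name; });
console.log(leafNames);
```

Returning a fresh array instead of mutating a global keeps the helper easy to test in isolation; a caller could then assign the result to `selections` before the loop in `click`.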