>> Bongshin Lee: It is my great pleasure to introduce Matthew Brehmer. He is finishing up his PhD in the InfoVis Group at the University of British Columbia in Canada. So Matt is one of the very few still [indiscernible] whom Natalie and I jokingly call the king of InfoVis. Matt has done an impressive body of work on the design and evaluation of visualization systems and techniques in several application domains such as data journalism and energy conservation. We were very lucky to have him as an intern last summer and many of you have already chatted with him. So without further ado, here is our speaker of the day.
[Applause]
>> Matthew Brehmer: Okay, thank you. Thank you for having me. It is good to see
many of you again. I look forward to meeting those of you I don’t already know. Today
I will tell you a bit about my research. The title might have changed from when the abstract went out, but I will tell you about a few of my projects today.
So as an outline for my talk I will begin by telling you a bit about myself and my
background. I will tell you about the breadth of my visualization projects that I have
worked on. I will follow up with some depth, in which I will discuss some of the projects
that I have worked on and their contributions in detail and I will finish with some future
research goals, interests and why I think MSR is the place to conduct this visualization
research.
So by now some of you might have read my brief bio, but in any case now I get to
elaborate on who I am, my background and some of the experience that I bring. So I self-identify as a visualization design and user experience researcher, a role in which I aim to understand why and how people are using visualization tools and techniques, and to find better ways to design and improve these experiences.
So I will begin by telling you a bit about my academic background, which I would
describe as being multi-disciplinary. I started with an undergraduate degree in cognitive
science from Queen's University in Kingston, Ontario, Canada, and this is where I initially
became interested in human computer interaction research. I also got my start here as an
undergraduate research assistant. In this case it was in a lab that investigated the
potential of exercise video games or active games. I looked at the interactions between
exercise intensity and the ability to perform fun, yet challenging tasks at the same time,
similar to those that you would find in non-active video games.
Also during this period I developed a software toolkit to allow people with different input
peripherals like different models of exercise bicycles or different models of heart rate
monitors to play games together, playing the same active game. And this was in a Future Play paper back in 2010. I am not going to speak about this work today; the reason I bring it up is just to illustrate some of the breadth of my experience and familiarity with different areas of HCI research.
So prior to turning my attention to visualization, my master's work was in human-computer interaction and was also done at the University of British Columbia. This was experimental work on task switching and interruptions and their impact on older and younger adults' performance on computer-based tasks of varying cognitive demand. And this was work that I had done with my advisor at the time, Joanna McGrenere, as well as Claudia Jacova from the UBC Faculty of Medicine.
So like my undergrad research I am not going to speak about this work today, but if you
are curious you can definitely talk to me about this offline or we have a CHI 2012 paper
about this.
Okay, so my PhD, as Bongshin mentioned, is focused in information visualization. My
dissertation shares the same title as this talk, “Why Visualization? Task Abstraction for
Analysis and Design”. It is currently being reviewed by my examination committee. I
will be defending it at some point in late March or early April, but you can read it now if
you want at this URL. So it documents four visualization research projects that I have been working on over the past 5 years along with my advisor, Tamara Munzner, and over this time I have also benefitted from the contributions and perspectives of my committee members, Joanna McGrenere from human-computer interaction and also Ron Rensink from psychology, who focuses on vision science research.
So I want to emphasize that all my degrees were affiliated with computer science or
computing departments, but also during this time I had the flexibility with these cognitive
science and HCI programs to take courses outside of the field, doing courses in the psychology of scene perception and the psychology of reading, a few other interesting courses in visual display design, as well as courses in linguistics, statistics and research methods.
So I have also spent some time working in industry and industry research. So as
Bongshin already mentioned I was an intern here last summer working with Bongshin,
Natalie and postdoc Benjamin Bach. I worked on a design space and prototype tool for interactive storytelling with timelines. Actually, you are already seeing that work on these slides; I produced the timelines you see here using this environment. We currently have a journal paper about this work in preparation, and I will speak more about it later in the talk when I go deep into this project, and also spend some time talking about ideas and where I want to take this work next.
So in 2013, before this, I was awarded the Graduate Research Internship Award from
Mitacs, which is a Canadian research funding agency, and this was to collaborate with a company called Pulse Energy, a software company based out of Vancouver, which has since been acquired by the American company EnerNOC. And for several months I embedded myself at this company, working alongside some of their client services team, software developers and even some of their clients across North America and the UK to design a visualization tool for organizational energy analysis and management. So I presented this work at the IEEE InfoVis conference last fall and you will hear more about this as I go into this project in detail in this talk as well.
So finally, before I started grad school I spent 16 months interning with EMC and this is
where I did a mix of user experience research, user interface design, front end
development and this was on a content management application. This was intended for
use in the automotive industry. This was my first real opportunity to apply what I had
learned in my undergrad HCI curriculum to a real product and because of the length of
this internship I was able to see the product that I worked on deployed to clients during
this time, which was really cool.
So in the last part of this autobiographical section of this talk I just want to summarize the
reasons why I came to be a visualization researcher, why I was attracted to this area.
Well first I was initially attracted to this research because at the end of my undergraduate
program this was combining the topics that I found most interesting in my cognitive
science curriculum: visual perception, HCI and data mining. And while these topics
drew me in what hooked me was the opportunity to develop and improve upon methods.
So I believed 5 years ago, and I still do, that visualization design and evaluation presents
some unique challenges and requires some unique solutions and creative solutions.
So finally I am a visualization researcher because, well visualization itself is a young
field, still maturing and it’s growing, especially over the past decade and especially since
I started my graduate program. The visualization practitioner community has grown
considerably, which includes those working in data journalism and storytelling, those
working in the business intelligence domain and those using visualization tools and
techniques for discovery in various scientific domains. So all this points to a growing literacy and demand for visualization techniques, tools and artifacts, and it's possible through visualization research to support this growth and, in effect, a large and diverse population of adopters.
So now that you know me a little bit I want to give an overview of the breadth of some of
the research I have done before I go deep into a couple of these projects. So I thought I
would briefly preface this summary with an argument for, “Well why do we need
visualization research?” So visualization in my perspective is not a singular thing, it’s
not a noun. And I feel that this quite from visualization practitioner and artist Jer Thorp
captures this mentality, “By thinking about visualization as a process instead of an
outcome, we arm ourselves with an incredibly powerful thinking tool”.
So for many people visualization may be synonymous with the artifact, with an image,
chart, an interactive graphic or a particular tool. To me, visualization is a process that can be studied and broken down into components. These components include understanding the data that is to be visualized; understanding the tasks of the people who will be using the visualization tool, artifact or technique, as well as the surrounding context in which these tasks take place; and, given this combination of data and task, selecting and assessing appropriate visual encoding and interaction design choices.
So, visualization research exists because the parts of this process are seldom straightforward, especially when we are confronted with complex tasks and complex data. While the emphasis in my dissertation, as you saw in the title, is certainly on understanding the tasks that people use visualization for, my research touches on all parts of this process. So over the past 5 years I have also had the opportunity to be
involved in various forms of visualization research, at least according to the types of
papers that appear regularly in visualization research venues. So I am perhaps most well
known within the visualization research community for a framework for visualization
task analysis. So this is a framework that I use to guide the design and evaluation aspects
of my other dissertation projects and I will discuss this as one of my selected projects in a
few moments in detail.
The work I did here in collaboration with MSR, which pertains to interactive storytelling with timelines, would in my view constitute more technique-driven research, exploring the design space of a particular technique. I have 2 projects that would constitute problem-driven research, or design study as it is called in the visualization community, in which you design for a specific application domain and problem. One is in the journalism domain and another in energy management, and I will speak about the energy management project in detail. And finally, I have also had the opportunity to collaborate on a system development project in which we built an authoring tool called TimeLineCurator for extracting and presenting timelines from unstructured text, which was at VAST last year.
So another theme that crosscuts all of my visualization research pertains to methodology
and in particular qualitative evaluation methods. So what I was especially interested in at
the outset of my PhD, and continue to be now, is how we evaluate visualization out in the wild, studying people using visualization techniques and tools in non-laboratory, real-world contexts. So as a result, most of my work pertains to and includes qualitative evaluations; they are very central in a lot of my work. So 2 of my dissertation projects, including the energy management design study that you will hear about, involve formative or pre-design evaluation. And the focus of these evaluations is on understanding why and how current workflows or processes are performed so as to inform future visualization design.
So this includes examining: what are the current domain conventions that are happening?
What are the problems, constraints and workarounds that people resort to in order to get their jobs done when the tools fail? And often this involves interviews, analyzing the artifacts people are using, and task analysis. So along with Sheelagh Carpendale and Melanie Tory, Bongshin and I actually wrote a paper about these methods at BELIV a couple of years ago. So I am also very much interested in studying the adoption of visualization
techniques and tools. So once something is deployed or disseminated I want to know
who the early adopters are and whether they are using the tool or technique for the same
reason that was envisioned by its designers or whether it was appropriated for some other
purpose.
So evaluating visualization adoption and appropriation is exceedingly rare in the
visualization research literature. There was a 2012 survey by Heidi [indiscernible] and
colleagues about visualization evaluation. They surveyed something like over 800 papers and found that only 5 of those had commented at all on whether the tool or technique was adopted by its intended user base. So with respect to my own research, we have a paper from 2014 that documents the adoption of a deployed document collection visualization tool by 5 self-initiated investigative journalists. We didn't ask them to adopt the tool; they took it up of their own volition just by hearing about it, and none of them had contributed to the preceding phases of design.
And to understand these cases of adoption I relied upon interviews that involved both
retrospective recall of their analysis process and re-enactment of their investigative
process, asking them to walk me through their process. I would take screen capture
recordings, analyze their interaction logs from their use of the system, perform a task
analysis to compare these case studies against each other, but also against our preconceived notions of what we thought the tool would support. In this case we ultimately found that
some of the journalists were using the tool for a task that we did not initially foresee and
subsequent deployments of this tool included some explicit support for this initially
unforeseen task.
So all together here the challenges of evaluating visualization techniques and the tools
provided me with an opportunity to develop my qualitative research skills. Prior to this I had been doing mostly quantitative work in my master's and my undergrad research. So this involved taking research methods courses and learning about different theoretical perspectives and methodologies from the sociology, anthropology and educational psychology traditions. So now I have told you about my work across different types of papers in the visualization community, as well as how evaluation plays a large role in my work.
This diagram presents another perspective: it is how I illustrate my expertise, which pertains to task analysis for visualization. I have applied my approach to task analysis in both visualization design and evaluation, in projects spanning different data types, as well as projects spanning different discovery- and presentation-oriented tasks, and you will hear about some of these today. So I don't have time to speak about all of these projects in detail, but of course if you want to speak one-on-one with me afterwards I can go into them in much more detail. I will speak about the first 3 on the list.
I will go into detail about our task typology first, then follow with our energy management design study work, and finally I will finish off with the timeline project that I was working on here last summer and continue to work on now. Okay, so our 2013 InfoVis paper is called "A Multi-Level Typology of Abstract Visualization Tasks". This is a framework paper that proposes a way to classify visualization tasks in a data-type-agnostic or domain-agnostic, that is, abstract, way. This is the first paper published out of my PhD work and it served as a framework for guiding the design and evaluation components of my other projects. And if some of the material that I will present in the next few slides seems a little bit familiar, it's because this task typology was also later used in my advisor Tamara Munzner's 2014 book, "Visualization Analysis and Design", which she spoke about here I believe around the same time last year.
Anyway, so why do we need a typology of abstract visualization tasks? Why do we need to classify abstract visualization tasks? My answer is that the space of possible visualization design choices is very, very large, and while some visualization design choices may be straightforward for instances where we have simple data or simple, straightforward tasks, it's often the case that we are dealing with complex tasks, complex sequences of tasks and complex data. So imagine that every one of these points in this diagram represents a design choice. It might be a particular visual encoding technique. It might be an interaction technique. It might be some combination of visual encodings or interactions.
So an abstract understanding of tasks allows us to operationalize this visualization design
process and navigate this design space more effectively. For any combination of data and
tasks there may be a few good design choices. There may be some sub optimal
alternatives that might work in some circumstances, but not others and there are certainly
many poor choices that are incompatible with either the data, or the task or both. So if I
can only speak about tasks in the context of a single domain, so say for instance in energy
management I am speaking about energy consumption data, I might be missing out on
viable design choices initially intended to address similar tasks or data types from another
domain, say for instance financial portfolios.
So, both of these domains likely involve the abstract task of identifying extreme values or
outliers in quantitative time series data so I should be considering techniques to address
this task irrespective of what domain they were initially proposed in. So what we set out
to do in this project was to introduce a vocabulary for describing tasks abstractly, to
promote this cross-pollination of visualization design choices across these domains and to
enable communication between visualization practitioners, designers and people in these
domains.
My personal motivation for doing this project was that I was just in need of an analysis
framework to make sense of the data I had been collecting in these very specific domains
relating to why and how people were using visualization tools or techniques and I wanted
to communicate these findings back to the visualization research community in such a
way that they would generalize to application domains other than the ones I happened to be studying. So another motivation for this work is that there was a need for consensus. There is an abundance of related work characterizing tasks, goals, processes, activities and interactions, in the visualization literature, the HCI literature, visual analytics and cartography, but these classifications tend to vary in a number of ways.
So the first is that these prior classifications vary in terms of their level of abstraction. Some of them are very specific, at the level of low-level interface events like clicking, pointing and dragging, while others are quite high level: sensemaking, information foraging, the integration of insights. Meanwhile, there is another way to look at this space: some of these prior classifications are very domain specific, or they are specific to certain data types, like task taxonomies pertaining to graph visualization or tree visualization. And many of these prior classifications are atemporal; they are just a list of tasks. Some of them impose a hierarchical task decomposition, while others reflect a more sequential form of thinking.
So in this project, what we attempted to do is unify dozens of prior classification frameworks and their theoretical foundations, to connect the low level with the high level, and to develop a common lexicon or vocabulary for describing visualization tasks and task sequences that researchers and practitioners could use to communicate. So our typology of visualization tasks is organized around 3 questions, and the first of these questions is, "Well, why do people visualize data?" Ultimately, people visualize data in order to consume or produce information related to a domain-specific goal or interest. With regards to consuming information, a person may have the goal of discovering phenomena within their data, or they may be part of an audience to which some phenomenon is being presented, or they may be indulging a casual interest in data with no other preconceived objective aside from enjoyment.
With regards to producing information, I want to emphasize that visualizing data is an interpretive act, and many visualization tools and techniques can and should provide the means to produce these interpretations of the data, maybe in the form of annotations on visual encodings, or recorded stories or workflows. In another case, the goal might be to explicitly derive new data based on these interpretations, like deriving ordinal ranks from quantitative values, or deriving a set of synthetic dimensions through the use of dimensionality reduction techniques.
So in addition to this distinction between consume and produce, we distinguish between how people search for visualized elements and how people query the data in a visualization artifact. And this allows us to specify why people visualize
data at multiple levels. A person may locate and identify a visualized element as part of a
discovery task or a presenter may ask their audience to compare multiple items. So our
approach also involves characterizing how a task is supported by a visual encoding, an
interaction or a view coordination technique and these may be specific techniques or you
can maybe use abstract terms here as well to describe things like filtering, navigating,
selecting or arranging items.
And finally, our approach involves describing the input and output of a task, and altogether this why, what and how structure allows us to chain these task descriptions together into sequences of interdependent tasks, where the output of one task serves as the input to a subsequent task. So this all seems very abstract, and I am going to give an example now from one of our other papers. I am going to present an example of a task sequence using our typology, but it is based on a task sequence that we presented in our 2014 BELIV paper, where we interviewed people who visualize dimensionally-reduced data.
So in this one case, a person who had high-dimensional data used a dimensionality reduction technique to derive a 2-dimensional projection of their data, and this served as the input to a subsequent discovery task in which the person explored and identified clusters in his data, which he visually encoded as an interactive scatter plot in which he could navigate and select data points. And finally, he recorded his interpretation of the clusters and categories in his data by annotating selected sets of data points with categorical labels, which were then encoded as color on the points.
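To make this chaining concrete, here is a minimal TypeScript sketch, not from the talk or the paper, of how such a why/what/how task sequence could be represented; the type names, field names and vocabulary subsets are illustrative assumptions rather than the typology's exact terms.

```typescript
// A why/what/how task description; the output of one task can serve as the
// input to the next, forming a chained sequence (illustrative names only).
type Why = "derive" | "discover" | "explore" | "identify" | "compare" | "annotate" | "record" | "present";
type How = "aggregate" | "encode" | "navigate" | "select" | "arrange" | "filter" | "annotate";

interface Task {
  why: Why[];       // goals, possibly at multiple levels
  how: How[];       // encoding / interaction / coordination methods used
  input: string[];  // data consumed by the task
  output: string[]; // data produced by the task
}

// The dimensionality-reduction example described above, as a three-task chain.
const drSequence: Task[] = [
  { why: ["derive"], how: ["aggregate"],
    input: ["high-dimensional table"], output: ["2D projection"] },
  { why: ["discover", "explore", "identify"], how: ["encode", "navigate", "select"],
    input: ["2D projection"], output: ["selected point clusters"] },
  { why: ["annotate", "record"], how: ["annotate", "encode"],
    input: ["selected point clusters"], output: ["categorical labels encoded as color"] },
];

// A sequence is well formed if each task consumes what the previous one produced.
const wellFormed = drSequence.every((task, i) =>
  i === 0 || task.input.every(d => drSequence[i - 1].output.includes(d)));
console.log(wellFormed); // true
```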
So I have used this task typology throughout the other projects in my dissertation. I have used it to describe the use of existing visualization techniques and tools out in the wild, to evaluate a novel visualization tool, and to inform some visualization design choices. And I am going to speak about the next project in the next section of my talk. So our typology has also been very well received and adopted by the visualization research community. At last check it has been cited over 65 times according to Google Scholar, making it one of the most cited papers from InfoVis 2013 onwards. And I took a look at all the papers that cite it and what people are doing with it, which is pretty interesting.
This typology has informed novel data-type task taxonomies; people have used this vocabulary to describe what people do with node-link graphs or cartograms, as well as domain-specific task taxonomies, using this vocabulary to describe things that are happening in bioinformatics or malware analysis. So now we have a common vocabulary with which to describe these things. People have also used it to communicate the procedures of experimental studies, and finally it has been used to contextualize the capabilities of some novel visualization tools and techniques: what are these tools for, and how might they generalize across different domains?
So after developing this task typology in 2013, my next goal was to apply it in a design project where I would choose the visual encoding, interaction and view coordination techniques to address tasks related to a concrete domain problem. And this is what I worked on for the better part of the next year or so, in the domain of energy management. So the first step in this project was for me to understand the problems that people were facing in this domain, which are the following. Given a portfolio of buildings, for example a university campus, a hotel or restaurant chain, or a city's municipal buildings, the energy manager or analyst has to do several things. They have to determine which buildings in their portfolio require energy conservation measures, like installing new windows, lighting or insulation in these buildings. They have to assess the performance of these buildings and the portfolio following the implementation of energy conservation measures. Then they have to find and diagnose anomalies, things that are happening in their data like spikes or other forms of erratic or inconsistent behavior in these energy portfolios.
So these very concrete domain goals were identified as part of this work domain analysis
that I had done where I learned the vocabulary of this domain. I interviewed the
stakeholders involved, which included not only our collaborators at this software
company, but also their clients who were located at organizations across North America,
including a few large universities. And our collaborators had an existing energy
management software tool that they had already deployed and that some of these clients
were already using. So I was trying to understand why, how and when this tool was
being used, what its shortcomings were, when it broke down, and when people would do their own ad hoc analysis.
So our collaborators' existing tool was fairly limited. It relied upon grouped bar charts and superimposed line charts like these here for understanding and comparing patterns in energy consumption and demand among an organization's many buildings. So as you can see in this simple diagram, these choices were okay, but they didn't scale very well if you wanted to compare more than a handful of buildings; these grouped bar charts and line charts can handle maybe a small handful, but certainly fewer than 10 before it gets difficult to tell one from another. And our collaborators wanted to address cases where you had dozens or hundreds of buildings. In other words, these choices were okay for selectively drilling down to a small subset of buildings in the portfolio, but certainly not for large overview cases. And it was also very difficult to navigate between visualizations in this tool, which is a problem.
So based on the domain problems that we identified, further consultation with our collaborators and their clients, and a look at their existing tool, I then applied this task typology of ours to identify what these tasks were abstractly and in what sequences they might occur. So first, I realized that the energy analysts need to discover some coarse phenomena in their time series data, for which an analyst has to be able to look up a portfolio or a group of buildings and summarize aggregate energy performance over a long period of time.
Next, the analyst should be able to drill down to discover some fine-grained phenomena, typically in a smaller subset of buildings and over a shorter span of time, and to be able to locate and compare phenomena between buildings as well as within buildings, comparing one building one year to the same building the next year, or the next week, to see what has changed. And finally, the analyst should be able to identify the contribution of energy consumption of one item relative to its parent, such as from a single building to its group, or from a group to the entire portfolio. And as you will notice, typically these are done in some sequence or another: sometimes it starts with drill-down and sometimes it starts with the overview, and we noticed that people need to be flexible to move between these tasks.
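As an illustration of the roll-up task just described, identifying the contribution of one item relative to its parent, here is a small TypeScript sketch of the underlying aggregation; the data shape, field names and units are hypothetical, not taken from the actual tool.

```typescript
// One metered reading for a building that belongs to a group within the portfolio.
interface Reading { building: string; group: string; timestamp: Date; kWh: number; }

// Compute each building's share of its group's and of the whole portfolio's
// consumption over a time window; these are the quantities a stacked bar or
// area chart would encode for the roll-up task.
function contributions(readings: Reading[], from: Date, to: Date) {
  const inRange = readings.filter(r => r.timestamp >= from && r.timestamp < to);

  const byBuilding = new Map<string, { group: string; total: number }>();
  for (const r of inRange) {
    const entry = byBuilding.get(r.building) ?? { group: r.group, total: 0 };
    entry.total += r.kWh;
    byBuilding.set(r.building, entry);
  }

  const byGroup = new Map<string, number>();
  let portfolioTotal = 0;
  for (const { group, total } of byBuilding.values()) {
    byGroup.set(group, (byGroup.get(group) ?? 0) + total);
    portfolioTotal += total;
  }

  return [...byBuilding].map(([building, { group, total }]) => ({
    building,
    group,
    shareOfGroup: total / (byGroup.get(group) ?? 1),
    shareOfPortfolio: total / (portfolioTotal || 1),
  }));
}
```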
So given the type of data that our collaborators had, along with the set of tasks they wanted to perform, I was then able to expand this design space by considering possible visual encoding, interaction and view coordination design choices that were appropriate for summarizing, comparing and identifying phenomena in multiple concurrent time series, irrespective of what domains these tools and techniques were originally applied in.
So some of these choices were discarded as either they didn’t scale to the portfolio sizes
that we were considering or they violated some convention in the energy domain, which
is another interesting topic that I will return to at the end of this talk with some ideas for
future work or they were a mismatch for the task that we were trying to address.
So to test out these alternative design choices my design space exploration involved the
development of what I call data sketches or minimally viable yet functional interactive
prototypes containing real and representative data, in this case taking real data from client
building portfolios and working with it. So I developed this sandbox environment for
rapidly testing out visual encoding choices and view coordination design choices. So at
first glance it might seem that what you are seeing here, this is the sandbox environment
that I built, it might seem like it’s a high fidelity prototype of a tool that the energy
managers will eventually use.
This is not the case; this is an interactive environment for me to use, as a designer, to test out different ideas. This data sketch and sandbox prototyping approach involves foregoing some traditional HCI prototyping approaches like paper prototyping, and it discourages the use of things like toy data or template data. The goal of this approach is to be encoding and interacting with real and representative data as soon as possible in the design process. So this is one of the first things that you do in such a project.
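Here is a minimal TypeScript sketch of this data sketch idea, not the actual sandbox: load real, representative data once, then treat each candidate design choice as an interchangeable rendering function over that same data. The function names, CSV handling and global switch are placeholders for illustration.

```typescript
type Row = Record<string, string>;

// Naive CSV loader so the sandbox always runs against real exported data,
// never toy or template data (assumes a simple comma-separated file).
async function loadCsv(url: string): Promise<Row[]> {
  const text = await (await fetch(url)).text();
  const [header, ...lines] = text.trim().split("\n").map(line => line.split(","));
  return lines.map(cells =>
    Object.fromEntries(header.map((h, i) => [h, cells[i]] as [string, string])));
}

// Each candidate visual encoding is just a function over the same rows.
const encodings: Record<string, (rows: Row[], el: HTMLElement) => void> = {
  groupedBars:    (rows, el) => { /* render grouped bar chart */ },
  boxPlotMatrix:  (rows, el) => { /* render summary box plots + matrix encoding */ },
  smallMultiples: (rows, el) => { /* render faceted bar and line charts */ },
};

async function sandbox(dataUrl: string, container: HTMLElement) {
  const rows = await loadCsv(dataUrl);   // real client data, loaded once
  let current = "groupedBars";
  const render = () => { container.innerHTML = ""; encodings[current](rows, container); };
  // During a chauffeured demo the designer switches encodings on request.
  (window as any).switchEncoding = (name: string) => { current = name; render(); };
  render();
}
```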
So I used this sandbox environment to conduct chauffeured demos with collaborators and stakeholders to better understand task sequences. And by a chauffeured demo I mean piloting my sandbox environment in response to, and while interviewing, these people, asking them what they want to see and where they want to go next from this view of their data. And this was a means to gather feedback on these different visual encoding, interaction and view coordination design choices. What I would do is record these chauffeured demos with screen capture, and I would transcribe what they said, and this would help me to identify which visual encodings and interactions were appropriate, but also how to coordinate multiple visual encodings or interactions within an interface, such as how you would [indiscernible] multiple visual encodings in a single display, or sequence them in a series of displays for drill-down cases. And this is reflected in this one flow chart example, with some commentary from our collaborators that I have sanitized here.
So ultimately, through this process of iterating on the visualization sandbox, the visual encoding designs, and chauffeured demos with a series of our collaborators and their clients, some of whom hadn't participated in preceding phases of design and joined later to give new perspectives, I identified a small set of matches between visual encoding, interaction and view coordination design choices and the tasks that we were addressing. So the value of our research was in documenting the tradeoffs among about a dozen different alternative design choices and why our final solution was effective.
So here is a series of screenshots from our collaborators, who adopted a number of our visualization design choices and implemented them in the production version of their tool. Starting on the left, we have a series of coordinated summary box plots and a matrix-based encoding for multiple buildings that you can scroll to see high-level trends over coarse periods of time. I will speak about this design later in the talk as well, with some ideas for future work here. Then there is the ability to drill down and look at small, faceted bar and line charts for the drill-down task, looking at a smaller subset of buildings in the portfolio over shorter periods of time. And finally, on the right we have stacked bar and area charts for the roll-up task, which allow the person to identify contributions of energy consumption over time from one building to its parent group, or from a group to the entire portfolio. And there is also the ability to directly and quickly transition from one task to another, because the preceding tool that our collaborators had made it very difficult to navigate from one visual encoding to another.
So our claim in the paper is that this work is beneficial to the visualization community, not only because it presents evidence for why these design choices work for these tasks, but because, since we describe the data and tasks abstractly, these design choices may very well generalize to other domains involving the comparison of multiple time series.
>>: Can I just get a sense of what the really high-level tasks for these folks are? Would it be something like troubleshooting? Would it be something like long-term planning?
>> Matthew Brehmer: It can be both. So sometimes they are trying to figure out, “Was
there an anomaly in a building? So was there a spike, was there an outage and what
caused it?” That might be one case and that’s when you would look at this drill down
case to go into a shorter period of time and maybe a smaller set of buildings to look at
these anomalies. But at the higher level it could be about planning. So it could be about,
“I just implemented an energy conservation measure. I just installed new windows in this
building. I want to see, over a long period of time, did this have any effect? Did my
energy consumption go down?”
So our paper actually goes into a lot more detail about our methodology, the impact of domain convention, and the topic of familiarity or literacy with different visual encodings. For some of these, especially the one on the left here, which I will return to a bit later, the analysts had never seen the visual encoding before, and this is a topic that I am very much interested in; I will continue this discussion later. So the last project that I want to speak about
in detail today, it’s not one that appears in my dissertation, but it’s one that was largely
done here at MSR as an internship project last year and it’s continuing now.
So my motivation for doing this project was that my preceding projects, including the energy management design study, were all about discovery tasks, analysis tasks. This one was an opportunity to turn my attention to another kind of task: presentation and storytelling. And while both the energy management design study and this project involve time-oriented data, the former involved quantitative time series data, very large quantitative time series data, while this project involved smaller, curated data sets of interval event timeline data, data that you might associate with biographies or historical summaries. In addition, I was excited about the prospect that techniques I could develop in this space might be of interest to and could eventually be adopted by journalists, especially because we see a lot of timeline stories appearing in news media, especially when bringing someone up to speed on a current event.
So I like to motivate this project by showing this classic 1769 information graphic drawn
by Joseph Priestley. And what he has done here is visually encode the life spans of several dozen notable statesmen and men of learning. He has done this along the x-axis, indicating when they lived, and he has offset them in the y direction so that he can write their labels and you can see when people lived and who was alive at the same time. So this form of visual encoding for timelines has become the dominant way that people have presented timeline data for the past 2.5 centuries, but it's certainly not the only way to show this form of data.
So in the project we surveyed over 250 instances of tools, techniques or artifacts that visually encode timeline data. Beginning with this leftmost column, some of these were hand-crafted information graphics that appeared centuries ago, and others are more recent information graphics that you would find online or on news sites; these tend to have been done using illustration software as opposed to being hand drawn, although there are some interactive examples of timeline information graphics that you see online. We also looked at interactive timeline visualization tools intended for discovery tasks. There are quite a few of these, including some work done here, the third one, and a lot of this work has been done in the domain of electronic health records. These typically involve larger data sets in which you are trying to find patterns, so they are not so much for presentation tasks, but more for discovery and analysis.
And finally, in the last couple of columns there, we encountered a set of interactive timeline visualization authoring tools. These again are for presentation. We found about a dozen of these and some of them are quite popular. There is one called TimelineJS, which is this one on the bottom here, and these are the tools that are especially popular amongst journalists. They allow a person to upload a spreadsheet of dates and descriptions for events, and the tool will output a basic interactive timeline that the viewer can page through or select an event from. But they all kind of look the same, and when we considered this corpus of timelines as a whole we actually found that there is quite a rich variety of highly expressive and aesthetically pleasing design choices amongst the historical timeline information graphics as well as the modern information graphics.
But none of these encodings, including circular, spiral or even arbitrarily shaped timelines, this one is by Mark Twain actually, none of these representations of time had made their way into modern interactive visualization storytelling tools, and we were wondering, "Why do they all assume this chronological form of Priestley's timeline? They all look like this." So we wondered, "Why is this? When is a linear chronological visual encoding appropriate and when are other timeline encodings the right choice? And what if you have a story that isn't well suited to one particular type of visual encoding for timelines? What if you have more points to make that don't suit that mold?"
So based on this extensive survey that we did, we developed and proposed a design space for timelines, and there are 3 dimensions to this design space. The first of these, along the top, is the representation, or the shape, of the timeline: linear, grid, arbitrary, spiral; a grid might be a calendar timeline, for instance. The second dimension, scale, corresponds to how events are mapped to positions along the timeline. And finally there is layout: whether the timeline is a single timeline, or whether there are multiple timelines side by side that you are meant to compare, or whether a timeline is segmented at meaningful boundaries like days or years. Sometimes you might want to separate one timeline for one year and look at the timeline for the next year, with the segments shown separately.
But if you do the math here, there are 100 possible combinations of representation, scale and layout. This doesn't mean that there are 100 viable timeline designs. When we looked at our survey data, we only had existence proofs for a subset of these combinations, somewhere between 20 and 30. And even among this subset there were designs that asked the viewer to make some very difficult perceptual judgments, which we deemed to be pretty ineffective.
So the way to constrain this large design space was actually to enumerate the tasks that map to these combinations of representation, scale and layout. We were ultimately able to identify 20 viable combinations of representation, scale and layout that map to meaningful tasks that the viewer is supposed to perform, or possible story points that map to those tasks. These are just 3 examples; there are 17 more that we describe in our paper, examples of story points that vary in terms of representation, scale or layout.
So for instance, on the left we have a single linear chronological timeline. This might be effective if the task of the audience is to identify the chronology or duration of events. In the middle is a radial chronological timeline that is segmented at meaningful, temporal, cyclical boundaries like days or years. This might be effective if your audience is intended to compare the periodicity of events across these segments; naturally occurring weather events are one example here, like comparing the periodicity of tornados or hurricanes over multiple years or days.
And then on the right we have an example of multiple linear timelines with a sequential scale; with a sequential scale we remove all the chronology and duration and just keep the sequence of events in their order. This might be effective for comparing the number of events that appear in these timelines.
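As a rough illustration of how this design space can be enumerated and then pruned to viable combinations, here is a TypeScript sketch; it lists only the dimension values and the 3 task mappings named in the talk, so the value sets and the viability rule are simplified stand-ins for the full sets in the paper.

```typescript
// Design space dimensions (only the values named in the talk; the paper's full
// sets are larger, giving the 100 combinations mentioned above).
const representations = ["linear", "radial", "grid", "spiral", "arbitrary"] as const;
const scales = ["chronological", "log", "sequential"] as const;
const layouts = ["single", "multiple", "segmented"] as const;

type Design = {
  representation: (typeof representations)[number];
  scale: (typeof scales)[number];
  layout: (typeof layouts)[number];
};

// A combination is kept only if it maps to a task the viewer is meant to perform.
function taskFor(d: Design): string | null {
  if (d.representation === "linear" && d.scale === "chronological" && d.layout === "single")
    return "identify the chronology or duration of events";
  if (d.representation === "radial" && d.scale === "chronological" && d.layout === "segmented")
    return "compare the periodicity of events across cycles (days, years)";
  if (d.representation === "linear" && d.scale === "sequential" && d.layout === "multiple")
    return "compare the number of events across timelines";
  return null; // the remaining viable combinations would be encoded similarly
}

const viable: Array<Design & { task: string }> = [];
for (const representation of representations)
  for (const scale of scales)
    for (const layout of layouts) {
      const task = taskFor({ representation, scale, layout });
      if (task !== null) viable.push({ representation, scale, layout, task });
    }
console.log(`${viable.length} of ${representations.length * scales.length * layouts.length} combinations kept here`);
```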
So to then verify that these 20 meaningful combinations of representation, scale and layout actually work, I built an environment that allowed me to test all of them. I built this interactive sandbox environment that allowed me to test these combinations and I loaded it with 28 representative timeline data sets, and many of these data sets were actually based on or inspired by existing timeline information graphics that we had found in our survey.
So here is a gallery of timelines that we generated with this sandbox environment, with 1 timeline presented as an example for each of our recommended or meaningful combinations of representation, scale and layout. There are 20 of them; it actually continues down. You will also see some commentary here about the narrative point, or the task that the audience, the viewer, is expected to perform with each of these example timelines.
The next question here is: well, what if you have to tell a story that asks the viewer to perform more than 1 of these task types, or if you want to exclude parts of your timeline to focus on a specific interval in your data or a subset of the events, or you want to highlight things? One option would be to present a slideshow of static timeline images, but that has the limitation that context is not maintained from one story point to the next, and you can't allow the viewer to interact with your timeline data and explore some of the events.
So for this reason we explored the possibilities for animated transitions between points in this design space, for changes of representation, scale and layout. Here is an example of a representation transition from linear to radial. Radial timelines have the aesthetic advantage of a square aspect ratio, but their drawback is that you are making arc length comparisons as opposed to comparisons of length along a rectilinear axis. So there are strengths and drawbacks to all of these combinations. If you are curious about the timeline data being shown here, it's actually just my biographical data, colored according to categories like academic, professional and athletic achievements, and some of it you already saw.
So here is an example of a scale transition. This is another type of transition that changes, in this case, from a chronological scale to a logarithmic time scale. The data shown here is the life spans of notable philosophers spanning thousands of years, and they are colored by world region. So we have all the Greeks over here and all the Germans in purple at the end, and when you shift to a log scale you emphasize more of the recent people in this timeline data set.
Finally, this is an example of a layout transition from a single timeline to multiple timelines, in which the events from each category, and these correspond to the colors here, are moved to their own dedicated timeline. The data shown here is a set of –. Yeah, Mary?
>>: Did you pick these because they are somehow more representative of the kinds of timelines that actually exist out there? Why did you pick these?
>> Matthew Brehmer: Yes, so this data set was in a timeline information graphic that I saw recently, and similarly the philosopher data set: I had encountered one in my survey, and it was also similar enough to the Joseph Priestley example, a biographical timeline of who lived when. They are fairly common, those types of biographical who-lived-when timelines. This is the set of armed conflicts that America has engaged in since independence, colored by world region, and the data gets split into multiple timelines, where each world region gets its own little timeline.
So the culmination of this work –. Yeah?
>>: So why are the animations helpful?
>> Matthew Brehmer: For maintaining context from one state to the next, so you can see when some event moves to another form, or representation, or scale of timeline.
>>: Do you think that works? Is it true?
>> Matthew Brehmer: We haven’t tested it yet, but we would like to. That’s the next
step, talking to users and I will speak about where I want to take this work next and that
includes actually showing these to people and getting feedback on them.
>>: I mean how do you relate that [indiscernible] work, like Jeffery here and George did
on [indiscernible]?
>> Matthew Brehmer: Yeah, and actually we referred to that work for constraining the space of animated transitions, because we don't want to allow just any transition from one state to another. We actually constrain it to just these types, only altering one dimension at a time so that they are less jarring. So just changing representation, or just changing scale. And if you combine that with selective highlighting, where you can dim out some of the events in your timeline, you can actually follow just the small set of events that you have emphasized.
So that is one additional storytelling ingredient that you need to add to this to make it work. And that's what we did here, and that's what you are seeing in this little timeline story. It was a culmination of all of this, putting together the dimensions of our design space, the 20 recommended meaningful designs, over these 2 dozen real data sets. Here's an example of selective highlighting: if you selectively highlight just a few of the events they are easier to track, because we can't really track that many objects when they are in transition.
So in order to actually tell a story we added some captions on the top. We have selective highlighting and filtering, and categorical legends, of course, are helpful for determining which is which. By the way, the data in this data set is the daily creative routines of highly creative people, people like Dickens and Darwin. This again is based on existing timeline information graphics, and I have seen several of these presented in static form: one of them I encountered used these radial chronological diagrams, another was in a linear format, and they have their own advantages for different tasks.
We actually produced, using this environment, 7 stories, I guess; we recorded little data videos with timeline data sets like this. Here is an example of a transition from chronological to sequential. So before, you were able to see on a 24-hour clock what people were doing at any time of the day. Now this is counting the variety of events in their day: how often people changed, in this case, what they were working on or what they were doing during the day, whether a creative task, or sleeping, or eating.
>>: [inaudible].
>> Matthew Brehmer: Just the number of events, the number of unique events. This is a sequential scale that keeps the sequence, but not the chronology anymore. Of course it's not as ideal as looking at it like this: this linear representation affords the comparison of how many unique events there are to a greater extent, and here is an example of selective highlighting where you can compare, in this case, Darwin and Haruki Murakami. And I think there is actually one final transition in this video where I transition back to a chronological scale, and using this representation you can now look at synchronicities, what people were doing at a given time of day. Actually, one of the timeline information graphics of this data assumes this form, while another timeline information graphic has the radial representation with little images of the people inside.
So what I would like to spend the rest of my time talking about today is my vision for the future of my visualization research, and I will be speaking about some of my short-term and long-term research interests and goals. Starting with this project, and this addresses Danielle's comment, the next step is to get feedback from people on this design space and on this sandbox environment, ideally by speaking to people who do storytelling, people who work in journalism, digital humanities, law and other domains that have timeline-like data, and ideally by populating the sandbox with data that's personally relevant to them, asking them to provide their own data, because that's what I have done in previous projects and I want to do that here, and working with them to produce stories that are relevant to them.
And this process will then inform the design and development of more of a web-based storytelling authoring tool, rather than the sandbox environment that I have already built. So I believe that it's feasible to collect this feedback, develop this authoring environment and deploy it, ideally within the time frame of a postdoc. Now, one of the important research questions in this future work pertains to how we should evaluate visualization authoring tools. What are the right metrics to use and what are the methods that we should use to evaluate authoring environments for storytelling? What are the goals of them? And as part of this evaluation, ideally we would hope to analyze some cases down the road of whether this tool is ultimately adopted, who adopts it and what they are using it for.
Now, another short-term research goal, and this goes back to the energy management design study for a moment, is to better understand the interaction between a person's familiarity with a visual encoding and some patterns of view coordination, or coordination between multiple visual encodings in a single display. This would be experimental work motivated, as I said, by the findings from that energy management design study. So one of our designs, which I showed a few slides ago and will reproduce a version of here, one that our collaborators ultimately adopted into their tool, was this juxtaposed and interactively coordinated pairing of a tabular encoding of aggregate time series values and summary box plots for the same time period.
So this design was actually very well received by our collaborators and they put a version of it into their tool, despite the fact that neither of these visual encodings had ever been used before in that domain, and when they were shown in isolation they weren't particularly well received. Something about putting them together and interactively coordinating them made it work and made it appropriate for their task. So I want to study this phenomenon in greater detail, in more of a controlled laboratory setting, to find out, when we have pairings of visual encodings, where one is more familiar than the other to a person, or where both are totally unfamiliar, when these pairings make sense, when they work and when they fall short.
Another factor to consider in this experiment would be the degree of interactive coordination between these pairings. Is it enough just to show 2 static encodings next to each other? Or is there some unidirectional coordination between the 2, like here, where I am brushing over 1 encoding and it's affecting the other? Or is bidirectional coordination important? And we want to test this with different pairings of visual encodings.
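As a sketch of the three coordination conditions being described, static juxtaposition, unidirectional brushing and bidirectional brushing, here is some illustrative TypeScript; the class and method names are hypothetical and not from the deployed tool.

```typescript
type Listener = (selected: Set<string>) => void;

// A view wrapping one visual encoding of the shared data.
class EncodingView {
  private listeners: Listener[] = [];
  constructor(public name: string) {}
  onSelectionChange(fn: Listener) { this.listeners.push(fn); }
  brush(selected: Set<string>) { this.listeners.forEach(fn => fn(selected)); } // user brushes here
  highlight(selected: Set<string>) { /* re-render with these items emphasized */ }
}

const table = new EncodingView("tabular matrix encoding");
const boxPlots = new EncodingView("summary box plots");

// Condition 1, static juxtaposition: neither link below is registered.
// Condition 2, unidirectional: brushing the table highlights the box plots.
table.onSelectionChange(sel => boxPlots.highlight(sel));
// Condition 3, bidirectional: add the reverse link as well (highlighting does not
// re-trigger listeners, so the two links cannot loop).
boxPlots.onSelectionChange(sel => table.highlight(sel));
```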
So another long-term research goal, one which I am already committed to in the design of timeline-based stories, is how to understand, create and evaluate visual stories, whether in a data video format like I showed a few moments ago, or some semi-interactive story, something that's triggered by steppers or scrolling, but that allows the viewer to stop, pause and explore a little bit locally in the story before progressing with the narrative.
And related to this goal is, "Well, how do we define and measure whether a story is engaging or not? What is engagement when it comes to visual storytelling?" And, as I mentioned with the timeline storytelling project, how to evaluate storytelling authoring environments; that is another thread in this space that won't stop with just the timeline authoring tool. Along with Bongshin, Natalie and Steve, I will actually be attending the Dagstuhl seminar on data-driven storytelling next week, where I hope to discuss these questions in more detail. Hopefully this initiates some new collaborations in this space, and we'll see where it goes.
So in the long-term I would also like to continue to conduct visualization research
motivated by problems and use cases in journalism and digital humanities, so education,
history, law and policy analysis. These are domains where there has been a notable
increase over the past decade in the adoption of visualization techniques, particularly for
visual storytelling. So over the past year I have become involved with the Hacks/Hackers data journalism community in Vancouver, and I hope to participate in this and other similar computer-assisted reporting communities in the future, wherever I am.
So I am also interested in what happens behind the scenes in these domains, before the story is told. How are people in journalism or other digital humanities domains doing their analysis, using visualization techniques or tools to perform these investigations or wrangle their data? And I am particularly interested in visualizing documents or text-based data, an interest that was inspired by the document visualization tool Overview that I studied in a field study several years ago.
Now, a unique challenge for data analysis in many of these domains is that the goal is seldom to discover an objective truth in the data; it's really about promoting and recording multiple interpretations of possible phenomena. And the tools and techniques that we build have to reflect this more interpretive approach to analysis, rather than a sensemaking or discovery-based approach where there is a ground truth, which is motivated more by domains like intelligence or law enforcement.
Okay, so to conclude I just want to indicate why I think MSR is the best place to conduct
this visualization research and in particular the visualization research that I want to work
on in the near term. So first, I am keen to continue this positive working relationship, working with some leading people in the area of visual storytelling. In addition to Bongshin and Natalie, there is Steven's work on presenting and adding data with SandDance, which I saw him demo here last summer. And there is of course Curtis's history of developing rich interactive story-based media. In short, there is just a wealth of experience here in this space that I would love to learn from.
Another attractive aspect of MSR is that its aims are not domain specific. Microsoft products are not exclusive to work settings or single domains like business intelligence, and I feel like Microsoft, and MSR by extension, has the ability to affect the future of not only how people work, but how people conduct research, how they create, how they play, all the ways that people consume or produce information. And I feel that visualization research spanning many different application domains, data types and tasks is just a surefire way to continue this cross-pollination of visualization research and design choices across these domains.
And finally, I am drawn to the potential for multi-disciplinary collaborations. This
could mean more problem-driven visualization and design study work
motivated by application domain problems, like visual analytics collaborations with
statisticians and machine learning researchers here. I believe there is a great opportunity
for visualization to help people learn how these machine learning and statistical methods
work, and I know there is already some work here in this space, like ModelTracker.
Other possible collaborations could include work on document and text visualization with
natural language processing and computational linguistics researchers. And finally, I am
interested in assessing individual health and well-being. This goes back, as you may remember
from the beginning of this talk, to my master's research, which was motivated by the development of a
self-administered cognitive health assessment test, and to my undergraduate research, which was all about
using technology to promote and enable physical activity. I didn't speak about this today, but I
want to return to this space from a visualization perspective, and considering the recent
popularity of self-tracking and the quantified self movement, and innovations in wearable
technology, I believe that there are great opportunities now to promote both cognitive and
physical health.
And I think it would be great to explore the ways in which I could collaborate with
groups here, in this area and in other areas related to personal data visualization. I feel like
MSR was designed around fostering these multi-disciplinary collaborations and I look
forward to hearing your ideas about projects that we can work on together. So with that,
thank you very much for your attention, and thanks to these people for giving feedback
on the slides. I look forward to hearing from you and meeting with you in person.
Thanks.
[Applause]
>>: I think a couple of your research projects have involved developing a sandbox
where you have lots of different experimental varieties. Now, one difference from
paper prototyping is that this is a lot more work. What is the difference between this and a
high-fidelity prototype? Is this meant primarily as a chauffeured tour, as you said? Is it more
a tool for you to visualize with, as opposed to something someone else would use?
>> Matthew Brehmer: Yes, it is for the divergent stage of design rather than the convergent
phase. It is about developing a way to quickly try out different designs really early on
in the design process, while making sure you have real data in there.
>>: [inaudible].
>> Matthew Brehmer: Yeah, and that is the limitation. I still do a little bit of
sketching in the beginning, but often it is just napkin sketching, before you
actually get to needing to put real data in and see if it works.
>>: So are there distinct sandbox tools out there, or do you kind of have to build your own
each time?
>> Matthew Brehmer: In both of these projects I built my own. In the first
project, on energy management, there were some data wrangling packages that my
collaborators had already built, which helped get the data in, which was nice; they
had done all of that work in R. But then I had to build the sandbox environment on
top of it, and I was using a tool called Shiny, made by RStudio, which I found very
helpful for quickly getting an interface together and trying out different visual encoding
options. And once you are in the R ecosystem there are so many R packages for visual
encoding and data that you can just hook up to this Shiny web-based interface. The
other one was in D3.
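To give a rough sense of the kind of Shiny sandbox described here, the sketch below is a minimal, hypothetical example rather than the actual tool from the project: the energy data frame, its column names, and the three encodings are all made up for illustration. It simply wires an encoding selector and a filter control to a plot, so that both can be changed from the interface instead of from the source code.

    # Minimal Shiny design-sandbox sketch (hypothetical data and column names).
    library(shiny)
    library(ggplot2)

    # Stand-in for real project data.
    energy <- data.frame(
      timestamp = seq(as.POSIXct("2015-01-01"), by = "hour", length.out = 500),
      kwh       = runif(500, 0, 5),
      building  = sample(c("A", "B", "C"), 500, replace = TRUE)
    )

    ui <- fluidPage(
      selectInput("encoding", "Visual encoding", choices = c("line", "point", "bar")),
      selectInput("building", "Building (filter)", choices = c("All", "A", "B", "C")),
      plotOutput("view")
    )

    server <- function(input, output) {
      # Filter the data interactively, without touching the source code mid-demo.
      filtered <- reactive({
        if (input$building == "All") energy
        else subset(energy, building == input$building)
      })

      # Swap visual encodings from the interface.
      output$view <- renderPlot({
        p <- ggplot(filtered(), aes(timestamp, kwh))
        switch(input$encoding,
               line  = p + geom_line(),
               point = p + geom_point(),
               bar   = p + geom_col())
      })
    }

    shinyApp(ui, server)

Running this script launches a browser page where the encoding and the data subset can be switched live, which is roughly the kind of interactivity the chauffeured demos rely on.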
>>: Say it is a blank HTML page today, and that is where the sandbox begins; you have got
an SVG tag and you have written a little [indiscernible]. What makes it a sandbox?
>> Matthew Brehmer: I think it is having the interaction: being able to choose between
different visual encodings within the interface, having those interactive controls there
not only for changing the visual encoding, but also for changing the subset of the data. Being
able to filter is very helpful. Having that within the interface, without having to go back
into the source code, matters; especially during a live chauffeured demo I want to be able to
quickly change which data is being shown, highlight parts of the data, and aggregate.
Having that done interactively is very helpful.
>>: I am trying to figure out which part is sandbox infrastructure and which part is that you
built five visualizations in advance and didn't bother making the interfaces easy to use because you
are the only person using them, which is cool; I am totally in favor of that. I am just trying to
formalize in my mind what the sandbox is, whether there is a stage for it in my design process.
>> Matthew Brehmer: I guess it probably comes after that phase where you develop a few stand-alone
interactive prototypes. Particularly in my case, I want to understand what the
task sequence is and how we can quickly go between tasks, and I have found that
it is important to be able to do that interactively. Some of the design choices that are in
the sandbox do end up being carried forward and adopted by our collaborators, so it is helpful to
have that already envisioned. Especially with the second project, the timeline project,
there is that first phase where you just develop a bunch of stand-alone ideas, but having
the ability to filter, aggregate and highlight is very important.
>>: It maintains the state between different views so that you can select over here
[indiscernible].
>> Matthew Brehmer: Yes, yes.
>>: What would this filtered set look like [indiscernible].
>>: So a sandbox is a data management system, a state management system, plus
[indiscernible]?
>> Matthew Brehmer: Yes.
>>: So the earlier work on the task typology, as you mentioned, has influenced a lot of
different areas of InfoVis, and part of the goal was to bring unification. Were there
people that adopted it who surprised you, or were they the ones you expected?
>> Matthew Brehmer: I can tell you about the weirdest example. There was a master's thesis
at, I think, a university in Germany on human-plant interaction. It was an MFA thesis
about how you design and grow plants in a particular fashion in response to data.
Very weird, very confusing, but he used the vocabulary of our typology, which describes when a
certain design would be useful for a particular goal, to translate some of his findings about the
plants and how they were grown. That was the weirdest one, but people have been using
it for a lot of the things that we did expect, which was nice.
So in a design study, in a particular domain, it might not be very useful to just talk about
very specific domain tasks, but now those findings can be communicated using this
vocabulary. We have seen a few instances of that, at least one in bioinformatics and a
few other domains; I mentioned malware analysis as another domain where people
have used our vocabulary. So most of the time it has been in cases where we
did expect how it would be used, and what I have liked is that people are still
continuing to taxonomize tasks for very specific data types or domains, but they are now
using our vocabulary, which allows this translation to happen a little more easily.
>>: The paper is not that old, but within the InfoVis community, or the broader
community, how well adopted do you think it has been, even in just the last year, and
looking forward? Is it more like, "Well, yes, a few people are using it," or are you seeing
growing adoption?
>> Matthew Brehmer: At the InfoVis conference this past year I saw quite a few people
citing it and actually building off of it, saying, "This is our new tool, and according to
the task typology of [indiscernible] and [indiscernible], these are the tasks that our tool
supports," and using this vocabulary. Some people are actually adopting some of the
little diagrams that we have too, just the visual style of them, which is really amusing.
Yeah, Selima?
>>: This is a follow-up to Craig's question: have you found that it has broken down in
any way based on people using it, or have you had to iterate on it and change things as a
result of people trying to use it?
>> Matthew Brehmer: Yeah, we did a little bit of iteration on it in our BELIV
paper the next year. We did do a little bit of rearrangement based on what we
thought made more sense, especially around the produce tasks. And there have
been a few instances where people have said, "We couldn't really fit our tasks into
this mold," particularly on the visual analytics side, because there are a whole bunch of
other tasks involved in that process that may not really fit this visualization-centric
task typology.
So there was a paper by Michael Sedlmair and that group out of Vienna on visual
parameter space analysis, and they actually admitted in their paper that they tried to fit their
tasks to this typology, but for things like sensitivity analysis and a couple of other tasks
they couldn't really find the right terms, which I think is fine. It is
probably appropriate in some cases, but not all cases; it may work better for these more
visualization-centric projects than if you are looking at visual
analytics, or at things where visualization is part of the picture, but not the entire picture.
>> Bongshin Leev: Okay, thank you.
>> Matthew Brehmer: Thanks.
[Applause]