A Focus + Context Technique for Visualizing a Document Collection
Dustin Dunsmuir, Eric Lee, Chris D. Shaw, Maureen Stone, Robert Woodbury, John Dill
School of Interactive Arts and Technology, Simon Fraser University
dtd@alumni.sfu.ca, ela10@sfu.ca, shaw@sfu.ca, stone@stonesc.com, robw@sfu.ca, dill@sfu.ca
Abstract
Investigative analysts need overviews of large
amounts of data, which is a challenge when working
with non-numerical data such as document collections.
We present Semantic Zoom View (SZV), an interactive
document collection visualization implemented as part
of the CZSaw visual analytics system. SZV uses a focus
+ context technique to provide an overview with
details on demand through interactive semantic
zooming. SZV lets an analyst easily and quickly see the
main topics of a document collection while keeping
surrounding documents visible for context. Working
within a single integrated visualization, an analyst can
also quickly find related documents and break a large
document collection into smaller meaningful groups.
SZV’s focus + context technique was compared to an
overview + detail version for finding answers within a
document collection and results indicated its strength
for maintaining visibility of a full overview when
document contents are accessed.
1. Introduction
An overview of a document collection can greatly
improve the analysis process by providing the analyst
with the key themes within the collection as a starting
point and as context for their query results [6, 16]. The
traditional approach without an overview involves an
iterative process of queries and reading of documents.
This is very time consuming. Analysts may
prematurely form hypotheses because they focus too
narrowly and do not have time to consider alternatives.
This may lead to a bias towards these hypotheses in
further investigation. Overviews of a document
collection must provide a quick method for analysts to
choose documents and view their content while
maintaining a contextual overview. We present Semantic
Zoom View (SZV), an interactive overview of a
document collection that allows quick access to
document content directly from the overview using a
focus + context technique and semantic zoom.
Many methods have been developed for visualizing
a document collection by building a model using
natural language processing or text analytics methods.
Entity extraction tries to automatically identify
keywords such as people, places and organizations.
Documents can contain multiple entities and any entity
can occur in multiple documents, so the result of entity
extraction is a network of documents and entities
where edges represent relationships between entities
and documents such as co-citation of entities in a
document.
CZSaw, a visual analytics tool introduced in 2009,
uses such a document-entity model [7]. It focuses on
capturing and supporting the analysis process using an
underlying script as well as state and process
visualizations. The initial data visualization created for
CZSaw was a Hybrid view – a graph view where nodes
can be viewed at varying levels of detail or
aggregation. This view lets analysts investigate
networks of entities and documents by focusing on
their relationships, displaying them as edges in a node-link visualization with which an analyst iteratively
studies the network. SZV’s first objective is to provide
a useful overview of a document collection to indicate
its most salient entities along with the distribution of
documents containing these and related entities. A
second objective is to provide quick access to
document content so analysts can readily move from
investigating overall themes to viewing documents in
detail. To support this, SZV lets analysts semantically
zoom documents to see more detail, including the full
set of entities within the document, and at full zoom,
the document’s full text.
The third objective is to support document
exploration in context; SZV’s semantic zoom uses a
focus + context technique to show the expanded
document in the context of the overview. Using
brushing and linking of entities, SZV also provides the
analyst with a quick method of finding documents
related to those being viewed. Brushing and linking of
entities is a direct manipulation method that instantly
highlights all other documents containing the brushed
entity, reducing analyst reliance on textual queries.
Finally, since some documents are more useful than
others or more applicable to specific hypotheses,
analysts need to categorize documents. SZV allows the
analyst to create a hierarchy of document groups,
i.e., structuring a collection by organizing it into more
easily understood subsets.
The remainder of this paper is organized as follows.
We first present related work in visualizations for
analyzing document collections as well as similar focus
+ context interfaces. We then describe the design of
SZV and present an evaluation comparing it to an
alternative overview + detail version. Finally, we
describe future work and conclude the paper.
2. Related work
The research agenda for Visual Analytics,
Illuminating the Path [14], serves as a roadmap for the
challenges facing the field and summarizes related
work of potential use in visual analytics. This section
focuses on earlier approaches to the analysis of
document collections and past uses of focus
+ context.
In-Spire [6], developed from SPIRE [16], provides
an overview of a document collection using statistical
properties of the text. Its Galaxy and Theme views
display documents as glyphs in a spatial layout in 2D
and 3D respectively, placing documents with many
words in common closer together. This results in
clusters of similar documents that In-Spire labels with
their most frequent words. Document content is viewed
in another window. Groups can be created to store the
results of a query and these groups are colour-coded.
Outlier documents can be removed and the view layout
recalculated, providing dynamic view control.
SZV resembles In-Spire’s Galaxy view since it also
displays an overview with flexible controls to
recalculate the visualization. The major differences
are SZV’s use of a document-entity model in its layout algorithm and its semantic zoom directly within the
main view. SZV and In-Spire both contain
grouping mechanisms, but SZV uses containment
instead of colour (Section 3.4).
Starlight, another Visual Analytics tool developed
at Pacific Northwest National Laboratory, visualizes
structured and unstructured text, images, maps, and
relationships between them [10]. Its Similarity Plot is
similar to In-Spire’s Galaxy view, except it is 3D. It
also has a Data Sphere view showing data items as
glyphs, spatially grouped using a chosen field’s value.
SZV’s layout is continuous, like the Similarity Plot and
Galaxy view; however, the analyst can create groups
that are the results of a search for specific entities.
Georgia Tech’s Jigsaw [12] was a main precedent
for the CZSaw system [7]. Its Document Cluster view,
similar to Starlight’s Data Sphere, displays documents
as glyphs and allows spatial grouping depending upon
whether documents are in a query result or contain
certain entities. Documents can be highlighted by
brushing in other data views. SZV also has brushing to
find all documents containing an entity of interest.
In SZV, analysts access document details by
semantically zooming document glyphs. In this
technique, the amount of object detail is adjusted as it
is zoomed to always display a useful amount for the
space available. Thus, at a smaller size, a document
may be represented by a simple rectangle, at a medium
size by a summary, and at its largest size by its full
text. SZV also uses a focus + context technique to
maintain visibility of all glyphs in the view when some
are zoomed.
SZV is similar to some earlier focus + context
visualizations of hierarchical graphs. One such
algorithm is the Continuous Zoom (CZ), which allows
users to open and close cluster nodes to see their
internal nodes [11]. Opened nodes expand to take up
more space and the algorithm shrinks the rest of the
graph as needed to keep the entire graph onscreen. The
CZ algorithm has been used in a browser history
program, CZWeb [2], and a discussion thread program,
CZTalk [8]. The Simple Hierarchical Multi-Perspective (SHriMP) system visualizes software
systems architecture as a nested graph [13] and uses a
layout adjustment algorithm similar to CZ. Its
adjustment algorithm is also used by SZV and is
designed to preserve proximities between objects in the
layout, which helps maintain clusters.
3. Design
We describe SZV’s design goals (how we intend an
analyst to use the system) rather than performance
claims (how an analyst actually uses the system).
Please see the accompanying video for a demonstration
of the features described.
3.1. Overview visualization
Analysts use an overview of a document collection
to understand key topics and entities, to look for what
is unusual and to discover interesting entities to
explore. SZV’s overview provides just such a starting
point for an analysis and a context for the analyst’s
queries of the document collection. Each document is
initially displayed as a small grey rectangle and all
documents are shown onscreen at once. The entities
represent the who, where, and when properties of the
document, so this set of entities in each document
determines the overview layout. Documents are placed
near other documents with which they share multiple
entities. Thus an analyst can expect that documents
closest to the one she is viewing are the ones most
likely to contain the same people, places, dates or other
entities. In most document collections, there will be
multiple documents containing related entities, so these
form clusters within the view. SZV displays a short
summary of each cluster by labeling it with its three
most frequent entities. Figure 1 shows an overview
layout with cluster labels.
An analyst may not be interested in all entity types,
wanting instead to focus for example only on locations
and dates within the document collection. Furthermore,
an analyst may be interested in analyzing only some
documents within the collection. SZV’s layout
algorithm can be set to use only some of the entity
types and/or apply to only some of the document
collection. The analyst can create a layout of all
documents clustering them only by location and date.
Later, when investigating an interesting cluster of
documents, an analyst can create a new layout for the
subset of documents in the cluster of interest, perhaps
only using person and organization entities. Thus, the
documents in the original cluster would become subclusters based on person and organization. In this way,
the analyst can flexibly investigate document aspects
relevant to their current interest.
The layout algorithm was designed to be simple
while clearly and accurately visualizing the similarities
between the documents’ entity sets. The algorithm
comprises three stages. We first measure the similarity
between documents using weighted edges created
between every pair of documents that have at least one
entity in common. The weights of these edges are
determined by the minimum percentage of entities
common to the two documents. All entities count
equally in a document. For example, if a document
contains only four entities, each entity contributes 25%
to the document. If documents A and B have a single
entity in common and document A has 4 entities and
document B has 3 entities, then the entity is 25% of A
and 33.3% of B. The weight of the edge is the
minimum of 25%. This simple algorithm was used
because we have no reason to assume one entity is
more important than another.
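The edge-weight rule can be illustrated with a short Python sketch. It reproduces the worked example above (documents A and B sharing one entity). The function names and sample entities are hypothetical, and treating several shared entities as a summed fraction is an assumption; the text only spells out the single-entity case.

```python
from itertools import combinations

def edge_weight(entities_a, entities_b):
    """Weight of the edge between two documents, or 0.0 if they share no entity."""
    a, b = set(entities_a), set(entities_b)
    shared = len(a & b)
    if shared == 0:
        return 0.0
    # Every entity counts equally within its document, so the shared entities
    # make up shared/len(a) of A and shared/len(b) of B; the weight is the minimum.
    return min(shared / len(a), shared / len(b))

def build_edges(docs):
    """docs maps a document name to its entities; returns {(name1, name2): weight}."""
    edges = {}
    for (n1, e1), (n2, e2) in combinations(sorted(docs.items()), 2):
        weight = edge_weight(e1, e2)
        if weight > 0:
            edges[(n1, n2)] = weight
    return edges

# Hypothetical sample data: A has 4 entities, B has 3, and they share "Alice".
docs = {
    "A": {"Alice", "Paris", "2009-01-05", "Acme"},
    "B": {"Alice", "London", "Bob"},
    "C": {"Tokyo"},
}
edges = build_edges(docs)
print(edges)  # {('A', 'B'): 0.25}
```

Document C shares no entity with A or B, so it receives no edge and is positioned only by the repulsive forces of the second stage.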
In the second stage of the algorithm, a standard
force-directed graph layout algorithm places the
documents in the view based on their weighted edges.
The algorithm, based on the Fruchterman-Reingold
algorithm [3], applies repulsive forces between any pair
of close documents and attractive forces between any
pair of documents connected by an edge. A higher
edge weight results in a stronger attractive force. This
algorithm does not explicitly create clusters; however,
the forces pulling together documents usually result in
perceived clusters. Also, note that the axes of the
resulting view do not have meaning; rather what is
important is the relative distance between documents.
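The following Python sketch illustrates a weighted force-directed layout in the spirit of Fruchterman-Reingold. It is not SZV's implementation: the constants (iteration count, ideal distance k, starting temperature, cooling rate) are assumptions chosen for illustration, and repulsion is applied to every pair rather than only to close pairs.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """nodes: list of names; edges: {(u, v): weight}. Returns {name: [x, y]}."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    temperature = 0.5  # maximum step length per iteration (assumed value)
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of documents (simplification: no distance cutoff).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along edges, scaled by edge weight: heavier edges pull harder.
        for (u, v), weight in edges.items():
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = weight * dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each document, capping the step at the current temperature.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.97  # cool down so the layout settles
    return pos

# A and B share entities (edge weight 0.5); C is unrelated to both.
layout = force_layout(["A", "B", "C"], {("A", "B"): 0.5})
```

In this sketch the connected documents A and B settle near each other while the unconnected document C drifts away, which is the behaviour that produces perceived clusters.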
Figure 1. SZV's overview showing document clusters containing similar entities; the clusters are
labelled by their three most frequent entities.
In the final stage of the algorithm, clusters are
created based on the new proximities of documents. No
edges are displayed in this layout. SZV instead uses
Ward’s hierarchical clustering method to determine
cluster membership [15]. Finally, each cluster’s three
most frequent entities are determined and displayed as
a label, centered at the cluster’s centroid (Figure 1). To reduce
clutter, clusters of fewer than three documents are not
labelled.
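The labelling step can be sketched as follows. This minimal Python example assumes that an entity's frequency is the number of documents in the cluster that contain it; the function name and sample data are hypothetical.

```python
from collections import Counter

def cluster_label(cluster_docs, min_docs=3, top_n=3):
    """cluster_docs: list of entity collections, one per document in the cluster.

    Returns the top_n most frequent entities, or None for clusters that are
    too small to be labelled.
    """
    if len(cluster_docs) < min_docs:
        return None
    counts = Counter()
    for entities in cluster_docs:
        counts.update(set(entities))  # count each entity once per document
    return [entity for entity, _ in counts.most_common(top_n)]

cluster = [
    {"Alice", "Paris", "Acme"},
    {"Alice", "Paris", "Acme"},
    {"Alice", "Paris", "Bob"},
    {"Alice"},
]
print(cluster_label(cluster))      # ['Alice', 'Paris', 'Acme']
print(cluster_label(cluster[:2]))  # None (fewer than three documents)
```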
We next describe document semantic zooming and
the adjustment algorithm that maintains documents’
relative positions by moving them so they are not
covered by the expanding document.
3.2. Semantic zooming
Semantic zooming in SZV is designed to provide
quick, tiered, in-place revelation of a document’s
content, displaying a useful level of detail throughout
the zooming process (Figure 2). Using semantic
zooming, an analyst can quickly access only the level
of detail he needs. As a document glyph increases in
size at each zoom level increase, the growing space
displays first the name of the document, then the
number of types of entities it contains, then the text of
these entities, and finally the full text of the document.
Intermediate zoom levels (2, 3 & 4) provide
document summaries that help an analyst quickly
determine if a document is useful to them. Entities are
colour coded by type (throughout CZSaw), so semantic
zoom levels 3 and up show the number of entities and
their colour-coded type. For example, if analysts are
interested in people in a set of documents, by zooming
to level 3 they can easily tell which documents contain
people entities by the colour-coding, as we observed in
the evaluation (Section 4). This reduces the time, space
and effort needed to investigate the people in
documents. A grid of entities is used instead of a list
because a long list could take up much vertical space
while using little horizontal space (and need a scroll
bar). A grid maintains a closer to square aspect ratio
(matching other semantic zoom levels) while
displaying at least part of the value of all entities (e.g.
for a person, value = ‘name’). We keep all entity
rectangles the same size, but some entities have a long
value text-string, so their values are truncated - the full
value is available via mouse over. Also, documents
with more entities will be larger since the entity grid
will be larger.
Figure 2. A document’s 5 levels of zoom.
Analysts can use the scroll wheel to zoom in and
out of documents in SZV (a common zooming method
in applications such as online map websites [4]),
providing the analyst with fine control over the level of
detail for each document. Either a single or multiple
documents may be zoomed. The latter is essential for
quick contextual comparison of multiple documents.
The zooming mechanism is implemented using
Zoomable Visual Transformation Machine (ZVTM)
[9], a Java toolkit. Each document consists of glyphs
on its own “virtual canvas”. A “virtual camera” points
at each canvas and the current view from each camera
is displayed in its own onscreen portal. There is a
mapping from camera altitude to the semantic zoom
levels of a document. When the semantic zoom level
changes (Figure 2), the visibility of the glyphs that
make up the document on the canvas also changes
accordingly. As one or more documents are zoomed,
the surrounding documents must be moved so that they
remain visible to provide context.
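The mapping from camera altitude to semantic zoom level might look like the Python sketch below. SZV itself is built on the Java toolkit ZVTM, and the text gives no concrete thresholds, so the altitude values here are purely hypothetical; only the five levels and their contents follow the description in this section.

```python
from bisect import bisect_right

# Hypothetical altitude thresholds: a higher altitude means the camera is
# farther from the canvas, so the document appears smaller and shows less detail.
ALTITUDE_THRESHOLDS = [100.0, 200.0, 400.0, 800.0]

# What each semantic zoom level reveals, following the order given in the text.
LEVEL_CONTENT = {
    1: "grey rectangle",
    2: "document name",
    3: "entity counts by type",
    4: "entity grid",
    5: "full text",
}

def zoom_level(altitude):
    """Map a camera altitude to a semantic zoom level from 1 (least detail) to 5."""
    return 5 - bisect_right(ALTITUDE_THRESHOLDS, altitude)

for altitude in (50.0, 300.0, 1000.0):
    level = zoom_level(altitude)
    print(altitude, level, LEVEL_CONTENT[level])
```

When the level returned for a document changes, the glyphs belonging to the old level would be hidden and those of the new level shown.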
3.2.1. Focus + context. In many applications, zooming
causes the entire view to grow and consequently much
of the view moves off the sides of the display. SZV’s
initial overview (Section 3.1) has value that we do not
wish to lose, so we chose to simultaneously show both
focus documents and the overview. This allows us to
perform queries and see the results across the entire
view even when looking at some documents in detail.
In order to keep the analyst’s focus in one place, we
zoom documents in-place rather than show details in a
separate window.
To make room for zoomed documents, other
documents are moved aside (Figure 3). Clusters are
maintained spatially, so the context remains visible
when zooming. To compute the movement of
document glyphs, we use SHriMP’s layout algorithm
for nested graphs [13]. The algorithm is designed to
preserve the relative proximities of visible items. Each
document (not being zoomed) is moved along a line
through its center and the center of the expanding or
shrinking document. Each document is moved away
from the focus document if it is expanding or towards
the focus document if it is shrinking using a linear and
reversible transformation. The distance along its line
that each document moves is equal to the distance the
document expands or shrinks along the line. This is the
distance along the line that the expanding or shrinking
document’s closest boundary travels. Since the
algorithm applies to zooming both in and out,
documents can be returned to their original locations,
helping support the user’s mental model of the location
of relevant documents. The movement of documents is
also animated to provide a visually smooth, trackable
change in the layout as the analyst zooms a
document.
Figure 3. Left: A cluster layout before any zooming; the document to zoom is circled. Right: The
layout with the document zoomed in. Note the change in location of the coloured documents and
that layout clusters are maintained.
Once all the documents have been moved outward
from the expanding document, the entire layout is then
scaled down to keep it within the bounds of the SZV
panel. This last step moves documents but does not
zoom them - documents are never automatically
zoomed without the analyst requesting it. A side effect
of this decision is that some documents may overlap if
there is not enough space in the panel for all of the
zoomed-in documents. If this happens, the analyst can
zoom out some documents to free up space.
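The two adjustment steps (pushing documents away from the focus, then scaling the layout back into the panel) can be sketched in Python as below. This is an illustrative reading of the description, not the SHriMP or SZV code. It assumes axis-aligned rectangular documents described by half-widths and half-heights, and the scaling step considers only document centres.

```python
import math

def boundary_distance(half_w, half_h, ux, uy):
    """Distance from a rectangle's centre to its boundary along unit direction (ux, uy)."""
    tx = half_w / abs(ux) if ux else math.inf
    ty = half_h / abs(uy) if uy else math.inf
    return min(tx, ty)

def adjust(focus_center, old_half, new_half, doc_center):
    """Move one non-focus document along the line through both centres.

    The shift equals the distance the focus document's boundary travels along
    that line: positive when the focus expands, negative when it shrinks.
    """
    dx = doc_center[0] - focus_center[0]
    dy = doc_center[1] - focus_center[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return doc_center
    ux, uy = dx / dist, dy / dist
    shift = boundary_distance(*new_half, ux, uy) - boundary_distance(*old_half, ux, uy)
    return (doc_center[0] + ux * shift, doc_center[1] + uy * shift)

def fit_to_panel(centers, panel_w, panel_h):
    """Scale positions about the panel centre so every centre lies inside the panel.

    Only positions change; document sizes (zoom levels) are left untouched.
    """
    cx, cy = panel_w / 2, panel_h / 2
    scale = 1.0
    for x, y in centers:
        if x != cx:
            scale = min(scale, cx / abs(x - cx))
        if y != cy:
            scale = min(scale, cy / abs(y - cy))
    return [(cx + (x - cx) * scale, cy + (y - cy) * scale) for x, y in centers]

# The focus document grows from half-size (1, 1) to (3, 3); a neighbour moves outward.
moved = adjust((0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (5.0, 0.0))
# Zooming back out applies the inverse shift and restores the original position.
restored = adjust((0.0, 0.0), (3.0, 3.0), (1.0, 1.0), moved)
print(moved, restored)  # (7.0, 0.0) (5.0, 0.0)
```

Because the shift depends only on the direction from the focus and on the change in the focus size, zooming out undoes zooming in, which matches the reversibility described above.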
Analysts may want to zoom multiple documents
simultaneously in order to compare their contents.
Currently in SZV, we simply apply our adjustment
algorithm multiple times, once for each document
being zoomed, resulting in a net position change for
each document, which is animated to maintain a
smooth change in the layout.
Zooming into a document both shows its content
and gives the analyst access to a direct query
mechanism called brushing and linking. Clicking
any entity, either in the full text or the grid of entities,
highlights all documents that contain that entity.
Figure 4. Clicking on an entity brushes it to
highlight all the other documents that
contain the entity.
3.3. Query techniques
The ability to perform queries such as search is an
important part of the analysis of a large document
collection. In SZV, the results of queries are shown as
highlighted documents within the main view so they
remain within their context. Multiple highlighting
colours can be used to identify and compare multiple
queries. The active highlighting colour is chosen from
a toolbar at the top of the view.
SZV’s search feature allows an analyst to apply
outside knowledge to find related content in the current
document collection. Using search, he can quickly find
documents within a specific date range, containing
specific text, containing entities of a specific type, etc.
Documents are highlighted using colour; the results of
multiple searches can easily be compared using
multiple colours. If a document is in the results of
multiple queries, it will only be highlighted in the
colour of the most recent query. Each document,
however, remembers its previous queries and colours.
Therefore, if the highlighting from the most recent
search is removed, documents revert to previous query
highlight colours. Thus, the full results of a previous
query can be easily recovered.
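One simple way to obtain this revert behaviour is to keep a per-document history of query colours. The Python sketch below is an assumption about how such a history could be structured; the class and method names are hypothetical and are not taken from CZSaw.

```python
class DocumentHighlight:
    """Remembers every query that highlighted a document, oldest first."""

    def __init__(self):
        self._history = []  # list of (query_id, colour) pairs

    def apply(self, query_id, colour):
        """Record that a query highlighted this document in the given colour."""
        self._history.append((query_id, colour))

    def remove(self, query_id):
        """Drop one query's highlight; earlier highlights stay remembered."""
        self._history = [(q, c) for q, c in self._history if q != query_id]

    @property
    def colour(self):
        """Colour currently shown: that of the most recent remaining query."""
        return self._history[-1][1] if self._history else None

doc = DocumentHighlight()
doc.apply("search-1", "yellow")
doc.apply("search-2", "blue")
print(doc.colour)   # blue (most recent query wins)
doc.remove("search-2")
print(doc.colour)   # yellow (reverts to the previous query's colour)
```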
SZV uses brushing and linking to implement a
simple one-term search query. If an analyst discovers
an interesting entity within a document and wants to
locate this entity in the rest of the document collection,
he will need a quick method to do so. SZV offers
brushing and linking of entities for this purpose, which
is much faster than performing a search (Figure 4).
Without moving his focus to a control panel or having
to type a command, the analyst can instantly highlight
all the other documents that contain the given entity.
To perform brushing, he simply clicks the entity as it
appears in its rectangle within the grid of entities or
within the full text of the document. This causes all of
the documents that contain the entity to be highlighted
in the currently active highlight colour.
Just as with search, multiple entities can be brushed
at once, using the same or different highlighting
colours, in order to compare the documents they are in.
The clusters and their labels in the overview can also
provide context because the analyst can see which
clusters of documents contain the brushed entities.
These two query methods can also show the number of
documents in each analyst-created group containing
the entities or search results.
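Brushing amounts to a lookup from an entity to the documents that contain it. The Python sketch below uses an inverted index for that lookup and then counts the highlighted documents per analyst-created group. All names and sample data are hypothetical; the text does not describe SZV's internal data structures.

```python
from collections import defaultdict

def build_entity_index(docs):
    """docs maps a document name to its entities; returns entity -> set of documents."""
    index = defaultdict(set)
    for name, entities in docs.items():
        for entity in entities:
            index[entity].add(name)
    return index

def brush(index, entity):
    """Documents to highlight when the analyst clicks the given entity."""
    return set(index.get(entity, ()))

def count_by_group(groups, highlighted):
    """Number of highlighted documents inside each analyst-created group."""
    return {group: len(members & highlighted) for group, members in groups.items()}

docs = {
    "doc1": {"Alice", "Paris"},
    "doc2": {"Alice", "London"},
    "doc3": {"Bob", "Paris"},
}
index = build_entity_index(docs)
highlighted = brush(index, "Alice")
groups = {"Europe": {"doc1", "doc3"}, "Other": {"doc2"}}
print(sorted(highlighted))                  # ['doc1', 'doc2']
print(count_by_group(groups, highlighted))  # {'Europe': 1, 'Other': 1}
```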
3.4. Grouping documents
The clusters in the overview of a document
collection provide structure to the collection to help an
analyst cope with a large document collection and
determine which parts to investigate. After some
analysis, the analyst may need a method of keeping
track of interesting subsets of documents. In SZV, an
analyst can create a new group from any subset of
documents, which then will be kept together onscreen
and can be visualized in different ways to see the
combined set of entities (Figure 6) or the full text of
each document (Figure 7). The analyst also can create
a hierarchy of groups in order to further structure the
document collection, based on any combination of the
contained text, its usefulness to her, and its role in
different hypotheses she may be pursuing in her
analyses.
Figure 5. A group’s document tab.
Each grouping action is recorded in CZSaw’s script
language, which allows for the same groups to be
recreated during a later session or by other
collaborating analysts. This recording allows an analyst
to break a large document collection down into
meaningful groups that a team of analysts can then
investigate by assigning each group to one or more
analysts. This approach was taken by some of the
Simon Fraser University team for an award-winning
entry to the VAST Mini Challenge 1 in 2010 [1]. The
Mini Challenge required a description of illegal arms
dealing activity by country [5], so a group was created
for each country. These groups were formed following
a student’s analysis using searches, brushing of
entities, and quick skimming of document text. Once
this grouping process was completed, each group was
more thoroughly investigated by another student from
the team by running the previously recorded script
within CZSaw. Thus, students took a divide-and-conquer
approach rather than analyzing the entire
document collection individually.
To perform grouping of a set of documents, an
analyst draws a rectangle around the desired
documents. After entering a name for the selection, the
new group is created as shown (Figure 5). Groups can
be moved as a unit or closed to hide documents from
view. Each group has three tabs to show different
aspects of contained documents. A document tab
displays documents normally, i.e., the same as when
ungrouped, and allows each to be zoomed. The other
tabs allow browsing the contents of a group of
documents using less effort, time and space than
zooming into all of the grouped documents. These
other visualizations of the group’s documents provide
an advantage over In-Spire’s grouping mechanism
which uses colour to display group membership. In
addition, In-Spire cannot contain groups within other
groups.
The entity tab (Figure 6) displays the combined set
of all entities within all of the group’s documents. This
acts as a more compact and faster-to-access summary
of the “who, where, and when” of the documents than
zooming into all contained documents. Entities in the
grid can be brushed.
The text tab (Figure 7) allows an analyst to read
the text of each document, one by one. Thus she can
get the full details of the events described by this
subset of documents. She can instantly sort the
document list by date to read documents in the order
they were created.
Groups can be created for a variety of analytical
tasks, ranging from gathering outliers in order to close
the group and hide them, to collecting documents that support a
particular hypothesis. Regardless of the use, document
membership within groups can be easily updated by
dragging and dropping. This action updates all tabs of
that group, keeping these group perspectives live.
Figure 6. A group’s entity tab.
Figure 7. A group’s text tab.
4. Evaluation
To determine if this new technique was beneficial
for the analysis process and to identify issues that need
to be addressed, an evaluation was performed
comparing a simplified version of the new technique to
an overview + detail version. This comparison allowed
us to investigate a basic version of the technique before
establishing all of the features needed for a future
evaluation. For this evaluation, we compared the focus
+ context and semantic zoom techniques (“zoom”) to
an alternative approach (“popup”) for simple analysis
tasks consisting of opening documents, viewing their
contents, and performing brushing to locate other
relevant documents. We wanted to know if placing
document content within its context and making it
accessible through a semantic zoom led to faster or
more accurate performance.
The study compared zoom and popup versions of
the interface. This was a between-subjects study design
so each participant was only trained to use one of the
interfaces and thus could spend more time with it. The
zoom condition consisted of a simplified version of
SZV in which all control panels had been removed
leaving only the main view panel. In both conditions,
an overview layout and initial search were completed
for the participants before each task and the
participants could not perform their own search, layout,
or grouping.
The popup condition initially looked the same as
the zoom condition, but double clicking document
glyphs opened their contents in separate popup frames
rather than zooming them (Figure 8). These popups
were displayed in a layer above the unaltered
overview, placed along the top of the view and
covering that part of it. Participants could move and
resize the popups to see what was under them and
control how much space was used for the document
contents. Instead of using a semantic zoom, all
document content, including the entity grid and full
text, was displayed together in the popup. Brushing
and linking of entities was the same in both conditions
(Section 3.3).
This alternative was chosen to be very similar to
the zoom condition in all aspects except the ones we
wished to test – the focus + context and the semantic
zoom. For example, if we had instead displayed
document content in a completely different window (as
in In-Spire), this would have led to more differences
between interfaces. Using completely different
windows would leave the problem of how to inform
the participant of which document’s content she was
currently viewing, a problem addressed here using the
lines connecting document glyphs to their content
(Figure 8).
Figure 8. A view of the popup condition
interface immediately following brushing of
an entity.
Twenty students (graduate and undergraduate) at
Simon Fraser University participated in the study, ten
per condition. Each participant was given a short
training session (approximately 5 minutes) during
which s/he was shown:
1. How to zoom into or open documents.
2. The meaning of the different zoom levels or popup
parts, i.e., that entity rectangles were entities in the
document.
3. A description of the layout of documents, e.g., how
documents closer to each other were similar.
4. How to perform brushing.
5. How to select multiple documents and zoom into
(“zoom”) or open (“popup”) them together.
At the end of the training, we demonstrated how to
solve two example tasks. For each task, the answer to
the question asked could be found within the text of
one or more documents. For each task, participants had
to open documents after a search was done for them
based on the cluster labels. For four of the task
questions (1, 2, 3, and 5), the answer was directly
within this first set of documents. For the other 6 tasks,
questions had two parts. The answer to the first part
was an entity that had to be brushed to highlight the
documents that contained the answer to the second
part. Task questions differed by the number of
documents that were initially highlighted, the number
of documents that contained the answer, whether the
answer was an entity or not, and whether the answer
required comparing multiple documents. Each task
question involved a different subset of documents from
the collection of 103 documents used for the VAST
2010 Mini Challenge 1 [5]. Below is an example of a
two part question.
7a. All documents containing “Lashkar-e-Jhangvi” are highlighted. Find Maulana Haq
Bukhari, who is a suspected leader of this
terrorist group.
7b. There is a bank account suspected of being
owned by him. What are the first 3 letters of the
account?
To correctly determine the answer to this question,
participants had to open the initial highlighted set of
seven documents, and then brush the Maulana Haq
Bukhari person entity, which would highlight four of
the initial documents plus two new (unopened)
documents. Participants had to open one or both of
these new documents in order to find the answer
within. Fewer people in the popup condition answered
question 7 correctly. The analysis of this question
resulted in the only statistically significant difference
between conditions: a one-sided Fisher’s exact test
resulted in a p-value of 0.043 (see Figure 9). This was
also the question with the largest difference in mean
completion time between conditions with a mean (std.
dev.) in seconds of 84 (26) for zoom and 155 (126) for
popup. No significant differences in time were found.
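The reported p-value can be checked against the question 7 counts given in the caption of Figure 9 (10 of 10 zoom participants correct, 6 of 10 popup participants correct). The Python sketch below computes a one-sided Fisher's exact test directly from the hypergeometric distribution. The function name is hypothetical, and the arrangement of the 2 x 2 table is an assumption about how the test was set up.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing a value
    at least as large as a in the top-left cell.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "successes"
    n = a + b + c + d     # total number of observations
    max_a = min(row1, col1)
    total = comb(n, row1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, max_a + 1)
    )
    return favourable / total

# Rows: zoom, popup. Columns: correct, incorrect (question 7).
p = fisher_one_sided(10, 0, 6, 4)
print(round(p, 3))  # 0.043
```

With 16 correct answers among 20 participants, only 8008 of the 184756 possible ways of choosing the ten zoom participants put all ten among the correct answers, which gives the value of 0.043.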
Other useful information from the evaluation
resulted from feedback obtained on the use of the tools
by directly observing the participants, recording the
screen while they solved the tasks, and reading
feedback from a post-study questionnaire given to
them.
By observing participants in the popup condition, it
was clear that question 7 involved a problem that other
questions did not, thus leading to wrong answers by
participants who failed to locate the brushed
documents. For question 7, the two new documents
highlighted from brushing were located near the top of
the view. For those participants who resized popups to
see more of the text, these two documents were
covered by the open popups. In this situation, it was
necessary to move the popups, reduce their size, or
close them to reveal the highlighted document glyphs
behind. Participants who failed to do so answered the
question incorrectly. This situation demonstrated the
benefit of having the document contents embedded
directly in the main view’s context instead of covering
parts of the view. We can presume that the accuracy
would have decreased further had the highlighted
documents been closer to the top of the screen, where
they would have been covered by any open popups by
default.
Figure 9. Participants' accuracy in answering
the 10 questions, for both conditions. All
zoom condition participants answered
question 7 correctly, but only 6 popup
participants answered it correctly.
Question 5 was the only question where accuracy
was worse for the zoom condition. In this question,
participants had to look for cluster labels matching
parts of the question since they were not given an
initial search. Once the correct cluster was found,
participants needed to zoom into at least one of the
documents and brush the correct entity to determine
how many documents it was in. The 3 wrong answers
were the result of participants not taking this step, but
instead assuming the entity was in all the documents
that they considered to be in the cluster and none of the
ones outside it. This assumption was false, however,
since entities can be shared across clusters. More
research would be needed to determine if this mistake
is at all related to the interface condition.
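The pitfall in question 5 can be illustrated with a small sketch. The document names, cluster labels, and entities below are hypothetical, and the two helper functions are illustrative rather than part of CZSaw. The point is only that brushing selects documents by entity membership, which need not coincide with cluster membership.

```python
# Hypothetical data: document name -> (cluster label, entities in the document).
documents = {
    "doc1": ("arms deal", {"Entity A", "Entity B"}),
    "doc2": ("arms deal", {"Entity B"}),
    "doc3": ("arms deal", {"Entity A"}),
    "doc4": ("money transfer", {"Entity A", "Entity C"}),
}


def brush(entity, docs):
    """Return the names of all documents that contain the given entity."""
    return {name for name, (_, entities) in docs.items() if entity in entities}


def in_cluster(cluster, docs):
    """Return the names of all documents assigned to the given cluster."""
    return {name for name, (label, _) in docs.items() if label == cluster}


brushed = brush("Entity A", documents)
cluster_docs = in_cluster("arms deal", documents)

print(sorted(brushed))       # ['doc1', 'doc3', 'doc4']
print(sorted(cluster_docs))  # ['doc1', 'doc2', 'doc3']
```

Here the brushed entity is missing from one document inside the cluster and present in one outside it, so counting the cluster's documents would give the wrong answer.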
Such qualitative results of the study were valuable
for understanding how the SZV tool can be used and
demonstrating strengths and weaknesses of both
interfaces. The popup interface offered participants
more freedom in resizing or moving the popups;
however, this led to more hiding of the overview
beneath. One participant covered almost half of the
overview with open documents in order to read many
at once. Two participants in the zoom condition wanted
more freedom to move documents to compare them
more easily (side by side), a feature available in the full
SZV along with the grouping of documents.
Participants in both conditions had minor problems
opening documents. In the popup condition,
participants confused clicking and double clicking for
the opening and brushing actions, causing some
frustration. In the zoom condition, four participants
commented that the zoom was too slow and that not all
the semantic zoom levels were useful to them. For
example, they did not find the name of the document
(level 2) useful. This level provides minimal
information about a document, which should prove
more useful during a longer analysis process, when the
analyst may recognize the name from having seen it
before.
This study had some limitations. First, we are
aware that the use of real analysts would have been
more useful for assessing the new technique; but we
still received valuable feedback that can be used to
improve the tool before we put it in the hands of such
analysts. Second, we used a contrived data set and
questions with definite answers easily available in the
data. This may be quite removed from analysts' typical
tasks; but it was necessary in order to measure
accuracy and keep the tasks short and able to be done
by students. Finally, participants may have been more
successful at using the interfaces if we had given them
time to play freely with the tool before they began the
main tasks.
5. Future work
The most useful results from the evaluation were
from direct observations of participants, screen
recordings while participants solved the tasks, and
feedback from the post-study questionnaire.
We will investigate alternative controls for the
zooming within SZV because some participants
considered the scroll wheel too slow. In addition, the
use of the scroll wheel for zooming meant it could not
be used for scrolling the text pane of a document
(Figure 2). An alternative that could be compared to
the original controls in a study is to use the left and
right mouse buttons or arrow keys to move between
semantic zoom levels.
Visual analytics applications need to handle large
data sets. SZV’s layout algorithm must be improved so
that it can lay out thousands of documents at an
acceptable speed. Currently, it cannot accomplish this
because the force-directed layout it uses does not run in
linear time. We need to investigate the use of faster
algorithms that still make use of CZSaw’s document
and entity model. We also must investigate methods
for handling larger datasets than what can be displayed
onscreen at once. The use of the group hierarchy will
aid in solving this problem by displaying groups of
documents rather than all individual documents.
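To illustrate why a force-directed layout scales poorly, the sketch below implements one iteration of a simplified layout in the style of Fruchterman and Reingold [3]. It is not SZV's implementation; the function name, the constants, and the edge representation are assumptions made for illustration. The nested loop over all node pairs computes the repulsive forces, so each iteration costs time quadratic in the number of documents.

```python
import math
import random


def layout_step(pos, edges, k=1.0, step=0.1):
    """Run one iteration of a simplified force-directed layout.

    pos   -- dict mapping node -> [x, y], updated in place
    edges -- iterable of (u, v) node pairs
    k     -- ideal edge length (assumed constant)
    step  -- maximum displacement per node per iteration
    Returns the number of node pairs examined for repulsion.
    """
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    pairs = 0
    # Repulsion: every pair of nodes pushes apart, O(n^2) per iteration.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            pairs += 1
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = k * k / dist
            disp[u][0] += dx / dist * force
            disp[u][1] += dy / dist * force
            disp[v][0] -= dx / dist * force
            disp[v][1] -= dy / dist * force
    # Attraction: only connected nodes pull together, O(|E|).
    for u, v in edges:
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        dist = math.hypot(dx, dy) or 1e-9
        force = dist * dist / k
        disp[u][0] -= dx / dist * force
        disp[u][1] -= dy / dist * force
        disp[v][0] += dx / dist * force
        disp[v][1] += dy / dist * force
    # Move each node along its net force, capped at `step`.
    for n in nodes:
        length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
        scale = min(length, step) / length
        pos[n][0] += disp[n][0] * scale
        pos[n][1] += disp[n][1] * scale
    return pairs


rng = random.Random(0)
for n in (100, 200):
    pos = {i: [rng.random(), rng.random()] for i in range(n)}
    print(n, layout_step(pos, edges=[(0, 1)]))  # 100 -> 4950, 200 -> 19900
```

Doubling the number of documents roughly quadruples the pairwise work per iteration, which is why a layout of this kind becomes slow for collections of thousands of documents.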
We also plan to seek access to practicing analysts
for future studies; they would give us valuable
feedback by using the SZV tool directly on their own
data to solve their own tasks.
We plan to further explore the affordances of rapid
lightweight organizational tools so that analysts can
make incremental commitments to analytical findings.
We intend to introduce more formal structures that can
be easily attached to groups, so that as analysts develop
more certainty about their findings, they can impose
more structure that can be communicated to other team
members, and can potentially be used by the rest of the
system to enable semi-automated reasoning techniques.
Semantic Zoom View will also be improved based
on feedback from the initial evaluation in order to offer
a more integrated and useful environment for the
analysis of large text document collections.
6. Conclusion
This paper has introduced a focus + context
technique for providing an overview of a document
collection with a semantic zoom into any subset of
documents. It provides quick access to document
contents, e.g., revealing full text and entities, through
the semantic zoom. Analysts can rapidly find related
documents by brushing entities, with no need to type in
queries or change focus outside of one integrated view.
To organize documents related to their hypotheses and
structure the document collection, analysts can create
new groups of documents, view their combined set of
entities, or read the documents one by one.
We view the role of the grouping mechanism as
one of central importance in the act of applying
analytical findings during the document analysis
process. Our intent with this design was to provide a
lightweight and highly flexible means of developing
analytical findings of document relatedness. This
grouping mechanism enables the analyst to impose a
tentative order on a part of the document collection so
that s/he can 1) focus attention on the document subset,
2) assert that the grouped documents are related, and 3)
share that assertion with other members of the analysis
team. As the analysis of a particular document set
matures, such groups may be reworked to reflect the
analyst’s improved understanding. Our grouping
mechanism helps minimize the costs of such necessary
refactoring.
Our experience using SZV to solve the 2010 VAST
challenge showed that the rapid grouping and sharing
afforded by this system enabled our geographically
distributed team to conduct a shared analysis without
excessive rebuilding of the work of other team
members.
7. References
[1] V. Chen, D. Dunsmuir, S. Alimadadi, E. Lee, J. Guenther,
J. Dill, C. Qian, C.D. Shaw, M. Stone, and R. Woodbury.
Model based Interactive Analysis of Interwoven, Imprecise
Narratives. Proceedings of IEEE Symposium on Visual
Analytics Science & Technology. pp. 275-276. 2010.
[2] G. Collaud, J. Dill, C.V. Jones, and P. Tan. The
Continuously Zoomed Web - A Graphical Navigation Aid for
WWW. IEEE Visualization Late Breaking Hot Topics
Papers, 1-3. 1996.
[3] T.M.J. Fruchterman and E.M. Reingold. Graph Drawing
by Force-directed Placement. Software: Practice and
Experience, vol. 21, no. 11, pp. 1129-1164. 1991.
[4] Google Maps. Accessed on March 18, 2011.
http://maps.google.com/. 2011.
[5] G. Grinstein, C. Plaisant, J. Scholtz, and M. Whiting.
Text Records – Investigations into Arms Dealing. Visual
Analytics Benchmark Repository: VAST Challenge 2010.
Accessed on November 24, 2010.
http://hcil.cs.umd.edu/localphp/hcil/vast/archive/task.php?ts_id=148.
[6] E. Hetzler and A. Turner. Analysis Experience using
Information Visualization. IEEE Computer Graphics and
Applications, vol. 24, no. 5, pp. 22-26. 2004.
[7] N. Kadivar, V. Chen, D. Dunsmuir, E. Lee, C. Qian, J.
Dill, C. Shaw, and R. Woodbury. Capturing and Supporting
the Analysis Process. Proceedings of IEEE Visual Analytics
Science & Technology, pp. 131-138. 2009.
[8] H. Lam, B. Fisher, and J. Dill. A Pilot Study of CZTalk:
A Graphical Tool for Collaborative Knowledge Work.
Proceedings of the Hawaii International Conference on
System Sciences. 2005.
[9] E. Pietriga. A Toolkit for Addressing HCI Issues in
Visual Language Environments. IEEE Symposium on Visual
Languages and Human-Centric Computing, pp. 145-152.
2005.
[10] J.S. Risch, D.B. Rex, S.T. Dowson, T.B. Walters, R.A.
May, and B.D. Moon. The Starlight Information
Visualization System. IEEE Proceedings of the Conference
on Information Visualization, pp. 42-49. 1997.
[11] D. Schaffer, Z. Zuo, S. Greenberg, L. Bartram, J. Dill, S.
Dubs, and M. Roseman. Navigating Hierarchically Clustered
Networks through Fisheye and Full-Zoom Methods. ACM
Transactions on Computer-Human Interaction, vol. 3, no. 2,
pp. 162-188. 1996.
[12] J. Stasko, C. Görg, and Z. Liu. Jigsaw: Supporting
Investigative Analysis through Interactive Visualization.
Information Visualization, vol. 7, no. 2, pp. 118-132. 2008.
[13] M-A.D. Storey and H. Müller. Graph Layout
Adjustment Strategies. Proceedings of the Symposium on
Graph Drawing, vol. 1027. pp. 487-499. 1996.
[14] J.J. Thomas and K.A. Cook. Illuminating the Path. The
Research and Development Agenda for Visual Analytics.
IEEE. 2005.
[15] J.H. Ward Jr. Hierarchical Grouping to Optimize an
Objective Function. Journal of the American Statistical
Association vol. 58, no. 301, pp. 236-244. 1963.
[16] J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M.
Pottier, A. Schur, and V. Crow. Visualizing the Non-Visual:
Spatial Analysis and Interaction with Information from Text
Documents. IEEE Proceedings of Information Visualization.
pp. 51-58. 1995.