A Focus + Context Technique for Visualizing a Document Collection

Dustin Dunsmuir, Eric Lee, Chris D. Shaw, Maureen Stone, Robert Woodbury, John Dill
School of Interactive Arts and Technology, Simon Fraser University
dtd@alumni.sfu.ca, ela10@sfu.ca, shaw@sfu.ca, stone@stonesc.com, robw@sfu.ca, dill@sfu.ca

Abstract

Investigative analysts need overviews of large amounts of data, which is a challenge when working with non-numerical data such as document collections. We present Semantic Zoom View (SZV), an interactive document collection visualization implemented as part of the CZSaw visual analytics system. SZV uses a focus + context technique to provide an overview with details on demand through interactive semantic zooming. SZV lets an analyst easily and quickly see the main topics of a document collection while keeping surrounding documents visible for context. Working within a single integrated visualization, an analyst can also quickly find related documents and break a large document collection into smaller meaningful groups. SZV’s focus + context technique was compared to an overview + detail version for finding answers within a document collection and results indicated its strength for maintaining visibility of a full overview when document contents are accessed.

1. Introduction

An overview of a document collection can greatly improve the analysis process by providing the analyst with the key themes within the collection as a starting point and as context for their query results [6, 16]. The traditional approach without an overview involves an iterative process of queries and reading of documents. This is very time consuming. Analysts may prematurely form hypotheses because they focus too narrowly and do not have time to consider alternatives. This may lead to a bias towards these hypotheses in further investigation.
Overviews of a document collection must provide a quick method for analysts to choose documents and view their content while maintaining a contextual overview. We present Semantic Zoom View (SZV), an interactive overview of a document collection that allows quick access to document content directly from the overview using a focus + context technique and semantic zoom. Many methods have been developed for visualizing a document collection by building a model using natural language processing or text analytics methods. Entity extraction tries to automatically identify keywords such as people, places and organizations. Documents can contain multiple entities and any entity can occur in multiple documents, so the result of entity extraction is a network of documents and entities where edges represent relationships between entities and documents such as co-citation of entities in a document. CZSaw, a visual analytics tool introduced in 2009, uses such a document-entity model [7]. It focuses on capturing and supporting the analysis process using an underlying script as well as state and process visualizations. The initial data visualization created for CZSaw was a Hybrid view: a graph view where nodes can be viewed at varying levels of detail or aggregation. This view lets analysts investigate networks of entities and documents by focusing on their relationships, displaying them as edges in a node-link visualization with which an analyst iteratively studies the network. SZV’s first objective is to provide a useful overview of a document collection to indicate its most salient entities along with the distribution of documents containing these and related entities. A second objective is to provide quick access to document content so analysts can readily move from investigating overall themes to viewing documents in detail.
To support this, SZV lets analysts semantically zoom documents to see more detail, including the full set of entities within the document, and at full zoom, the document’s full text. The third objective is to support document exploration in context; SZV’s semantic zoom uses a focus + context technique to show the expanded document in the context of the overview. Using brushing and linking of entities, SZV also provides the analyst with a quick method of finding documents related to those being viewed. Brushing and linking of entities is a direct manipulation method that instantly highlights all other documents containing the brushed entity, reducing analyst reliance on textual queries. Finally, since some documents are more useful than others or more applicable to specific hypotheses, analysts need to categorize documents. SZV allows the analyst to create a hierarchy of document groups, i.e., to structure a collection by organizing it into more easily understood subsets. The remainder of this paper is organized as follows. We first present related work in visualizations for analyzing document collections as well as similar focus + context interfaces. We then describe the design of SZV and present an evaluation comparing it to an alternative overview + detail version. Finally, we describe future work and conclude the paper.

2. Related work

The research agenda for Visual Analytics, Illuminating the Path [14], serves as a roadmap for the challenges facing the field and summarizes related work of potential use in visual analytics. This section focuses on earlier approaches to the analysis of document collections and past uses of focus + context. In-Spire [6], developed from SPIRE [16], provides an overview of a document collection using statistical properties of the text. Its Galaxy and Theme views display documents as glyphs in a spatial layout in 2D and 3D respectively, placing documents with many words in common closer together.
This results in clusters of similar documents that In-Spire labels with their most frequent words. Document content is viewed in another window. Groups can be created to store the results of a query and these groups are colour-coded. Outlier documents can be removed and the view layout recalculated, providing dynamic view control. SZV resembles In-Spire’s Galaxy view since it also displays an overview with flexible controls to recalculate the visualization. The major differences are that SZV’s layout algorithm uses a document-entity model and that SZV provides a semantic zoom directly within the main view. SZV and In-Spire both contain grouping mechanisms, but SZV uses containment instead of colour (Section 3.4). Starlight, another Visual Analytics tool developed at Pacific Northwest National Laboratory, visualizes structured and unstructured text, images, maps, and relationships between them [10]. Its Similarity Plot is similar to In-Spire’s Galaxy view, except it is 3D. It also has a Data Sphere view showing data items as glyphs, spatially grouped using a chosen field’s value. SZV’s layout is continuous, like the Similarity Plot and Galaxy view; however, the analyst can create groups that are the results of a search for specific entities. Georgia Tech’s Jigsaw [12] was a main precedent for the CZSaw system [7]. Its Document Cluster view, similar to Starlight’s Data Sphere, displays documents as glyphs and allows spatial grouping depending upon whether documents are in a query result or contain certain entities. Documents can be highlighted by brushing in other data views. SZV also has brushing to find all documents containing an entity of interest. In SZV, analysts access document details by semantically zooming document glyphs. In this technique, the amount of object detail is adjusted as it is zoomed to always display a useful amount for the space available.
Thus, at a smaller size, a document may be represented by a simple rectangle, at a medium size by a summary, and at its largest size by its full text. SZV also uses a focus + context technique to maintain visibility of all glyphs in the view when some are zoomed. SZV is similar to some earlier focus + context visualizations of hierarchical graphs. One such algorithm is the Continuous Zoom (CZ), which allows users to open and close cluster nodes to see their internal nodes [11]. Opened nodes expand to take up more space and the algorithm shrinks the rest of the graph as needed to keep the entire graph onscreen. The CZ algorithm has been used in a browser history program, CZWeb [2], and a discussion thread program, CZTalk [8]. The Simple Hierarchical Multi-Perspective (SHriMP) system visualizes software systems architecture as a nested graph [13] and uses a layout adjustment algorithm similar to CZ. Its adjustment algorithm is also used by SZV and is designed to preserve proximities between objects in the layout, which helps maintain clusters.

3. Design

We describe SZV’s design goals (how we intend an analyst to use the system) rather than performance claims (how an analyst actually uses the system). Please see the accompanying video for a demonstration of the features described.

3.1. Overview visualization

Analysts use an overview of a document collection to understand key topics and entities, to look for what is unusual and to discover interesting entities to explore. SZV’s overview provides just such a starting point for an analysis and a context for the analyst’s queries of the document collection. Each document is initially displayed as a small grey rectangle and all documents are shown onscreen at once. The entities represent the who, where, and when properties of the document, so this set of entities in each document determines the overview layout. Documents are placed near other documents with which they share multiple entities.
Thus an analyst can expect that documents closest to the one she is viewing are the ones most likely to contain the same people, places, dates or other entities. In most document collections, there will be multiple documents containing related entities, so these form clusters within the view. SZV displays a short summary of each cluster by labeling it with its three most frequent entities. Figure 1 shows an overview layout with cluster labels. An analyst may not be interested in all entity types, wanting instead to focus for example only on locations and dates within the document collection. Furthermore, an analyst may be interested in analyzing only some documents within the collection. SZV’s layout algorithm can be set to use only some of the entity types and/or apply to only some of the document collection. The analyst can create a layout of all documents clustering them only by location and date. Later, when investigating an interesting cluster of documents, an analyst can create a new layout for the subset of documents in the cluster of interest, perhaps only using person and organization entities. Thus, the documents in the original cluster would become subclusters based on person and organization. In this way, the analyst can flexibly investigate document aspects relevant to their current interest. The layout algorithm was designed to be simple while clearly and accurately visualizing the similarities between the documents’ entity sets. The algorithm comprises three stages. We first measure the similarity between documents using weighted edges created between every pair of documents that have at least one entity in common. The weights of these edges are determined by the minimum percentage of entities common to the two documents. All entities count equally in a document. For example, if a document contains only four entities, each entity contributes 25% to the document. 
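The first stage and the cluster labelling described above can be sketched as follows. This is a minimal illustration, not CZSaw's code: the function names and sample entities are hypothetical, documents are assumed to be plain sets of entity strings, and when two documents share several entities the sketch assumes the shared entities' combined share of each document is what gets compared.

```python
from collections import Counter
from itertools import combinations


def edge_weight(entities_a, entities_b):
    """Smaller of the two shares that the common entities form in each document."""
    shared = entities_a & entities_b
    if not shared:
        return 0.0  # no common entity, so no edge is created
    return min(len(shared) / len(entities_a), len(shared) / len(entities_b))


def weighted_edges(docs):
    """docs maps a document id to its entity set; returns {(id_a, id_b): weight}."""
    edges = {}
    for (id_a, ents_a), (id_b, ents_b) in combinations(docs.items(), 2):
        weight = edge_weight(ents_a, ents_b)
        if weight > 0:
            edges[(id_a, id_b)] = weight
    return edges


def cluster_label(cluster_docs, top_n=3):
    """The top_n entities by number of documents in the cluster that contain them."""
    counts = Counter(entity for ents in cluster_docs for entity in ents)
    return [entity for entity, _ in counts.most_common(top_n)]


# Hypothetical sample data: d1 and d2 share two entities, d3 shares none.
docs = {
    "d1": {"Alice", "Boston", "Acme", "2009-01-05", "2009-02-11"},
    "d2": {"Alice", "Boston"},
    "d3": {"Zurich"},
}
edges = weighted_edges(docs)
```

Here the two shared entities are 40% of `d1` and 100% of `d2`, so the single edge gets weight 0.4, and `d3` receives no edge at all. Frequency in `cluster_label` is counted per document, which is one plausible reading of "most frequent".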
If documents A and B have a single entity in common and document A has 4 entities and document B has 3 entities, then the entity is 25% of A and 33.3% of B. The weight of the edge is the minimum of these two values, 25%. This simple algorithm was used because we have no reason to assume one entity is more important than another. In the second stage of the algorithm, a standard force-directed graph layout algorithm places the documents in the view based on their weighted edges. The algorithm, based on the Fruchterman-Reingold algorithm [3], applies repulsive forces between any pair of close documents and attractive forces between any pair of documents connected by an edge. A higher edge weight results in a stronger attractive force. This algorithm does not explicitly create clusters; however, the forces pulling together documents usually result in perceived clusters. Also, note that the axes of the resulting view do not have meaning; rather what is important is the relative distance between documents. In the final stage of the algorithm, clusters are created based on the new proximities of documents. No edges are displayed in this layout. SZV instead uses Ward’s hierarchical clustering method to determine cluster membership [15]. Finally, each cluster’s three most frequent entities are determined and displayed as a label, centered at the cluster’s centroid (Figure 1). To reduce clutter, clusters of fewer than 3 documents are not labelled.

Figure 1. SZV’s overview showing document clusters containing similar entities. The clusters are labelled by their three most frequent entities.

We next describe document semantic zooming and the adjustment algorithm that maintains documents’ relative positions by moving them so they are not covered by the expanding document.

3.2. Semantic zooming

Semantic zooming in SZV is designed to provide quick, tiered, in-place revelation of a document’s content, displaying a useful level of detail throughout the zooming process (Figure 2).
Using semantic zooming, an analyst can quickly access only the level of detail he needs. As a document glyph grows with each increase in zoom level, the growing space displays first the name of the document, then the number and types of entities it contains, then the text of these entities, and finally the full text of the document. Intermediate zoom levels (2, 3 & 4) provide document summaries that help an analyst quickly determine if a document is useful to them. Entities are colour coded by type (throughout CZSaw), so semantic zoom levels 3 and up show the number of entities and their colour-coded type. For example, if analysts are interested in people in a set of documents, by zooming to level 3 they can easily tell which documents contain people entities by the colour-coding, as we observed in the evaluation (Section 4). This reduces the time, space and effort needed to investigate the people in documents. A grid of entities is used instead of a list because a long list could take up much vertical space while using little horizontal space (and need a scroll bar). A grid maintains a closer to square aspect ratio (matching other semantic zoom levels) while displaying at least part of the value of all entities (e.g. for a person, value = ‘name’). We keep all entity rectangles the same size, but some entities have a long value text-string, so their values are truncated; the full value is available on mouse-over. Also, documents with more entities will be larger since the entity grid will be larger.

Figure 2. A document’s 5 levels of zoom.

Analysts can use the scroll wheel to zoom in and out of documents in SZV (a common zooming method in applications such as online map websites [4]), providing the analyst with fine control over the level of detail for each document. Either a single or multiple documents may be zoomed. The latter is essential for quick contextual comparison of multiple documents.
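The tiered levels described above can be sketched as a small lookup plus a clamped level change. This is only an illustration under assumptions: the names are hypothetical, one scroll-wheel notch is assumed to change the level by one, and the assignment of the entity grid to level 4 is inferred from the order in which the text lists the details.

```python
MIN_LEVEL, MAX_LEVEL = 1, 5

# Detail that becomes visible at each semantic zoom level (levels 4 and 5 inferred).
DETAIL_BY_LEVEL = {
    1: "plain rectangle",
    2: "document name",
    3: "entity counts, colour-coded by type",
    4: "grid of entity values",
    5: "full text",
}


def zoom(level, notches):
    """Apply scroll-wheel notches (positive zooms in), clamped to the valid levels."""
    return max(MIN_LEVEL, min(MAX_LEVEL, level + notches))


def visible_detail(level):
    """Describe what a document glyph shows at the given semantic zoom level."""
    return DETAIL_BY_LEVEL[level]


level = zoom(1, 2)  # two notches in from the overview rectangle
```

Clamping keeps repeated scrolling from pushing a document past the full-text level or below the overview rectangle, so each document always sits at exactly one of the five levels.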
The zooming mechanism is implemented using Zoomable Visual Transformation Machine (ZVTM) [9], a Java toolkit. Each document consists of glyphs on its own “virtual canvas”. A “virtual camera” points at each canvas and the current view from each camera is displayed in its own onscreen portal. There is a mapping from camera altitude to the semantic zoom levels of a document. When the semantic zoom level changes (Figure 2), the visibility of the glyphs that make up the document on the canvas changes accordingly. As one or more documents are zoomed, the surrounding documents must be moved so that they remain visible to provide context.

3.2.1. Focus + context. In many applications, zooming causes the entire view to grow and consequently much of the view moves off the sides of the display. SZV’s initial overview (Section 3.1) has value that we do not wish to lose, so we chose to simultaneously show both focus documents and the overview. This allows the analyst to perform queries and see the results across the entire view even when looking at some documents in detail. In order to keep the analyst’s focus in one place, we zoom documents in-place rather than show details in a separate window. To make room for zoomed documents, other documents are moved aside (Figure 3). Clusters are maintained spatially, so the context remains visible when zooming. To compute the movement of document glyphs, we use SHriMP’s layout algorithm for nested graphs [13]. The algorithm is designed to preserve the relative proximities of visible items. Each document (not being zoomed) is moved along a line through its center and the center of the expanding or shrinking document. Each document is moved away from the focus document if the focus document is expanding, or towards it if the focus document is shrinking, using a linear and reversible transformation. The distance along its line that each document moves is equal to the distance the focus document expands or shrinks along the line. This is the distance along the line that the expanding or shrinking
document’s closest boundary travels. Since the algorithm applies to zooming both in and out, documents can be returned to their original locations, helping support the user’s mental model of the location of relevant documents. The movement of documents is also animated to provide a visually smooth, trackable change in the layout as the analyst zooms a document.

Figure 3. Left: A cluster layout before any zooming; the document to zoom is circled. Right: The layout with the document zoomed in. Note the change in location of the coloured documents and that layout clusters are maintained.

Once all the documents have been moved outward from the expanding document, the entire layout is then scaled down to keep it within the bounds of the SZV panel. This last step moves documents but does not zoom them; documents are never automatically zoomed without the analyst requesting it. A side effect of this decision is that some documents may overlap if there is not enough space in the panel for all of the zoomed-in documents. If this happens, the analyst can zoom out some documents to free up space. Analysts may want to zoom multiple documents simultaneously in order to compare their contents. Currently in SZV, we simply apply our adjustment algorithm multiple times, once for each document being zoomed, resulting in a net position change for each document, which is animated to maintain a smooth change in the layout. Zooming into a document both shows its content and gives the analyst access to a direct query mechanism called brushing and linking. By clicking any entity, either in the full text or the grid of entities, the analyst highlights all documents that contain that entity.

3.3. Query techniques

The ability to perform queries such as search is an important part of the analysis of a large document collection. In SZV, the results of queries are shown as highlighted documents within the main view so they remain within their context. Multiple highlighting colours can be used to identify and compare multiple queries. The active highlighting colour is chosen from a toolbar at the top of the view. SZV’s search feature allows an analyst to apply outside knowledge to find related content in the current document collection. Using search, he can quickly find documents within a specific date range, containing specific text, containing entities of a specific type, etc. Documents are highlighted using colour; the results of multiple searches can easily be compared using multiple colours. If a document is in the results of multiple queries it will only be highlighted in the colour of the most recent query. Each document, however, remembers its previous queries and colours. Therefore, if the highlighting from the most recent search is removed, documents revert to previous query highlight colours. Thus, the full results of a previous query can be easily recovered. SZV uses brushing and linking to implement a simple one-term search query. If an analyst discovers an interesting entity within a document and wants to locate this entity in the rest of the document collection, he will need a quick method to do so. SZV offers brushing and linking of entities for this purpose, which is much faster than performing a search (Figure 4).

Figure 4. Clicking on an entity brushes it to highlight all the other documents that contain the entity.

Without moving his focus to a control panel or having to type a command, the analyst can instantly highlight all the other documents that contain the given entity. To perform brushing, he simply clicks the entity as it appears in its rectangle within the grid of entities or within the full text of the document. This causes all of the documents that contain the entity to be highlighted in the currently active highlight colour.
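The highlighting behaviour described in this section (the most recent query's colour wins, and removing a query reveals the earlier colour) can be sketched with a per-document list of query results. This is an illustrative sketch, not CZSaw's implementation; the class, method names, and sample entities are hypothetical.

```python
from collections import defaultdict


class Highlighter:
    """Tracks, per document, the queries that matched it, oldest first."""

    def __init__(self, docs):
        self.docs = docs  # document id -> set of entities
        self.history = defaultdict(list)  # document id -> [(query_id, colour), ...]

    def brush(self, entity, colour):
        """Highlight every document containing the entity; returns a query id."""
        query_id = ("brush", entity)
        for doc_id, entities in self.docs.items():
            if entity in entities:
                self.history[doc_id].append((query_id, colour))
        return query_id

    def remove(self, query_id):
        """Drop one query's highlighting; earlier colours become visible again."""
        for doc_id in self.history:
            self.history[doc_id] = [
                (q, c) for q, c in self.history[doc_id] if q != query_id
            ]

    def colour(self, doc_id):
        """Colour of the most recent query that matched, or None if unhighlighted."""
        entries = self.history.get(doc_id)
        return entries[-1][1] if entries else None


# Hypothetical sample data with two brushed entities.
h = Highlighter({"d1": {"Alice", "Boston"}, "d2": {"Alice"}, "d3": {"Boston"}})
first = h.brush("Alice", "yellow")
second = h.brush("Boston", "blue")
```

After both brushes, `d1` shows blue because the Boston query is the more recent; calling `h.remove(second)` returns it to yellow and leaves `d3` unhighlighted.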
Just as with search, multiple entities can be brushed at once, using the same or different highlighting colours, in order to compare the documents they are in. The clusters and their labels in the overview can also provide context because the analyst can see which clusters of documents contain the brushed entities. These two query methods can also show the number of documents in each analyst-created group containing the entities or search results.

3.4. Grouping documents

The clusters in the overview of a document collection provide structure to the collection to help an analyst cope with a large document collection and determine which parts to investigate. After some analysis, the analyst may need a method of keeping track of interesting subsets of documents. In SZV, an analyst can create a new group from any subset of documents, which then will be kept together onscreen and can be visualized in different ways to see the combined set of entities (Figure 6) or the full text of each document (Figure 7). The analyst can also create a hierarchy of groups in order to further structure the document collection, based on any combination of the contained text, its usefulness to her, and its role in different hypotheses she may be pursuing in her analyses.

Figure 5. A group’s document tab.

Each grouping action is recorded in CZSaw’s script language, which allows for the same groups to be recreated during a later session or by other collaborating analysts. This recording allows an analyst to break a large document collection down into meaningful groups that a team of analysts can then investigate by assigning each group to one or more analysts. This approach was taken by some of the Simon Fraser University team for an award-winning entry to the VAST Mini Challenge 1 in 2010 [1]. The Mini Challenge required a description of illegal arms dealing activity by country [5], so a group was created for each country.
These groups were formed following a student’s analysis using searches, brushing of entities, and quick skimming of document text. Once this grouping process was completed, each group was more thoroughly investigated by another student from the team by running the previously recorded script within CZSaw. Thus, students took on a divide and conquer approach rather than analyzing the entire document collection individually. To perform grouping of a set of documents, an analyst draws a rectangle around the desired documents. After entering a name for the selection, the new group is created as shown (Figure 5). Groups can be moved as a unit or closed to hide documents from view. Each group has three tabs to show different aspects of contained documents. A document tab displays documents normally, i.e., the same as when ungrouped, and allows each to be zoomed. The other tabs allow browsing the contents of a group of documents using less effort, time and space than zooming into all of the grouped documents. These other visualizations of the group’s documents provide an advantage over In-Spire’s grouping mechanism, which uses colour to display group membership. In addition, In-Spire cannot contain groups within other groups. The entity tab (Figure 6) displays the combined set of all entities within all of the group’s documents. This acts as a more compact and faster-to-access summary of the “who, where, and when” of the documents than zooming into all contained documents. Entities in the grid can be brushed. The text tab (Figure 7) allows an analyst to read the text of each document, one by one. Thus she can get the full details of the events described by this subset of documents. She can instantly sort the document list by date to read documents in the order they were created. Groups can be created for a variety of analytical tasks, ranging from gathering outliers in order to close the group and hide them, to collecting documents that support a particular hypothesis. Regardless of the use, document membership within groups can be easily updated by dragging and dropping. This action updates all tabs of that group, keeping these group perspectives live.

Figure 6. A group’s entity tab.

Figure 7. A group’s text tab.

4. Evaluation

To determine if this new technique was beneficial for the analysis process and to identify issues that need to be addressed, an evaluation was performed comparing a simplified version of the new technique to an overview + detail version. This comparison allowed us to investigate a basic version of the technique before establishing all of the features needed for a future evaluation. For this evaluation, we compared the focus + context and semantic zoom techniques (“zoom”) to an alternative approach (“popup”) for simple analysis tasks consisting of opening documents, viewing their contents, and performing brushing to locate other relevant documents. We wanted to know if placing document content within its context and making it accessible through a semantic zoom led to faster or more accurate performance. The study compared zoom and popup versions of the interface. This was a between-subjects study design so each participant was only trained to use one of the interfaces and thus could spend more time with it. The zoom condition consisted of a simplified version of SZV in which all control panels had been removed leaving only the main view panel. In both conditions, an overview layout and initial search were completed for the participants before each task and the participants could not perform their own search, layout, or grouping. The popup condition initially looked the same as the zoom condition, but double clicking document glyphs opened their contents in separate popup frames rather than zooming them (Figure 8). These popups were displayed in a layer above the unaltered overview, placed along the top of the view and covering that part of it. Participants could move and resize the popups to see what was under them and control how much space was used for the document contents. Instead of using a semantic zoom, all document content, including the entity grid and full text, was displayed together in the popup. Brushing and linking of entities was the same in both conditions (Section 3.3). This alternative was chosen to be very similar to the zoom condition in all aspects except the ones we wished to test: the focus + context and the semantic zoom. For example, if we had instead displayed document content in a completely different window (as in In-Spire) this would have led to more differences between interfaces. Using completely different windows would leave the problem of how to inform the participant of which document’s content she was currently viewing, a problem addressed here using the lines connecting document glyphs to their content (Figure 8).

Figure 8. A view of the popup condition interface immediately following brushing of an entity.

Twenty students (graduate and undergraduate) at Simon Fraser University participated in the study, ten per condition. Each participant was given a short training session (approximately 5 minutes) during which s/he was shown:

1. How to zoom into or open documents.
2. The meaning of the different zoom levels or popup parts, i.e., that entity rectangles were entities in the document.
3. A description of the layout of documents, e.g., how documents closer to each other were similar.
4. How to perform brushing.
5. How to select multiple documents and zoom into (“zoom”) or open (“popup”) them together.

At the end of the training, we demonstrated how to solve two example tasks. For each task, the answer to the question asked could be found within the text of one or more documents. For each task, the participant had to open documents after a search was done for her/him based on the cluster labels. For four of the task questions (1, 2, 3, and 5), the answer was directly within this first set of documents.
For the other 6 tasks, questions had two parts. The answer to the first part was an entity that had to be brushed to highlight the documents that contained the answer to the second part. Task questions differed by the number of documents that were initially highlighted, the number of documents that contained the answer, whether the answer was an entity or not, and whether the answer required comparing multiple documents. Each task question involved a different subset of documents from the collection of 103 documents used for the VAST 2010 Mini Challenge 1 [5]. Below is an example of a two-part question.

7a. All documents containing “Lashkar-e-Jhangvi” are highlighted. Find Maulana Haq Bukhari, who is a suspected leader of this terrorist group.
7b. There is a bank account suspected of being owned by him. What are the first 3 letters of the account?

To correctly determine the answer to this question, participants had to open the initial highlighted set of seven documents, and then brush the Maulana Haq Bukhari person entity, which would highlight four of the initial documents plus two new (unopened) documents. Participants had to open one or both of these new documents in order to find the answer within. Fewer people in the popup condition answered question 7 correctly. The analysis of this question resulted in the only statistically significant difference between conditions: a one-sided Fisher’s exact test resulted in a p-value of 0.043 (see Figure 9). This was also the question with the largest difference in mean completion time between conditions with a mean (std. dev.) in seconds of 84 (26) for zoom and 155 (126) for popup. No significant differences in time were found. Other useful information from the evaluation resulted from feedback obtained on the use of the tools by directly observing the participants, recording the screen while they solved the tasks, and reading feedback from a post-study questionnaire given to them.
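The reported p-value for question 7 can be checked by hand from the counts given in Figure 9 (all 10 zoom participants correct, 6 of 10 popup participants correct). The sketch below computes the one-sided Fisher's exact test directly from the hypergeometric distribution; the function name is hypothetical and only the Python standard library is used.

```python
from math import comb


def fisher_one_sided(correct_a, n_a, correct_b, n_b):
    """P(group A has at least correct_a correct answers) with all margins fixed."""
    total_correct = correct_a + correct_b
    total = n_a + n_b
    upper = min(n_a, total_correct)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(
        comb(n_a, k) * comb(n_b, total_correct - k)
        for k in range(correct_a, upper + 1)
    )
    return tail / comb(total, total_correct)


# Question 7: zoom 10 of 10 correct, popup 6 of 10 correct.
p = fisher_one_sided(10, 10, 6, 10)
```

With these counts the only table as extreme as the observed one is the observed table itself, so p = C(10,10)·C(10,6) / C(20,16) = 210 / 4845 ≈ 0.043, which agrees with the value reported above.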
By observing participants in the popup condition, it was clear that question 7 involved a problem that other questions did not, thus leading to wrong answers by participants who failed to locate the brushed documents. For question 7, the two new documents highlighted from brushing were located near the top of the view. For those participants that resized popups to see more of the text, these two documents were covered by the open popups. In this situation, it was necessary to move the popups, reduce their size, or close them to reveal the highlighted document glyphs behind. Participants who failed to do so answered the question incorrectly. This situation demonstrated the benefit of having the document contents embedded directly in the main view’s context instead of covering parts of the view. We can presume that the accuracy would have decreased further had the highlighted documents been closer to the top of the screen, where they would have been covered by any open popups by default. Question 5 was the only question where accuracy was worse for the zoom condition. In this question participants had to look for cluster labels matching parts of the question since they were not given an initial search. Once the correct cluster was found participants needed to zoom into at least one of the documents and brush the correct entity to determine how many documents it was in. The 3 wrong answers were the result of participants not taking this step, but instead assuming the entity was in all the documents that they considered to be in the cluster and none of the ones outside it. This was a false assumption, however, since entities can be shared across clusters. More research would be needed to determine if this mistake is at all related to the interface condition.

Figure 9. Participants’ accuracy in answering the 10 questions, for both conditions. All zoom condition participants answered question 7 correctly, but only 6 popup participants answered it correctly.
Such qualitative results of the study were valuable for understanding how the SZV tool can be used and for demonstrating strengths and weaknesses of both interfaces. The popup interface offered participants more freedom in resizing or moving the popups; however, this led to more hiding of the overview beneath. One participant covered almost half of the overview with open documents in order to read many at once. Two participants in the zoom condition wanted more freedom to move documents to compare them more easily (side by side), a feature available in the full SZV along with the grouping of documents.

Participants in both conditions had minor problems opening documents. In the popup condition, participants confused clicking and double clicking for the opening and brushing actions, causing some frustration. In the zoom condition, four participants commented that the zoom was too slow and that not all the semantic zoom levels were useful to them. For example, they did not find the name of the document (level 2) useful. This level provides minimal information about a document and is likely to be more useful during a longer analysis process, when the analyst may recognize the name from having seen it before.

This study had some limitations. First, we are aware that the use of real analysts would have been more useful for assessing the new technique, but we still received valuable feedback that can be used to improve the tool before we put it in the hands of such analysts. Second, we used a contrived data set and questions with definite answers easily available in the data. This may be quite removed from analysts’ typical tasks, but it was necessary in order to measure accuracy and to keep the tasks short enough to be done by students. Finally, participants may have been more successful at using the interfaces if we had given them time to play freely with the tool before they began the main tasks.

5. Future work

The most useful results from the evaluation were from direct observations of participants, screen recordings while participants solved the tasks, and feedback from the post-study questionnaire. We will investigate alternative controls for the zooming within SZV because some participants considered the scroll wheel too slow. In addition, the use of the scroll wheel for zooming meant it could not be used for scrolling the text pane of a document (Figure 2). An alternative that could be compared to the original controls in a study is to use the left and right mouse buttons or arrow keys to move between semantic zoom levels.

Visual analytics applications need to handle large data sets. SZV’s layout algorithm must be improved so that it can lay out thousands of documents at an acceptable speed. Currently, it cannot accomplish this because the force-directed layout it uses does not run in linear time. We need to investigate the use of faster algorithms that still make use of CZSaw’s document and entity model. We also must investigate methods for handling data sets larger than can be displayed onscreen at once. The use of the group hierarchy will aid in solving this problem by displaying groups of documents rather than all individual documents.

We also plan to push for access to actual analysts for future studies, who would give us valuable feedback by directly using the SZV tool on their own data, solving their own tasks. We plan to further explore the affordances of rapid lightweight organizational tools so that analysts can make incremental commitments to analytical findings. We intend to introduce more formal structures that can be easily attached to groups, so that as analysts develop more certainty about their findings, they can impose more structure that can be communicated to other team members, and can potentially be used by the rest of the system to enable semi-automated reasoning techniques.
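The scaling concern raised above for the force-directed layout can be illustrated with a minimal Python sketch of the repulsion pass from a naive Fruchterman-Reingold style layout [3]. This is a generic illustration, not SZV's actual layout code, which is not shown here; the function name `repulsion_step` and the constant `k` are assumptions made for the example. Every pair of nodes is visited once per iteration, so the work grows quadratically with the number of documents.

```python
import math
import random


def repulsion_step(positions, k=1.0):
    """One naive repulsion pass over all node pairs.

    Returns (displacements, pair_count). The repulsive force between two
    nodes at distance d is k*k/d, as in Fruchterman-Reingold.
    """
    n = len(positions)
    disp = [[0.0, 0.0] for _ in range(n)]
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            force = k * k / dist
            ux, uy = dx / dist, dy / dist
            # Push the two nodes apart along the line joining them.
            disp[i][0] += ux * force
            disp[i][1] += uy * force
            disp[j][0] -= ux * force
            disp[j][1] -= uy * force
            pairs += 1
    return disp, pairs


random.seed(0)
for n in (100, 1000):
    points = [(random.random(), random.random()) for _ in range(n)]
    _, pair_count = repulsion_step(points)
    print(n, pair_count)  # prints "100 4950" then "1000 499500"
```

A tenfold increase in nodes yields roughly a hundredfold increase in pairwise force evaluations per iteration, which is why a faster algorithm, or laying out groups rather than individual documents, is needed for thousands of documents.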
Semantic Zoom View will also be improved based on feedback from the initial evaluation in order to offer a more integrated and useful environment for the analysis of large text document collections.

6. Conclusion

This paper has introduced a focus + context technique for providing an overview of a document collection with a semantic zoom into any subset of documents. It provides quick access to document contents, e.g., revealing full text and entities, through the semantic zoom. Analysts can rapidly find related documents by brushing entities, with no need to type in queries or change focus outside of one integrated view. To organize documents related to their hypotheses and structure the document collection, analysts can create new groups of documents, view their combined set of entities, or read the documents one by one.

We view the role of the grouping mechanism as one of central importance in the act of applying analytical findings during the document analysis process. Our intent with this design was to provide a lightweight and highly flexible means of developing analytical findings of document relatedness. This grouping mechanism enables the analyst to impose a tentative order on a part of the document collection so that s/he can 1) focus attention on the document subset, 2) assert that the grouped documents are related, and 3) share that assertion with other members of the analysis team. As the analysis of a particular document set matures, such groups may be reworked to reflect the analyst’s improved understanding. Our grouping mechanism helps minimize the costs of such necessary refactoring. Our experience using SZV to solve the 2010 VAST challenge showed that the rapid grouping and sharing afforded by this system enabled our geographically distributed team to conduct a shared analysis without excessive rebuilding of the work of other team members.

7. References

[1] V. Chen, D. Dunsmuir, S. Alimadadi, E. Lee, J. Guenther, J. Dill, C. Qian, C.D. Shaw, M. Stone, and R. Woodbury. Model based Interactive Analysis of Interwoven, Imprecise Narratives. Proceedings of IEEE Symposium on Visual Analytics Science & Technology, pp. 275-276. 2010.
[2] G. Collaud, J. Dill, C.V. Jones, and P. Tan. The Continuously Zoomed Web - A Graphical Navigation Aid for WWW. IEEE Visualization Late Breaking Hot Topics Papers, pp. 1-3. 1996.
[3] T.M.J. Fruchterman and E.M. Reingold. Graph Drawing by Force-directed Placement. Software: Practice and Experience, vol. 21, no. 11, pp. 1129-1164. 1991.
[4] Google Maps. Accessed on March 18, 2011. http://maps.google.com/. 2011.
[5] G. Grinstein, C. Plaisant, J. Scholtz, and M. Whiting. Text Records – Investigations into Arms Dealing. Visual Analytics Benchmark Repository: VAST Challenge 2010. Accessed on November 24, 2010. http://hcil.cs.umd.edu/localphp/hcil/vast/archive/task.php?ts_id=148.
[6] E. Hetzler and A. Turner. Analysis Experience using Information Visualization. IEEE Computer Graphics and Applications, vol. 24, no. 5, pp. 22-26. 2004.
[7] N. Kadivar, V. Chen, D. Dunsmuir, E. Lee, C. Qian, J. Dill, C. Shaw, and R. Woodbury. Capturing and Supporting the Analysis Process. Proceedings of IEEE Visual Analytics Science & Technology, pp. 131-138. 2009.
[8] H. Lam, B. Fisher, and J. Dill. A Pilot Study of CZTalk: A Graphical Tool for Collaborative Knowledge Work. Proceedings of the Hawaii International Conference on System Sciences. 2005.
[9] E. Pietriga. A Toolkit for Addressing HCI Issues in Visual Language Environments. IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 145-152. 2005.
[10] J.S. Risch, D.B. Rex, S.T. Dowson, T.B. Walters, R.A. May, and B.D. Moon. The Starlight Information Visualization System. IEEE Proceedings of the Conference on Information Visualization, pp. 42-49. 1997.
[11] D. Schaffer, Z. Zuo, S. Greenberg, L. Bartram, J. Dill, S. Dubs, and M. Roseman. Navigating Hierarchically Clustered Networks through Fisheye and Full-Zoom Methods. ACM Transactions on Computer-Human Interaction, vol. 3, no. 2, pp. 162-188. 1996.
[12] J. Stasko, C. Görg, and Z. Liu. Jigsaw: Supporting Investigative Analysis through Interactive Visualization. Information Visualization, vol. 7, no. 2, pp. 118-132. 2008.
[13] M-A.D. Storey and H. Müller. Graph Layout Adjustment Strategies. Proceedings of the Symposium on Graph Drawing, vol. 1027, pp. 487-499. 1996.
[14] J.J. Thomas and K.A. Cook. Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE. 2005.
[15] J.H. Ward Jr. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, vol. 58, no. 301, pp. 236-244. 1963.
[16] J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. IEEE Proceedings of Information Visualization, pp. 51-58. 1995.