Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents
Wise, Thomas, Pennock, Lantrip, Pottier, Schur, and Crow
Presented By: Cyntrica Eaton

Presentation Overview
  Paper Description
  Contributions
  Current State
  Critique
  References

Paper Description
  Motivation
  Approach
  Visualization Paradigms
    Galaxies
    Themescapes

MVAB
  Multidimensional Visualization and Advanced Browsing Project
  Researchers at the Pacific Northwest National Laboratory were interested in solving the problem of information overload for intelligence analysts.

Motivation
  Modern information technologies have contributed to an increased availability of information.
  As the quantity of available information increases, the time available to locate and absorb it decreases.
  The ability to overview large document corpora and get information without the heavy cognitive processes involved in language processing will improve the search process.

Approach
  The problem of processing large amounts of text can be solved if text is spatialized in a manner that takes advantage of human perceptual abilities.
  Visual processing takes place in parallel at the retinal level and is:
    Relatively effortless
    Exceptionally fast
    Not additive to cognitive workload

Approach
  Transform text into visualizations that:
    Communicate through images instead of prose.
    Preserve information characteristics from documents.
    Represent textual content and meaning without the need to read it in the normal manner.
    Reveal thematic patterns and relationships between documents in ways similar to how the natural world is perceived.

SPIRE
  Spatial Paradigm for Information Retrieval and Exploration
  Developed to facilitate the browsing and selection of documents from large corpora
  Two major approaches:
    Galaxies
    Themescapes

Galaxies and Themescapes
  Display metaphor rationale:
    Each paradigm offers a rich variety of cognitive spatial affordances that naturally address the problems of text visualization.
    Spatial perceptual mechanisms that operate on the real world will respond analogously to synthetic cues.

Paradigm Overviews
  Galaxies
    Point clusters suggest patterns of interest.
  Themescapes
    Topographies of peaks and valleys that can easily be detected based on contour patterns.

Paradigm Overviews
  Both allow for overview + detail without a change of view.
  Each view offers a different perspective of the same information.

Galaxies
  Two-dimensional scatterplot of 'docupoints' that appear like stars in the night sky.
  Computes word similarities and patterns in documents and communicates similarity via proximity.
  Provides a first cut at sifting through information and determining how the contents of a document base are related.
  [Figure labels: Types, Treatment, Case Studies, …]

Themescapes
  Three-dimensional relief map of themes within the document corpus.
  Complex surfaces convey information about topics or themes found within the corpus without the cognitive load of reading.
  Terrain simultaneously communicates:
    Primary themes of an arbitrarily large collection of documents
    Measure of relevance in the corpus
    Similarity of themes

Themescapes
  A glance provides a visual thematic summary of the entire corpus.
  Elevation: theme strength
  Shapes: information distribution
  Proximity: content similarity

Themescapes
  Utilizes human abilities for pattern recognition and spatial reasoning
  Employs communicative invariance across levels of textual scale:
    Entire document corpus
    Cluster of documents
    Individual documents

Summarization
  Reading is a slow, serial process of mentally encoding a document.
  Text visualizations can overcome many of the user limitations that result from accessing and trying to read large document bases.

Summarization
  Visual cues can offer readers a way to employ their primarily preattentive, parallel processing powers of visual perception.
  Galaxy and landscape metaphors allow the cognitive and visual processes that enable our spatial interactions with the natural world to be applied to the search process.

Contributions
  Prior visualization approaches offered methods for visualizing structured, hierarchical text.
  Free text visualization was relatively unexamined.
  The MVAB Project produced novel methods for interaction with large amounts of text.

Current Project Status
  Correlation Tool
  WebTheme
  ThemeRiver
  Rainbow
  [Figure labels: Love, Tybalt, Romeo, Caesar]

Critique
  The visualization paradigms were discussed in a straightforward manner.
  There was, however, too little explanation of the example figures.

My Favorite Sentence
  "[The] perceptual processes involved are the results of millions of years of selective mammalian and primate evolution, and have become biologically tuned to seeing in the natural world."

References
  Information Retrieval
  Information Visualization
  Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents

Questions?

Technical Considerations
  Clear definition of text
  Way to transform text into a different visual form that retains high dimensional invariants of natural language
  Suitable mathematical procedures and analytical measures must be defined as the foundation of the visualizations
  Database management system must be designed to store and manage text

Technical Considerations
  Way to transform text into a different visual form that retains high dimensional invariants of natural language.
    Text has statistical and semantic attributes such as frequency, context, and the combination of words in themes and topics.
    Differences between texts' statistical and semantic compositions provide much of the opportunity for the text visualizations described in this paper.

Approach
  A set of measures that characterize the text in meaningful ways provides multiple perspectives on documents and their relationships to one another.
  One measure is similarity.
    Based on occurrences and context of key words or other extracted features, a measure of similarity can be computed that reflects relatedness between documents.
    In a visualization, similarity can be shown as proximity or congruity of form.
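
As a rough illustration of the similarity measure described above, the sketch below counts word occurrences in each document and compares the counts with cosine similarity. This is an assumption made for illustration, not the measure used in the paper, which the slides do not specify. The function names and the three-document corpus are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word occurrences in one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so unshared words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Pairwise similarity between every pair of documents."""
    vectors = [term_frequencies(d) for d in documents]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical corpus, for illustration only.
corpus = [
    "satellite imagery shows troop movement near the border",
    "troop movement near the border confirmed by satellite",
    "grain prices rise after a poor harvest",
]
matrix = similarity_matrix(corpus)

# One simple way to turn similarity into a display distance (an assumption):
# more similar documents get a smaller distance, so they would sit closer.
distances = [[1.0 - s for s in row] for row in matrix]
```

In this sketch the first two documents share most of their words and score higher with each other than either does with the third, so a layout driven by `distances` would place them nearer to one another, in the spirit of the proximity cue described above.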