TextIn3D(Presentation)

Visualizing the Non-Visual:
Spatial Analysis and Interaction with
Information from Text Documents
J.A. Wise, J.J. Thomas, K. Pennock,
D. Lantrip, M. Pottier, A. Schur, and V. Crow
Bongshin Lee
April 4, 2001
Paper Outline
• Introduction
• Visualizing text
• Visualization transformations: from text to
pictures
• Examples from the MVAB Project
• Conclusions and directions for future
research and development
April 4, 2001
CMSC838b Information Visualization
Introduction
• Current Visualization approaches
– For visualizing mostly structured and/or hierarchical
information
• Some research in information retrieval
– Utilized graph theory or figural display
– Information returned is documents in text form
• Users still have to read
• Causes a severe upper limit on how much text can be processed
• Open Source digital information
– Available text overwhelms the traditional reading
methods of inspection, sifting, and synthesis
Visualizing text
• True text visualizations
– Must represent textual content and meaning without the
user having to read it
– Result from content abstraction and spatialization of the
text document
• Use primarily preattentive, parallel processing
powers of visual perception
• Goal is to spatially transform text information into
a new visual representation
Visualization transformations:
from text to pictures
• Four important technical considerations
– Clear definition of text
• what comprises text
• how it can be distinguished from other symbolic
representations
– Way to transform raw text into a different visual form
– Foundation for meaningful visualization
• Suitable mathematical procedures and analytical measures
– A database management system
Processing Text
• Requirements of text processing engine
– Identification and extraction of text features
• Frequency-based measures on words
• Higher order statistics taken on the words
• Semantic in nature
– Efficient and flexible representation of documents in
terms of these text features
– Support for information retrieval and visualization
• Pre-process, indexing
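A minimal sketch of the simplest requirement above, frequency-based word features, assuming a naive regex tokenizer and plain term counts. The names `tokenize` and `build_vectors` are hypothetical and are not taken from the paper or from SPIRE.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vectors(docs):
    """Return (vocabulary, unit-length term-frequency vectors), one vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        vec = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
        vectors.append([x / norm for x in vec])
    return vocab, vectors


docs = [
    "Text visualization maps text to pictures.",
    "Clustering groups similar documents.",
]
vocab, vectors = build_vectors(docs)
```

Each document becomes one row over a shared vocabulary, which is the representation the later clustering and projection steps operate on. Higher-order or semantic features would replace the raw counts.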
Visualizing output
from text processing
• Representing the document
– a vector in high dimensional feature space
• Comparisons, filters, and transformations can be applied
• Clustering using the normalized document vectors
• Projection
– Principal Components Analysis
– Multi-Dimensional Scaling
– Exponential order of complexity
• Clustering in the high-dimensional feature space
• Visualize the cluster centroids
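The cluster-first strategy can be sketched as follows: group the normalized document vectors in the full feature space, then hand only the few centroids to the projection step. This is an illustration using plain k-means seeded with the first k vectors. The algorithm choice and all names here are assumptions, not the procedure used in the paper, and the projection itself is not shown.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Plain k-means; seeds are the first k vectors (a simplifying assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels, centroids


# Toy feature counts for four documents over three terms (made-up data).
raw = [[3, 0, 1], [0, 4, 0], [4, 0, 1], [0, 5, 1]]
docs = [normalize(v) for v in raw]
labels, centroids = kmeans(docs, k=2)
```

With k centroids instead of n documents, an expensive projection such as PCA or MDS runs on a much smaller input, which is the motivation the slide gives for clustering before visualizing.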
Managing the representation
• Two basic classes of data
– Raw text files
• Static in nature, Simple in structure
• Easy to manage
– Visual forms of the text
• Extensive and dynamic
• Object-Oriented Database
– Flexibility of data representation
– Power of inheritance
– Ease of data access
Interface design
for text visualization
• Backdrop
– Central display resource
• Workshop
– Grid having resizable windows to hold multiple
views
• Chronicle
– Area where views are placed and linked to form
a visual story
Examples
from the MVAB Project
• MVAB
– Multidimensional Visualization and Advanced
Browsing Project
– Visualization and analysis of textual information
– Showcased in SPIRE
• SPIRE
– Spatial Paradigm for Information Retrieval and
Exploration
• Starfields and Topographical maps metaphors
– Galaxies and Themescapes
Galaxies
• Displays cluster and document
interrelatedness
• 2D scatterplot of ‘docupoints’
• Simple point and click exploration
• Sophisticated tools
– Facilitate more in-depth analysis
– e.g., the temporal slicer
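As a rough illustration of a temporal slicer, the sketch below filters a 2D scatterplot of docupoints down to those published within a date window. The `DocuPoint` record, its fields, and the sample data are hypothetical. The actual Galaxies tool may work quite differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocuPoint:
    """One document placed in the 2D scatterplot (hypothetical record layout)."""
    title: str
    x: float
    y: float
    published: date


def temporal_slice(points, start, end):
    """Return the docupoints with start <= published < end."""
    return [p for p in points if start <= p.published < end]


# Made-up sample data for illustration only.
points = [
    DocuPoint("A", 0.1, 0.9, date(1995, 1, 15)),
    DocuPoint("B", 0.4, 0.2, date(1995, 3, 2)),
    DocuPoint("C", 0.8, 0.5, date(1995, 6, 20)),
]
first_quarter = temporal_slice(points, date(1995, 1, 1), date(1995, 4, 1))
```

Stepping the window forward and redrawing only the selected points would show how clusters emerge or fade over time.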
Galaxies Screen Shot
ThemeScapes
• Abstract, 3D landscapes of information
• Convey relevant information about topics or
themes without the cognitive load of reading
• Spatial relationships reveal the intricate
interconnection of themes
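One plausible way to turn 2D document positions into such a landscape is to sum a smooth bump per document, so that dense or strongly themed regions rise into peaks. The sketch below assumes Gaussian bumps on a small grid. This is an illustrative assumption, not the construction described in the paper, and `height_field`, `sigma`, and the sample points are hypothetical.

```python
import math


def height_field(points, size=5, sigma=0.15):
    """Sum one Gaussian bump per (x, y, weight) point over a size x size grid on [0, 1]^2."""
    grid = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            gx, gy = col / (size - 1), row / (size - 1)
            for x, y, w in points:
                d2 = (gx - x) ** 2 + (gy - y) ** 2
                grid[row][col] += w * math.exp(-d2 / (2 * sigma ** 2))
    return grid


# (x, y, theme strength) for three documents; made-up values.
points = [(0.25, 0.25, 1.0), (0.25, 0.5, 1.0), (0.75, 0.75, 0.5)]
terrain = height_field(points)
peak = max(max(row) for row in terrain)
```

Two nearby documents reinforce each other into a taller peak than the isolated, weaker one, which is how neighboring mountains in the landscape can signal related themes.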
ThemeScapes - Advantages
• Displays much of the complex content of
the document database
• Utilizes innate human abilities for pattern
recognition and spatial reasoning
• Communicative invariance across levels of
textual scale
• Promotes analysis
ThemeScapes Screen Shot
Conclusions
• Text visualizations can overcome many of the user's
limitations
– Enhanced insight and time savings (35 mins vs 2 weeks)
– Creative with the tool
• Querying and analytical manipulation come
together in a single visualization
– Permits different kinds of querying
• Text visualizations will have to access and utilize
the cognitive and visual processes
Directions for Future R & D
• Visual Data Analysis
• Elaborate the visual metaphors
• Addition of sensory modalities
– Virtual interaction
My Favorite Sentence
The bottleneck in the human processing and
understanding of information in large
amounts of text can be overcome if the text
is spatialized in a manner that takes
advantage of common powers of perception.
Contributions
• Explorations of new visualizations
• Discussion of the process for mapping Raw
Data Document collections into
visualizations
Notes on the Reference
• Designing Interaction: Psychology at the
Human-Computer Interface
• Interface Issues and Interaction Strategies
for Information Retrieval Systems
• Clustering and Dimensionality Reduction in
SPIRE
Critique
– Strengths and Weaknesses
• Strengths
– Provide natural visual metaphors
– Enable the users to see the relationships
between documents with minimal required
reading
• Weaknesses
– No validation of some conclusions
What has happened to this topic?
• 1996 R&D 100 Award
• OCSB
– On-line Citation Searching and Browsing at UMD
• "ThemeScape" is now a trademarked term of
Cartia, Inc.
• WebTheme™
– an interactive tool that provides a visual display of the
common themes in collections of web-based documents
WebTheme Screen Shot