Wise, Rennison

Document Visualization
“Visualizing the Non-Visual:
Spatial Analysis and Interaction with
Information from Text Documents”
J. A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M.
Pottier, A. Schur, and V. Crow
Proceedings of InfoVis’95
Reviewed by
Nada Golmie
for CMSC 838S
Fall 1999
Outline
• Document visualization:
– What? Why? How?
• Examples of 1D and 2D visualizations:
– vector space analysis (Salton, 1995)
– reduced text + interaction (Eick, 1992)
– 2D maps of document collections (Lin, 1992)
• 3D visualization: SPIRE (Wise et al., 1995)
• 3D + time: Interactive Landscapes (Rennison, 1994)
6/27/2016
Document Visualization
2
Document Visualization
• Document visualization is an important information visualization (IV) application, driven by emerging technology trends:
– World Wide Web
– Digital Libraries
– Communication Advances
• Mapping a text document:
– Understand the content of a document.
• Mapping a collection of documents:
– Discover relationships among documents.
Vector Space Analysis (Salton et al.)
• Supports free-form text queries in information retrieval (IR).
• Text passages are mapped into vectors of term weights in a high-dimensional space:
$D_i = (d_{i1}, d_{i2}, \ldots, d_{it})$
where $d_{ik}$ is the weight assigned to term $k$ in document $D_i$, and $t$ is the number of terms.
• Given document $D_i$ and query $Q_j = (q_{j1}, q_{j2}, \ldots, q_{jt})$, their similarity is computed as the inner product:
$\mathrm{sim}(D_i, Q_j) = \sum_{k=1}^{t} d_{ik}\, q_{jk}$
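A minimal Python sketch of this similarity computation, assuming raw term counts as the weights d_ik (Salton's work uses more refined weighting schemes) and plain whitespace tokenization; `term_vector` and `sim` are hypothetical names used only for illustration:

```python
from collections import Counter


def term_vector(text: str) -> Counter:
    """Map a passage to term weights; here simply raw term counts (an assumption)."""
    return Counter(text.lower().split())


def sim(doc: Counter, query: Counter) -> float:
    """Inner product: sum over terms k of (document weight) * (query weight)."""
    # Terms missing from either vector contribute zero, so only shared terms are summed.
    return float(sum(weight * query[term] for term, weight in doc.items() if term in query))


docs = {
    "D1": term_vector("text visualization of document collections"),
    "D2": term_vector("software line statistics and code display"),
}
query = term_vector("document visualization")

# Rank documents by decreasing similarity to the query.
ranking = sorted(docs, key=lambda name: sim(docs[name], query), reverse=True)
print(ranking)  # ['D1', 'D2']
```

Because raw counts favour long documents, the vectors are often normalized to unit length first, which turns the inner product into cosine similarity.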
Reduced Text + Interaction:
SeeSoft (Eick, 1992)
• Reduced representation:
– each line of code is displayed as a row and each file as a column (at most 900 rows per column)
• Colors are used to display statistics:
– e.g. age, programmer, feature, type of line, and the number of times the line was executed
• Direct manipulation techniques:
– to find interesting patterns
– to read the actual code through magnification
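The reduced representation can be sketched as a layout function that gives each source line a (column, row) cell, plus a colour derived from one per-line statistic. This is a hypothetical Python illustration, not Eick's implementation; in particular, wrapping a file longer than 900 lines into extra columns and the blue-to-red colour ramp are assumptions.

```python
def seesoft_layout(line_counts, max_rows=900):
    """Assign each (file, line number) pair a (column, row) cell.

    Each file starts in a new column; a file longer than max_rows wraps into
    further columns (an assumption about how overflow is handled).
    """
    cells = {}
    column = 0
    for filename, n_lines in line_counts.items():
        for index in range(n_lines):
            extra_col, row = divmod(index, max_rows)
            cells[(filename, index + 1)] = (column + extra_col, row)
        # Advance past every column this file used (ceiling division, at least one).
        column += max(1, -(-n_lines // max_rows))
    return cells


def color_for(value, lo, hi):
    """Map a per-line statistic (e.g. age) linearly onto a blue-to-red RGB ramp."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return (round(255 * t), 0, round(255 * (1 - t)))


layout = seesoft_layout({"main.c": 1000, "util.c": 10})
print(layout[("main.c", 901)], layout[("util.c", 1)])  # (1, 0) (2, 0)
```

Each cell would then be drawn as a thin coloured row, so a whole code base fits on one screen while statistics remain visible as colour patterns.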
SeeSoft (Eick, 1992)
2D Maps (Lin, 1992)
• Framework for information retrieval:
– maps the high-dimensional document space onto a 2D map.
– document relationships are explored through visual cues such as dots, links, clusters, and areas.
• Self-organizing neural network learning algorithm based on Kohonen’s feature map:
– preserves distance relationships among the input data.
– allocates different numbers of nodes to inputs based on their occurrence frequencies.
• Sitemap
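Kohonen's self-organizing map can be sketched in a few lines of NumPy. Each node of a small 2D grid holds a weight vector in document space; for every document, the best-matching node and its grid neighbours are pulled toward it, so nearby nodes end up representing similar documents. The grid size, the learning-rate and neighbourhood schedules, and the toy term vectors are illustrative assumptions, not Lin's settings.

```python
import numpy as np


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols self-organizing map on the row vectors in data."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))
    # (row, col) coordinates of every node, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.01  # neighbourhood shrinks over time
            # Best-matching unit: node whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.array(np.unravel_index(np.argmin(dists), dists.shape))
            # Gaussian neighbourhood measured on the 2D grid, not in document space.
            g = np.exp(-np.sum((grid - bmu) ** 2, axis=-1) / (2.0 * sigma**2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights


def map_to_grid(weights, x):
    """Grid cell (row, col) of the node that best matches vector x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))


# Toy term-weight vectors: two "topics" over a three-term vocabulary.
docs = np.array([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.1, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.1, 0.9], [0.1, 0.0, 1.0],
])
som = train_som(docs)
print([map_to_grid(som, d) for d in docs])
```

Documents that land on the same or adjacent cells can then be drawn as dots grouped into clusters or areas on the 2D map.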
Visual Text Analysis: SPIRE
SPIRE (Spatial Paradigm for Information Retrieval and Exploration) is a software system that allows users:
– to explore complex relationships between text
documents.
– to rapidly discover known and hidden information
relationships by reading only the pertinent
documents rather than wading through large
volumes of text.
Applications
• SPIRE was originally developed for the U.S.
intelligence community.
• Other potential applications include:
– environmental assessment
– market analysis
– corporations researching competitive products
– health care providers searching patient records
– attorneys reading through previous cases
2D Scatterplot: Galaxies
• Galaxies computes word similarities and patterns in documents and then displays the documents on screen as a universe of “docustars”:
– closely related documents cluster together in tight groups.
– unrelated documents are separated by large spaces.
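One way to obtain such a “docustar” layout is to project the document vectors onto their first two principal components. The sketch below illustrates that assumption only (Galaxies' actual projection may differ); the function name `galaxy_layout` and the toy counts are hypothetical, and rows are length-normalized first so that word usage, rather than document length, drives the positions.

```python
import numpy as np


def galaxy_layout(term_counts):
    """Place documents in 2D via PCA of their length-normalized term vectors."""
    X = np.asarray(term_counts, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows (no empty documents)
    X = X - X.mean(axis=0)                            # centre the data before PCA
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                               # (n_docs, 2) star positions


# Toy term counts: documents 0-2 share one vocabulary, 3-5 another.
counts = [[3, 1, 0, 0], [2, 1, 0, 0], [3, 2, 0, 0],
          [0, 0, 2, 3], [0, 0, 1, 3], [0, 1, 2, 2]]
stars = galaxy_layout(counts)
print(stars.round(2))
```

Related documents end up as tight groups of stars, while the two unrelated groups are separated along the first principal axis.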
Galaxies
3D Landscapes: Themescapes
• Themes within the document space appear on screen as a relief map of natural terrain:
– mountains indicate where themes are dominant;
– valleys indicate weak themes;
– shapes reflect how thematic information is distributed and related across documents.
• Themes that are close in content are placed close together visually, reflecting the many relationships within the text space.
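A relief map of this kind can be approximated by summing a Gaussian hill at each document's 2D position, scaled by how strongly that document expresses a theme, so peaks form where the theme is dominant and valleys where it is weak. This is a hypothetical construction for illustration, not the Themescapes algorithm; the grid size, bandwidth, and per-document theme weights are assumptions.

```python
import numpy as np


def theme_terrain(positions, theme_weights, size=50, bandwidth=0.1):
    """Height field for one theme over the unit square.

    positions: (n_docs, 2) document coordinates in [0, 1] from a 2D layout.
    theme_weights: (n_docs,) strength of the theme in each document.
    """
    axis = np.linspace(0.0, 1.0, size)
    gx, gy = np.meshgrid(axis, axis)  # gx varies along columns, gy along rows
    height = np.zeros((size, size))
    for (x, y), w in zip(np.asarray(positions, dtype=float), theme_weights):
        # Each document raises a Gaussian hill whose height is its theme weight.
        height += w * np.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2.0 * bandwidth**2))
    return height


# Two documents: one strongly about the theme, one only weakly.
terrain = theme_terrain([(0.2, 0.2), (0.8, 0.8)], [1.0, 0.1])
peak = np.unravel_index(np.argmax(terrain), terrain.shape)
print(peak)
```

Summing one such field per theme, or rendering them in different colours, would give a landscape whose mountains and valleys follow the thematic structure.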
Themescapes
Visualization Transformations
• Definition of text: written form of natural language.
• Text conversion to spatial form: algorithms & processes.
• Meaningful visualizations: mathematical procedures and analytical
measures.
• Database management: store and manage text and its derivative forms.
Text Processing Requirements
• Identification and extraction of text features:
– frequency-based measures of words
– higher-order statistics on words: the occurrence, frequency, and context of individual words are used to characterize defined word classes
– semantic approaches using natural language understanding
• Efficient and flexible representation of documents
in terms of text features.
• Support of information retrieval and visualization.
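Frequency-based word measures are commonly turned into weighted document vectors. The sketch below uses tf-idf weighting, a standard frequency-based scheme chosen here for illustration (the paper does not prescribe it); the regular-expression tokenizer and the tiny stop-word list are assumptions.

```python
import math
import re
from collections import Counter

STOPWORDS = frozenset({"a", "and", "in", "of", "the", "to"})  # illustrative list


def tokenize(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """One sparse {term: weight} vector per document, weight = tf * log(N / df)."""
    token_lists = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    n = len(docs)
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


vectors = tfidf_vectors([
    "The river and the dam",
    "River flood warnings",
    "Market prices and the market",
])
print(vectors[0])
```

Terms that occur in every document receive weight zero, since they do not help distinguish one document from another.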
Visual Output of Text Processing
• Vector representation of each document in a high-dimensional feature space:
– comparisons, filters, and transformations can be applied.
• Projection onto a 2D or 3D visualization:
– dimensionality reduction
– scaling
– clustering in the high-dimensional feature space, with cluster centroids fed into layout algorithms (principal component analysis or multidimensional scaling)
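The cluster-then-layout step can be sketched with k-means in feature space followed by classical (Torgerson) multidimensional scaling of the resulting centroids. The clustering method, its farthest-first initialization, the iteration count, and the toy data are illustrative assumptions; the text names PCA and MDS as layout algorithms but not how the clusters are formed.

```python
import numpy as np


def kmeans(X, k, iters=20):
    """k-means with farthest-first initialization; returns (labels, centroids)."""
    # Start from row 0, then repeatedly add the row farthest from all chosen centroids.
    idx = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=-1), axis=1)
        idx.append(int(np.argmax(d)))
    centroids = X[idx]
    for _ in range(iters):
        # Assign each document to its nearest centroid.
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1), axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids


def classical_mds(points, dims=2):
    """Embed points in `dims` dimensions, preserving Euclidean distances as far as possible."""
    sq = np.square(np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1))
    n = len(points)
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ sq @ J               # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Toy 5-dimensional feature vectors forming three groups of ten documents.
rng = np.random.default_rng(1)
X = np.vstack([c + 0.1 * rng.standard_normal((10, 5)) for c in np.eye(5)[:3] * 4.0])
labels, centroids = kmeans(X, k=3)
positions = classical_mds(centroids)  # one 2D position per cluster
print(positions.round(2))
```

Laying out only the centroids keeps the projection cheap for large collections; individual documents can afterwards be placed around their cluster's position.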
Interface Design
• Three display types:
– Backdrop: central display resource.
– Workshop: grid with resizable windows to hold
multiple views.
– Chronicle: space where views are placed and linked to
form a visual story.
• Tools for more in-depth analysis: point and click, grouping, annotation, query, subset, and temporal slicing.
Screenshot
Favorite Sentences
“The bottleneck in the human processing and
understanding of information in large amounts
of text can be overcome if the text is
spatialized in a manner that takes advantage of
common powers of perception.”
“So much has already been written about everything that you can’t find out anything about it.” (James Thurber, 1961)
Contributions
• Effective use of physical metaphors such as the night sky and landscapes to provide an overview visualization of the document collection:
– helps answer simple questions about the database
• Discussion of text processing for visualization.
• Platform includes integrated tools and
techniques for text manipulation and analysis.
Critique
• How can the effectiveness of the visualization be measured, both for discovering relationships and for answering detailed questions about the documents?
– it may depend on the ease of interaction
– the claim of “discovering in 35 minutes what would have taken two weeks otherwise” needs to be verified.
• Layout algorithms can produce clutter and occlusion, especially for large document collections.
• Clustering may reduce sensitivity to the features of individual documents.
Other Comments
• Agree with the need for visual tools that aid cognitive skills, but skeptical about the statement:
“And the limitations of Information Age will not be set by the speed with which human mind can read”
• The paper contains too many sound-bite sentences and buzzwords, which could be distracting:
“fluid environment for reflective cognition and
higher-order thought”
Galaxy of News:
Interactive Landscapes
• Parse content to extract key information
• Build an associative relation network
• Classify elements into hierarchies
• Sort peer elements spatially and temporally
• Construct visual information space based on classified elements
• Dynamic response to visual interaction
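The “build an associative relation network” step can be sketched as a keyword co-occurrence graph: two keywords are linked whenever they appear in the same article, and the link weight counts such articles. The parse step is assumed to have already produced a keyword list per article; the weighting and the hypothetical `strongest_associations` helper are illustrations, not Rennison's implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_network(articles):
    """Undirected weighted graph: keyword -> {associated keyword: co-occurrence count}.

    articles: one list of keywords per article (assumed output of the parse step).
    """
    network = defaultdict(lambda: defaultdict(int))
    for keywords in articles:
        for a, b in combinations(sorted(set(keywords)), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def strongest_associations(network, keyword, top=3):
    """Keywords most often seen with `keyword`, strongest first (ties broken alphabetically)."""
    ranked = sorted(network.get(keyword, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top]]


net = build_relation_network([
    ["election", "senate", "budget"],
    ["election", "senate"],
    ["budget", "tax"],
])
print(strongest_associations(net, "election"))  # ['senate', 'budget']
```

In an interactive view, selecting a keyword could pull its strongest associations toward it, which is one possible reading of the dynamic response to visual interaction.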
Galaxy of News:
Summary
• Use of motion to visualize relationships among
documents.
• Documents have no fixed position in space:
– the associative relation network is built dynamically
– categories have fixed positions
• The constructed space is based on abstract conceptual metaphors (galaxies) and could have any number of dimensions.