Visual Analytics Agenda

advertisement
Visual Analytics
CS 4460/7450 - Information Visualization
March 16, 2010
John Stasko
Agenda
• Overview of what the term means and
how it relates to information visualization
• Some example VA research projects
• Specific example, Jigsaw, helping
investigative analysis
Spring 2010
CS 4460/7450
2
1
Acknowledgment
Slides looking like this provided
courtesy of Jim Thomas
CS 4460/7450
3
Visual Analytics
• A new term for something that is familiar
to all of us
• Informal description:
Using visual representations to help make
decisions
Spring 2010
CS 4460/7450
4
2
VA vs. InfoVis
• Compare & contrast
Spring 2010
CS 4460/7450
5
Visual Analytics Definition
Visual analytics is the science of analytical reasoning facilitated
by interactive visual interfaces.
People use visual analytics tools and techniques to
 Synthesize information and derive insight from massive,
dynamic, ambiguous, and often conflicting data
 Detect the expected and discover the unexpected
 Provide timely, defensible, and understandable assessments
 Communicate assessment effectively for action.
CS 4460/7450
Thomas & Cook
2005
“The beginning of knowledge is the discovery of
something we do not understand.”
~Frank Herbert (1920 - 1986)
#
3
Visual Analytics
• Not really an “area” per se
More of an “umbrella” notion
• Combines multiple areas or disciplines
• Ultimately about using data to improve
our knowledge and help make decisions
Spring 2010
CS 4460/7450
7
Alternate Definition
• Visual analytics combines automated
analysis techniques with interactive
visualizations for an effective
understanding, reasoning and decision
making on the basis of very large and
complex data sets
Keim et al, chapter in
Information Visualization: Human-Centered
Issues and Perspectives, 2008
Spring 2010
CS 4460/7450
8
4
Synergy
• Combine strengths of both human and
electronic data processing
Gives a semi-automated analytical process
Use strengths from each
From Keim
Spring 2010
CS 4460/7450
9
Main Components
Interactive
visualization
Analytical
reasoning
Spring 2010
Computational
analysis
CS 4460/7450
10
5
Back to InfoVis Comparison
• Clearly much overlap
• Perhaps fair to say that infovis hasn‟t
always focused on analysis tasks so much
and that it doesn‟t always include
advanced data analysis algorithms
Not a criticism, just not focus
InfoVis has a more narrow scope
(Some of us actually believe that infovis
has/should include those topics)
Spring 2010
CS 4460/7450
11
Academic Context
Visual
Analytics
~2005
Art-y casual
infovis
Spring 2010
Analytical reasoning
Computational analysis
Information
Visualization
~1990
CS 4460/7450
12
6
Visual Analytics
• Encompassing, integrated approach to
data analysis
Use computational algorithms where helpful
Use human-directed visual exploration where
helpful
Not just “Apply A, then apply B” though
Integrate the two tightly
Spring 2010
CS 4460/7450
13
VA-related Areas
• Visualization
InfoVis & SciVis
• Data management
Databases, information retrieval
• Data Analysis
Knowledge discovery, data mining
• Perception & cognition
• Human-computer interaction
Spring 2010
CS 4460/7450
14
7
Visual Analytics
Partnership Disciplines
•
•
•
•
Statistics, data representation and statistical graphics
Geospatial and Temporal Sciences
Applied Mathematics
Knowledge representation, management and
discovery
– Ontology, semantics, NLP, extraction, synthesis, …
• Cognitive and Perceptual Sciences
• Comunications: Capture, Illustrate and present a
message
• Decision sciences
• Information and Scientific Visualization
And far more than homeland security
CS 4460/7450
15
Multiple Techniques Contribute to Threat Assessment
Prediction
Visual Analytics
Cognition
Graph Matching
Analysis
Pattern Analysis
Evidence Extraction
Organization
Synthesis
Content
Management
Link Discovery
Integration
Connect the
Dots
Extraction
Aggregation
Data
CS 4460/7450
Information
Knowledge
16
8
Uses Today
•
•
•
•
•
•
•
Scientific Research
Regulatory and Legal Communities
Intelligence Analysis
DOE and DOD
Market Assessments
Capability Analysis - Resumes
Medical and Pharmaceutical
Communities
• National Security and Law Enforcement
• Information Assurance, Web Analytics
• Technology Scanning, Asset and
Intellectual Property Management
CS 4460/7450
17
Capabilities Desired
•
Reduce the threat of terrorism through the invention, development, evaluation, and
deployment of technology to analyze masses of data in different formats and types, from
different sources, with highly varying degrees of confidence levels, within time frames
required for rapid decision making.
•
Better understand the risks and vulnerabilities of our critical infrastructures, trade,
ports, and immigration by combining sensor, computational and visual analytics
technologies for in-the-field and strategic decision making.
•
Enable rapid visual communication technology for response teams for clear
understanding of the situation assessment and alternate options for response with
geospatial, and multi-jurisdictional situations for WME and natural disasters.
•
Ensure effective information communication methods and technologies throughout DHS
missions of analysis, risk, levels of alerts, and response, in unwrappable levels of
assessment with evidence and communication styles aimed within audience-centric
applications for rapid understanding and action.
•
Provide an enduring talent base of educated professionals supporting future
developments requiring visual communication of integrated information and
operational support missions.
CS 4460/7450
18
9
Research Agenda
• Available at
http://nvac.pnl.gov/ in PDF
form
• At IEEE Press in book form
• Special thanks to IEEE
Technical Committee on
Visualization and Graphics
CS 4460/7450
19
Overview of the R&D Agenda
• Challenges
• Science of Analytical
Reasoning
• Science of Visual
Representations
and Interactions
• Data Representations
and Transformations
• Production, Presentation, and
Dissemination
• Moving Research Into
Practice
• Positioning for an Enduring
Success
CS 4460/7450
20
10
Projects
• Let‟s look at some of the research projects
in this area
Spring 2010
CS 4460/7450
21
IN-SPIRE™ Visual Document Analysis
A “Thinking Aid” for advanced investigation of unstructured text
Uncovers Common Topics in
Large Document Collections
Enganging Displays for Exploration
Multiple Query and Search Tools
Supports Real-Time Streaming Data
Compatible with Foreign Languages
Shows Trends over Time
CS 4460/7450
http://in-spire.pnl.gov
Video
22
11
HARVEST
Combined user-system
synthesis of knowledge
Users annotate and
refine existing
knowledge
Video
Spring 2010
Gotz, Zhou & Aggarwal
VAST „06
CS 4460/7450
D-DUPE
23
Video
System for entity resolution
in large networks such as
bibliographic collections
System does computational
analysis and provides
suggestions and user can
augment and correct
Spring 2010
CS 4460/7450
Bilgic et al
VAST „06
24
12
Other Examples
CS 4460/7450
25
Many Others
• A number of nice examples shown earlier
on Graph & Network visualization day
Wong: Graph Signatures
Perer: Social Action
etc.
Spring 2010
CS 4460/7450
26
13
Application Area
• Investigative & Intelligence Analysis
Gather information from various sources then
analyze and reason about what you find and
know
Analyze situations, understand the
particulars, anticipate what may happen
Spring 2010
CS 4460/7450
27
Definitions
• Thinking1 - or reasoning - involves objectively connecting present beliefs with
evidence in order to believe something else
• Critical Thinking1 is a deliberate meta-cognitive(thinking about thinking) thinking
act whereby a person reflects on the quality of the reasoning process
simultaneously while reasoning to a conclusion.
• Intelligence1 is a specialized form of knowledge, an activity, and an
organization. As knowledge, intelligence informs leaders, uniquely aiding their
judgment and decision-making. …
1.
Critical Thinking and Intelligence Analysis: David Moore
CS 4460/7450
28
14
Critical Thinking*
“…the quality of our life and that of what we produce, make, or
build depends precisely on the quality of our thoughts.”
Elements of thought:
Points of View
Implications &
Consequences
Purpose of
the Thinking
Question at Issue
Assumptions
Concepts
Information
Interpretation
And Inference
* Foundations of Critical Thinking www.criticalthinking.org
CS 4460/7450
29
Example: Heuer’s Central Ideas
• “Tools and techniques that
gear the analyst’s mind to
apply higher levels of
critical thinking can
substantially improve
analysis… structuring
information, challenging
assumptions, and
exploring alternative
interpretations.”
CS 4460/7450
30
15
Investigative Process
Spring 2010
CS 4460/7450
Pirolli & Card
Intl Conf Intelligence Analysis „05
31
Pain Points
• Cost structure of scanning and selecting
items for further attention
• Analysts‟ span of attention for evidence
and hypotheses
Spring 2010
CS 4460/7450
32
16
HW 5 – Investigative Analysis
• Did you encounter those pain points?
• Discuss
How did you work on the problem?
What were the main challenges?
Spring 2010
CS 4460/7450
33
Jigsaw
• Visualization for Investigative Analysis
across Document Collections
Targeting analysis
Law enforcement
Fraud detection
Intelligence community
“Putting the pieces together”
Spring 2010
CS 4460/7450
34
17
Problem Addressed
• Help investigative analysts discover plans,
plots and threats embedded across the
individual documents in large document
collections
DBs
Spring 2010
Documents/
case reports
Blogs
CS 4460/7450
35
Document Focus
• Unstructured (plain) text documents
• Semi-structured documents such as
spreadsheets
Spring 2010
CS 4460/7450
Examples
36
18
Our Focus
• Entities within the documents
Person, place, organization, phone number,
date, license plate, etc.
• Thesis: A plot/threat within the
documents will involve a set of entities in
coordination
Spring 2010
CS 4460/7450
Doc 1
John Smith
37
Doc 2
Doc 3
June 1, 2009
Mary Wilson
Atlanta
Boston
PETA
Bill Jones
Spring 2010
CS 4460/7450
38
19
Entity Identification
• Must identify and extract entities from
plain text documents
Crucial for our work
• Not our main research focus – Collaborate
with or use tools from others
Spring 2010
CS 4460/7450
39
Example Document
Spring 2010
CS 4460/7450
40
20
Entities Identified
Spring 2010
CS 4460/7450
41
Connections
• Entities relate/connect to each other to
make a larger “story”
• Connection definition:
Two entities are connected if they appear in a
document together
The more documents they appear in
together, the stronger the connection
Spring 2010
CS 4460/7450
42
21
Jigsaw
“Putting the pieces together”
• Multiple visualizations (views) of
documents, entities, & their connections
• Views are highly interactive and
coordinated
• User actions generate events
that are transmitted to and
(possibly) reflected in other
views
Spring 2010
CS 4460/7450
43
CS 4460/7450
44
System Views
Spring 2010
22
The Need for Pixels
45
Even More!
Spring 2010
CS 4460/7450
46
23
Even More!
Spring 2010
CS 4460/7450
47
Demo
Spring 2010
CS 4460/7450
48
24
Console
Spring 2010
CS 4460/7450
49
Document View
Spring 2010
CS 4460/7450
50
25
List View
Spring 2010
CS 4460/7450
51
CS 4460/7450
52
Graph View
Spring 2010
26
Report Cluster View
Spring 2010
CS 4460/7450
53
Entity Connection View
Spring 2010
CS 4460/7450
54
27
WordTree View
Spring 2010
CS 4460/7450
55
Calendar View
Spring 2010
CS 4460/7450
56
28
Document Grid View
Spring 2010
CS 4460/7450
57
Scatterplot View
Spring 2010
CS 4460/7450
58
29
Timeline View
Spring 2010
CS 4460/7450
59
Room to Improve
• What Jigsaw doesn‟t do so well now
The end-part of the Pirolli-Card model
Helping the analyst take notes, organize evidence,
generate hypotheses, etc.
(but we‟re working on it)
Sometimes called “evidence marshalling”
Others have focused more on that aspect…
Spring 2010
CS 4460/7450
60
30
i2’s Analyst Notebook
Spring 2010
CS 4460/7450
61
Analyst’s Notebook
• Leading commercial tool in this space (law
enforcement and intelligence agencies)
• Large zooming workspace where analyst
creates networks of entities and notes
• Often used to produce presentation or
story of analysis done
Spring 2010
CS 4460/7450
62
31
Oculus’ Sandbox
Video
Wright et al
CHI „06
Spring 2010
CS 4460/7450
63
Sandbox
• Flexible space for inserting text and
graphics
• Objects can be dragged-and-dropped
from their other analysis tools
• Flexible level of detail
• Flexible gestures for making space,
inserting, etc.
• Assertions with evidence gates
• Reasoning templates
Spring 2010
CS 4460/7450
64
32
PARC’s Entity Workspace
Video
Bier, Card & Bodnar
VAST „08
Spring 2010
CS 4460/7450
65
Entity Workspace
• Tools for rapid ingest of entities from
documents
• Can snap together entities into groups
• Can indicate level of interest in objects
• Four main view panels, with zooming UI
Spring 2010
CS 4460/7450
66
33
Jigsaw Future Work
• Collaborative capabilities
• Improved evidence
marshalling
• Present/browse
investigation history
• Scalability upward
• Web document ingest
• Implement network
algorithms
• Document similarity
• Sentiment analysis
Spring 2010
•
•
•
•
•
•
•
•
•
Wikipedia & Intellipedia
Geospatial view
Better timeline capabilities
Document clustering by
theme/topic
Reliability/uncertainty
Other types of data
Active crawling/RSS ingest
Try it on display wall
Deployment to real clients
CS 4460/7450
67
Related Area of Interest
• Sensemaking
• A general term that has been used in a
number of different contexts
E.g., How large corporations make decisions
• To me, ultimately about people working
with data and information to understand it
better
Spring 2010
CS 4460/7450
68
34
Sensemaking
Nice definition:
“A motivated , continuous effort to understand
connections (which can be among people,
places, and events) in order to anticipate
their trajectories and act effectively.”
– Klein, Moon and Hoffman
IEEE Intelligent Systems „06
Spring 2010
CS 4460/7450
69
Alternate Definition
“The process of creating situation awareness in
situations of uncertainty”
– D. Leedom, ‟01 SM Symp. Report
Situation awareness:
“It‟s knowing what‟s going on so you know what to do”
– B. McGuinness, quoting an Air Force pilot
Spring 2010
CS 4460/7450
70
35
Chang et al
WireVis
Information Visualization „08
• Another VA investigative analysis project
• Helping Bank of America examine wire
transfers of money
• Want to detect fraud and illegal actions
Thanks to R. Chang
for some slide content
Spring 2010
CS 4460/7450
71
Particulars
• Who – Bank analysts
• Problem – Detect money laundering and
fraud in wire transfers of money
• Data – Electronic records of wire
transactions and information associated
with each
Spring 2010
CS 4460/7450
72
36
Background
• Wire transfers of money can be complex
Have a “from” and “to” but often many
“middlemen
May not know who intermediaries are
• Millions of transfers per day occur
Vast majority are legal
• Bank has legal responsibility to report
suspicious activities
Spring 2010
CS 4460/7450
73
Data
• Each transaction:
Money amount
Payer (could be third party)
Payee (could be an agent)
Potential intermediaries
Addresses of payer and payee, instructions,
additional comments are optional
Spring 2010
CS 4460/7450
74
37
Challenges
London
Singapore
Charlotte, NC
Indonesia
No Standard Form…
When a wire leaves Bank of America in Charlotte…
The recipient can appear as if receiving at London, Indonesia or
Singapore
Vice versa, if receiving from Indonesia to Charlotte
The sender can appear as if originating from London, Singapore,
or Indonesia
Spring 2010
CS 4460/7450
75
Challenges
• Scale: BoA may do 200k transfers per day
• No international standard: loosely
structured data
• Bad guys are smart and one step ahead
Detection tools are always reactive
Spring 2010
CS 4460/7450
76
38
Existing Detection
• Examine for certain temporal patterns of
activity
• Look for keywords in free text
Filter transactions based on these highly
secretive words
Typically a few hundred
Updated based on intelligence reports
Spring 2010
CS 4460/7450
77
Current Practices
• Load transactions into large relational DB
• Download some amount to spreadsheets
via filtering based on keywords, amounts,
dates, …
• Can only look at a week or two this way
• Difficult to notice temporal patterns
Spring 2010
CS 4460/7450
78
39
System Overview
Heatmap View
(Accounts to Keywords
Relationship)
Strings and Beads
(Relationships over Time)
Spring 2010
Search by
Example (Find
Similar
Accounts)
Keyword
Network
(Keyword
Relationships)
CS 4460/7450
79
Heatmap View
 List of Keywords
 Sorted by frequency from high to low (left
to right)
 Hierarchical
clusters of
accounts
 Sorted by
activities from big
companies to
individuals (top to
bottom)
 Fast “binning”
that takes O(3n)
Spring 2010
 Number of occurrences of keywords
 Light color indicates few occurrences
CS 4460/7450
80
40
Strings and Beads
 Each string corresponds to a cluster of accounts
in the Heatmap view
 Each bead represents a day
 Y-axis can
be amounts,
number of
transactions,
etc.
 Fixed or
logarithmic
scale
Spring 2010
CS 4460/7450
 Time
81
Keyword Network
• Each dot is a keyword
• Position of the keyword is
based on their relationships
Keywords close to each other
appear together more
frequently
Using a spring network,
keywords in the center are
the most frequently occurring
keyword
• Link between keywords
denote co-occurrence
Spring 2010
CS 4460/7450
82
41
Search By Example
 Target Account
 Histogram depicts
the occurrences
of keywords
 User interactively
selects features
within the
histogram used in
comparison
 Accounts that
are within the
similarity
threshold
appear ranked
(most similar on
top)
 Similarity threshold slider
Spring 2010
CS 4460/7450
83
CS 4460/7450
84
Video
Spring 2010
42
This Topic
• It‟s what I do now :^)
• A number of the group projects are really
in this space
• Interested in getting more work in this
area started
Spring 2010
CS 4460/7450
85
Upcoming
• Time Series Data
Reading:
Aigner et al
• Spring Break – woo woo
• Interaction
Reading:
Yi et al
Spring 2010
CS 4460/7450
86
43
Download