Introduction

advertisement
Lecture 01:
Introduction
September 5, 2012
COMP 250-2
Visual Analytics and Provenance
Motivation:
What is in a User’s Interactions?
Keyboard, Mouse, etc
Input
Visualization
Human
Output
Images (monitor)
Types of Human-Visualization Interactions
Text editing (input heavy, little output)
Browsing, watching a movie (output heavy, little input)
Visual Analysis (closer to 50-50)
Provenance: Definitions
• Provenance (according to Webster):
1. origin, source
2. the history of ownership of a valued object or
work of art or literature
• Example: Has anyone traced
the provenances of these paintings?
Provenance: Computer Science
• Examples in CS relating to provenance:
•
•
•
•
•
•
•
•
•
Undo / Redo
Diff / Revision history
History logging (e.g., Wikipedia)
Interaction logging
Data annotation
Database queries and results (Oracle Flashback)
Screen capturing?
Knowledge management
Etc.
Problems and Challenges
• There are lots of people working on problems
pertaining to provenance. For example:
•
•
•
Open Provenance Model (OPM): common framework
for describing information provenance (e.g.,
Wikipedia)
Database: (1) how data is derived from large
computation, (2) how data is copied / synthesized
from one part of the database to another
Knowledge management / Ontology: the study of
relationships between data, concepts, processes,
information, etc. (e.g., Scientific Workflow)
What is “Analytic Provenance”?
• Analytic Provenance (AP): similar to workflow, is the
provenance of the analysis (the steps of the analysis).
This includes:
•
•
•
•
•
The record of the data and data manipulation
The record of the user’s interactions
The record of what the user sees
The record of the computational products (both the
visualization and the data)
The record of the user’s analysis results
• How is provenance research in visual analytics similar
and different from the others?
Similarities
• At a high level, both are attempts at keeping
track of the changes of data and information
over time.
• Along the same line, one common goal is to
verify and validate the current information or
the results of a query / analysis.
• Others?
Differences
• Exploration with high interactivity.
•
•
•
•
Many provenance systems utilize state diagrams
Similar to depicting “scientific workflow”
What is a “state” in a highly exploratory system?
Often, the “states” are data dependent
• Computation can be costly
•
Storing only the procedures do not always lead to the
same result due to computational powers
• Storing of “insight”
•
How to determine which interactions are useful? (the
labeling problem…)
Why Study Analytic Provenance?
• Damn good question…
• A problem that has plagued me for years
• Validation (of the results) –kind of lame
• Verification (of the process) – kind of lame
• Training – Less lame, but still not very sexy
• These can all be (somewhat) solved with
video-capturing the screen (with some
annotation and/or microphone)
Why Study Analytic Provenance?
• More interestingly, as a post hoc analysis:
•
Redo of a certain task
•
•
What analysis steps are useful?
•
•
For example, I just did some analysis on the Census of
Massachusetts. Can I reapply the same analysis to Rhode
Island Census?
For example, in solving a difficult problem, is there a key
step (or steps) that lead to success (and to failure)?
Building a knowledge-base
•
If we believe that interactions encodes reasoning and
analysis, can I aggregate all interactions/analyses into a
knowledge-base?
Why Study Analytic Provenance?
• As an ad hoc analysis (that is, in real time),
determine a user’s “analytical needs”:
•
•
•
Related to “adaptive” visualization or interfaces, but
goes beyond adapting the UI. For example:
Analysis of search space: what data space has the
user explored (and not explored)?
Analysis of bias: is the user favoring some hypothesis
and ignoring evidence?
• Essentially, anything that can give us some
indication of who the user is (in terms of “ICD3”)
Challenges?
• What are some challenges that need to be
solved in order to accomplish the previously
stated goals?
• How would you categorize them?
• CHI 2010 Analytic Provenance workshop
Questions?
Provenance and Scientific Workflows:
Challenges and Opportunities
Susan B.Davidson (Upenn)
Juliana Freire (Utah/NYU)
SIGMOD 2008
Prospective
Provenance: Steps
taken in the
scientific workflow
Retrospective
Provenance:
The environment in
which the steps
were taken
Annotated
Provenance:
Some notes entered
by the user
Note the small
squares in the
upper left corner of
each box
(representing each
step)
Ways to Store Provenance
• RDF (resource description file)
• SPARQL (query language for RDF)
Ways to Store Provenance
OWL (Web
Ontology
Language)
Yuck…
• Knowledge vs. information vs. data
• Example about grass, rabbit, and wolf
Opportunities
1. Provenance and Scientific Publications:
•
Reproducibility for scientific experiments
2. Provenance and Data Exploration:
•
Simplify exploratory processes (graph reduction)
3. Social Network Analysis:
•
Not sure how this is different from 1, but applied to
SNA (and crowd sourcing)
4. Provenance in Education:
•
“Teaching is one of the killer applications of
provenance-enabled workflow systems”
• User starts with an analogy template (left)
• The user then reapplies the workflow to a
different dataset (right). The system
automatically figures out inconsistencies or
problems with the reapplication and either
(a) fixes it or (b) warns the user
Open Problems
1. Information Management Infrastructure:
•
Usability issue relating to how to use information
management systems
2. Provenance Analytics and Visualization:
•
Visualize and data-mine provenance.
3. Interoperability:
•
Steps of workflow generated from different software
and computers
4. Connecting Database and Workflow:
•
Does provenance make sense without access to the
(a) original or the (b) derived data
Questions?
Date and Time
• When and where should we meet??
Structure of the Course
• Open-ended
•
The topic of provenance is both old and new…
•
•
•
Read existing papers (that I think are important), and
You suggest new ones
Identify research opportunities
• Daily schedule (tentative):
•
•
•
1:30 – 2:30 3-4 presentations of 15 minutes each
2:30 – 2:45 break
2:45 – 4:00 group discussion
Time Line
• See course website
• http://www.cs.tufts.edu/comp/250VA/
• Sign up for:
• 2 scribes positions (on different weeks)
• 2 papers to present (on different weeks)
•
And on those weeks, you will lead the discussions
• 1 week where your group (of 3) will identify
papers and lead discussions
Questions?
Download