Provenance Management: Challenges and Opportunities Juliana Freire

http://www.cs.utah.edu/~juliana
What is Provenance?
Oxford English Dictionary: (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.
Apple Dictionary: Origin, source, place of origin; birthplace, fount, roots, pedigree, derivation, root, etymology; formal radix.
Provenance Management – Juliana Freire
BTW 2009 – Münster, Germany
Provenance in Art
Rembrandt van Rijn
Self-Portrait, 1659
Andrew W. Mellon Collection
1937.1.72
Provenance in Business
Corporate and accounting scandals, e.g., Enron, Tyco, WorldCom
Decline of public trust in accounting and reporting practices
Sarbanes-Oxley Act of 2002 requires detailed provenance (auditing), e.g., http://en.wikipedia.org/wiki/Sarbanes-Oxley_Act
–  Understand the flow of transactions … to identify points at which a misstatement could arise
Provenance in Health Care
Need to keep track of patients’ provenance
Verify compliance with protocols
Mine/analyze this information to make informed decisions
–  E.g., assess the effectiveness of procedures and drugs for large populations
–  “Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of the individual patient.” [D. Sackett, 1996]
http://www.hsl.unc.edu/Services/Tutorials/EBM/
Provenance in Science
Provenance is as important as the result (or more!)
Not a new issue
Lab notebooks have been used for a long time
What is new?
–  Large volumes of data
–  Complex analyses
Writing notes is no longer an option…
Computational provenance
[Figure: lab notebook page on DNA recombination by Lederberg, with labels: When, Annotation, Observed data]
Provenance for Digital Data
[Figure: example tables with columns Emp (John, Susan), Dept (D01, D02), and Mgr (Mary, Ken)]
Uses of Computational Provenance
anon4876_zspace_20060331.jpg
anon4877_zspace_20060331.jpg
anon4877_lesion_20060401.jpg
Reproducibility
Data quality
Attribution
Informational
How were these images created?
Was any pre-processing applied to the raw data?
What’s the difference?
Who created them?
Are they really from the same patient?
Computational Provenance
[Figure: causality graph. anon4877_CT_scan is usedBy two workflows; one generated anon4877_zspace_20060331.jpg and the other generated anon4877_lesion_20060401.jpg]
Provenance Management: Desiderata
Need reliable, accurate and transparent mechanisms to capture provenance information
Need appropriate model to represent information
Need efficient storage and querying
Need to integrate provenance from multiple sources
Need usable tools
Outline
Provenance management for data exploration and visualization
Querying and re-using provenance
Emerging applications and research directions
Provenance Management for Data Exploration and Visualization
Data Exploration and Visualization
A picture is worth 1,000 words…
And it might take 1,000 steps to create!
[Figure: visualization model relating Data, Process, Data Product, Specification, Data Manipulation, Perception & Cognition, Knowledge, Exploration, and User. Figure modified from J. van Wijk, IEEE Vis 2005]
Data Exploration and Visualization
raw data: CT scan
[Figure: workflow diagrams]
Files (workflow specifications) | Notes
anon4877_voxel_scale_1_zspace_20060331.srn | Initial visualization with z-scaling corrected!
anon4877_textureshading_20060331.srn | Added texture and shading!
anon4877_textureshading_plane0_20060331.srn | Added plane to visualize internal structure!
anon4877_goodxferfunction_20060331.srn | Found good transfer function!
anon4877_lesion_20060331.srn | Identified lesion tissue!
Data Exploration and Visualization: Issues
Provenance is maintained manually through file-naming conventions and detailed notes
–  A time-consuming, error-prone process
Hard to understand the exploratory process and relationships among workflows
Hard to further explore the data, e.g., locate relevant data products/workflows and modify them
Hard to collaborate, and work is likely to be lost if the creator leaves
Exploration and Creativity Support
Reflective reasoning is key in exploratory processes
“Reflective reasoning requires the ability to store temporary results, to make inferences from stored knowledge, and to follow chains of reasoning backward and forward, sometimes backtracking when a promising line of thought proves to be unfruitful. …the process is slow and laborious” Donald A. Norman
Need external aids: tools to facilitate this process
–  Creativity support tools [Shneiderman, CACM 2002]
Need aid from people
–  Support for collaboration
VisTrails: Managing Provenance
Support for exploratory tasks such as visualization and data mining
–  Task specification iteratively refined as users generate and test hypotheses
Comprehensive provenance infrastructure for computational tasks
–  Data + workflow provenance
–  Treat workflow as a 1st-class data product
VisTrails manages the data, metadata and the exploration process, so scientists can focus on science!
Infrastructure can be combined with, and enhance, existing scientific workflow and visualization systems
Focus on usability: build tools for scientists
http://www.vistrails.org
Keeping Exploration Trails
[Figure: a trail shown alongside the workflows at its nodes and the data products they generate. Labels: Trail, Workflows, Data Products]
Keeping Exploration Trails
[Figure: trail annotated with notes and users]
Notes | User
Initial visualization with z-scaling corrected! | juliana
Added texture and shading! | eranders
Added plane to visualize internal structure! | eranders
Found good transfer function! | eranders
Identified lesion tissue! | stevec
Short Demo: The Change-Based Provenance Model
Change-Based Provenance
Records user actions
Provenance = changes to computational tasks
–  Add a module, add a connection, change a parameter value
Extensible change algebra
[Figure: sequence of actions: addModule, deleteConnection, addConnection, addConnection, setParameter]
A vistrail node vt corresponds to the workflow that is constructed by the sequence of actions from the root to vt
$vt = x_n \circ x_{n-1} \circ \dots \circ x_1 \circ \emptyset$
[Figure: vistrail with actions x1, x2, x3]
[Freire et al, IPAW 2006]
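The composition of actions from the root to a node can be sketched in a few lines of Python. This is only an illustrative model, not the VisTrails implementation: the `Version` class, the dict-based workflow, and the module names `FileReader` and `Isosurface` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def empty_workflow():
    """The empty workflow at the root of the tree."""
    return {"modules": {}, "connections": set()}


# Each helper returns an action: a function that edits a workflow in place.
def add_module(module_id, module_type):
    def action(wf):
        wf["modules"][module_id] = {"type": module_type, "params": {}}
    return action


def add_connection(src, dst):
    def action(wf):
        wf["connections"].add((src, dst))
    return action


def set_parameter(module_id, name, value):
    def action(wf):
        wf["modules"][module_id]["params"][name] = value
    return action


@dataclass(frozen=True)
class Version:
    """A hypothetical vistrail node: one recorded action plus its parent."""
    action: Optional[Callable] = None
    parent: Optional["Version"] = None

    def then(self, action):
        """Record `action` as a new child version; nothing is overwritten."""
        return Version(action, self)


def materialize(version):
    """Rebuild a workflow by replaying the actions from the root to `version`."""
    chain = []
    node = version
    while node is not None and node.action is not None:
        chain.append(node.action)
        node = node.parent
    wf = empty_workflow()
    for action in reversed(chain):  # apply x1 first, xn last
        action(wf)
    return wf


root = Version()
v3 = (root.then(add_module("reader", "FileReader"))
          .then(add_module("iso", "Isosurface"))
          .then(add_connection("reader", "iso")))
v4a = v3.then(set_parameter("iso", "value", 57))
v4b = v3.then(set_parameter("iso", "value", 90))  # a sibling branch of v4a
```

Because only actions are stored, `v4a` and `v4b` share the whole history up to `v3`, and either one can be rebuilt on demand with `materialize`.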
Exploring the Change Space
Scripting workflows: Parameter explorations are simple to specify and apply
Exploration of parameter space for a workflow vt
$\mathrm{setParameter}(id_n, value_n) \circ \dots \circ \mathrm{setParameter}(id_1, value_1) \circ vt$
Exploration of multiple workflow specifications
$\mathrm{addModule}(id_i, \dots) \circ \mathrm{deleteModule}(id_i) \circ v_1$
…
$\mathrm{addModule}(id_i, \dots) \circ \mathrm{deleteModule}(id_i) \circ v_n$
Results can be conveniently compared in the VisTrails spreadsheet
Can create animations too!
Caching to avoid redundant computations [Bavoil et al., IEEE Vis 2005]
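A parameter exploration of this kind can be sketched as repeated application of a setParameter-style change to a base workflow. The dict layout, the `explore` helper, and the parameter names below are assumptions for illustration only; they are not the VisTrails API.

```python
import copy
from itertools import product


def set_parameter(workflow, module_id, name, value):
    """Return a copy of `workflow` with one parameter changed."""
    new = copy.deepcopy(workflow)
    new[module_id][name] = value
    return new


def explore(workflow, ranges):
    """Yield (settings, workflow) for every combination of parameter values.

    `ranges` maps (module_id, parameter_name) to the list of values to try.
    """
    keys = list(ranges)
    for values in product(*(ranges[k] for k in keys)):
        variant = workflow
        for (module_id, name), value in zip(keys, values):
            variant = set_parameter(variant, module_id, name, value)
        yield dict(zip(keys, values)), variant


# Hypothetical base workflow: module id -> parameter dict.
base = {"iso": {"value": 50}, "render": {"opacity": 1.0}}
variants = list(explore(base, {("iso", "value"): [30, 60, 90],
                               ("render", "opacity"): [0.5, 1.0]}))
```

Each variant is a separate workflow, so the base specification is left untouched and the results can be laid out side by side.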
Computing Workflow Differences
No need to compute subgraph isomorphism!
A vistrail is a rooted tree: all nodes have a common ancestor, so diffs are well-defined and simple to compute
$vt_1 = x_i \circ x_{i-1} \circ \dots \circ x_1 \circ \emptyset$
$vt_2 = x_j \circ x_{j-1} \circ \dots \circ x_1 \circ \emptyset$
$vt_1 - vt_2 = \{x_i, x_{i-1}, \dots, x_1, \emptyset\} - \{x_j, x_{j-1}, \dots, x_1, \emptyset\}$
Different semantics:
–  Exact, based on ids
–  Approximate, based on module signatures
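The exact, id-based difference can be sketched as a set difference over the two root paths. The parent map and the integer version ids below are made up for the example, and the helper returns the difference in both directions rather than only $vt_1 - vt_2$.

```python
def path_to_root(parents, version):
    """Return the version ids from `version` up to the root."""
    path = []
    while version is not None:
        path.append(version)
        version = parents[version]
    return path


def diff(parents, vt1, vt2):
    """Return (actions only in vt1, actions only in vt2), each root-first."""
    p1, p2 = path_to_root(parents, vt1), path_to_root(parents, vt2)
    s1, s2 = set(p1), set(p2)
    only1 = [v for v in reversed(p1) if v not in s2]
    only2 = [v for v in reversed(p2) if v not in s1]
    return only1, only2


# Hypothetical version tree: child id -> parent id (0 is the root).
parents = {0: None, 1: 0, 2: 1, 3: 2, 4: 2, 5: 4}
only_in_3, only_in_5 = diff(parents, 3, 5)
```

Here versions 3 and 5 share the history up to version 2, so the difference is just the actions recorded after the branch point.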
Collaborative Exploration
Collaboration is key to data exploration
–  Translational, integrative approaches to science
Store provenance information in a database
Synchronize concurrent updates through locking
–  Real-time collaboration [Ellkvist et al., IPAW 2008]
Asynchronous access: similar to version control systems
–  Check out, work offline, synchronize
–  Users exchange patches
Synchronization is simple: provenance is monotonic
No need for a central repository: support for distributed collaboration
–  For details see Callahan et al, SCI Institute Technical Report, No. UUSCI-2006-016, 2006
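One way to see why monotonic provenance is simple to synchronize: if actions are only ever added, merging two copies reduces to a union of their action records. The sketch below assumes every action carries a globally unique id (an assumption of this example, not something the slides specify), and the record layout is hypothetical.

```python
def merge(local, remote):
    """Union two action logs keyed by a globally unique action id.

    Each value is (parent_id, description). Existing records never change,
    so the same id must always carry the same record.
    """
    merged = dict(local)
    for action_id, record in remote.items():
        if action_id in merged and merged[action_id] != record:
            raise ValueError(f"conflicting records for action {action_id}")
        merged[action_id] = record
    return merged


shared = {"a1": (None, "addModule reader"), "a2": ("a1", "addModule iso")}
alice = {**shared, "a3": ("a2", "setParameter iso.value=57")}
bob = {**shared, "b3": ("a2", "addModule render")}
merged = merge(alice, bob)
```

After the merge, both users' branches hang off the shared version `a2`, and the order in which the copies are merged does not matter.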
Provenance Enabling Tools
[Callahan et al., IPAW 2008]
Change-Based Provenance: Summary
General: Works with any system that has undo/redo!
Concise representation
Uniformly captures data and workflow provenance
–  Data provenance: where does a specific data product come from?
–  Workflow evolution: how has workflow structure changed over time?
Results can be reproduced
Detailed information about the exploration process
Provenance beyond reproducibility:
–  Support for reflective reasoning: Scientists can return to any point in the exploration space, compare results side-by-side
–  Support for collaboration
Querying and Re-Using Provenance
Storing detailed information is important, but not sufficient
Need appropriate user interface and operations to leverage information
Understand and re-use the history
Querying Provenance: Examples
Find examples:
–  Find workflows that process a particular type of file
–  Find workflows that output a particular data product
–  Find workflows that contain a given module or sequence of modules
Explore provenance:
–  What was the process used to create a data product? Which algorithms were used?
–  Who created it?
–  Which data sets were used as input to create a data product? What were the intermediate results?
–  Which data products were derived from a given data set?
–  What are the differences between two data products?
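Several of the exploration questions above reduce to reachability in a provenance graph. The following sketch assumes a simple edge list in which an edge (a, b) means that b used or was generated from a; the file names come from the earlier slides, while the workflow names `zspace_workflow` and `lesion_workflow` are invented for the example.

```python
from collections import defaultdict

edges = [
    ("anon4877_CT_scan", "zspace_workflow"),
    ("anon4877_CT_scan", "lesion_workflow"),
    ("zspace_workflow", "anon4877_zspace_20060331.jpg"),
    ("lesion_workflow", "anon4877_lesion_20060401.jpg"),
]


def build_index(edges):
    """Index the edge list in both directions."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(graph, start):
    """All nodes reachable from `start` by following edges in `graph`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


downstream, upstream = build_index(edges)
# Which data products were derived from a given data set?
derived = reachable(downstream, "anon4877_CT_scan")
# Which inputs and processes led to a given data product?
lineage = reachable(upstream, "anon4877_lesion_20060401.jpg")
```

Following edges forward answers "what was derived from this?", and following them backward answers "where did this come from?".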
Querying Provenance
Provenance and workflows are represented as DAGs, which makes queries hard to express as text
Sample query from Provenance Challenge:
–  Find all invocations of procedure align_warp using parameter “model” set to “12” that ran on a Monday.
New provenance query language
wf{*}:
x where x.module = AlignWarp and
x.parameter('model') = '12' and
(log{x}: y where y.dayOfWeek = 'Monday')
[Figure labels: Workflow Evolution, Workflow, Execution]
For details see http://twiki.gridprovenance.org/bin/view/Challenge/VisTrails
Much simpler than other approaches, which use SQL, SPARQL, Prolog…
[Figure caption: Query from REDUX, Microsoft]
But who is going to write those queries?
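For comparison, the sample Provenance Challenge query can also be written by hand against in-memory records. The sketch below is not the VisTrails query language; the record layout, the field names `module_id` and `started`, and the sample data are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical workflow layer: module id -> module type and parameters.
workflow = {
    "m1": {"module": "AlignWarp", "parameters": {"model": "12"}},
    "m2": {"module": "AlignWarp", "parameters": {"model": "9"}},
    "m3": {"module": "Reslice", "parameters": {}},
}

# Hypothetical execution log: one record per module invocation.
log = [
    {"module_id": "m1", "started": datetime(2009, 3, 2, 10, 0)},  # a Monday
    {"module_id": "m1", "started": datetime(2009, 3, 3, 10, 0)},  # a Tuesday
    {"module_id": "m2", "started": datetime(2009, 3, 2, 11, 0)},  # a Monday
]


def monday_align_warp_runs(workflow, log):
    """Invocations of AlignWarp with model == '12' that ran on a Monday."""
    matching = {
        module_id
        for module_id, module in workflow.items()
        if module["module"] == "AlignWarp"
        and module["parameters"].get("model") == "12"
    }
    return [
        record for record in log
        if record["module_id"] in matching
        and record["started"].weekday() == 0  # Monday is 0
    ]


runs = monday_align_warp_runs(workflow, log)
```

The query touches two layers at once, the workflow specification and the execution log, which is the same join that the `wf{*}` and `log{x}` clauses express.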
Querying Provenance by Example
Specify queries visually as graphs
Querying provenance by example [Scheidegger et al., TVCG 2007; Beeri et al., VLDB 2006]
–  WYSIWYQ: What You See Is What You Query
–  The query interface is the same one used to create workflows
Refining Workflows
Complex analyses are hard to put together
–  Programming expertise
–  Domain knowledge
–  Familiarity with different tools
Steep learning curve
Refining Workflows by Analogy
Leverage the wisdom of the crowds in shared provenance
–  Some workflow refinements are common, e.g., change the rendering technique, publish image on the Web
Apply refinements by analogy, automatically [Scheidegger et al, IEEE TVCG 2007]
[Figure labels: Source, Target]
Refining Workflows by Analogy: Example
http://www.cs.utah.edu/~juliana/videos/Analogies.m4v
Refining Workflows by Analogy
[Figure: one workflow is to its refined version as another workflow is to ?]
Refining Workflows by Analogy
[Figure: A is to B as C is to D]
1. Compute difference: $\Delta(A,B)$
–  Just like a patch!
–  But… $D = \Delta(A,B) \circ C$ may not be a valid workflow
2. Find correspondences between A and C: $\mathrm{map}(A,C)$
–  Diffuse similarity scores across the product graph $A \times C$ using eigenvalue decompositions
3. Compute mapped difference: $\Delta_{AC}(A,B) = \mathrm{map}(A,C)\,\Delta(A,B)$
4. $D = \Delta_{AC}(A,B) \circ C$
Details in [Scheidegger et al, IEEE TVCG 2007]
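The four steps can be illustrated with a deliberately simplified sketch. Two things are swapped in here and are not the published method: the difference only covers parameter changes, and the correspondence between A and C is found by matching module types rather than by diffusing similarity scores over the product graph. All names and the workflow layout are hypothetical.

```python
import copy


def diff(a, b):
    """Step 1: parameter changes turning `a` into `b`, as (module_id, name, value)."""
    return [(mid, name, value)
            for mid, module in b.items() if mid in a
            for name, value in module["params"].items()
            if a[mid]["params"].get(name) != value]


def match_modules(a, c):
    """Step 2 (simplified): map ids in `a` to ids in `c` with the same type."""
    mapping, used = {}, set()
    for a_id, a_mod in a.items():
        for c_id, c_mod in c.items():
            if c_id not in used and c_mod["type"] == a_mod["type"]:
                mapping[a_id] = c_id
                used.add(c_id)
                break
    return mapping


def apply_by_analogy(a, b, c):
    """Steps 3 and 4: rewrite the difference through the mapping, apply it to `c`."""
    mapping = match_modules(a, c)
    d = copy.deepcopy(c)
    skipped = []  # changes with no counterpart in `c`, left for the user
    for mid, name, value in diff(a, b):
        if mid in mapping:
            d[mapping[mid]]["params"][name] = value
        else:
            skipped.append((mid, name, value))
    return d, skipped


a = {"r1": {"type": "Reader", "params": {}},
     "v1": {"type": "Renderer", "params": {"mode": "isosurface"}}}
b = {"r1": {"type": "Reader", "params": {}},
     "v1": {"type": "Renderer", "params": {"mode": "volume"}}}
c = {"src": {"type": "Reader", "params": {}},
     "smooth": {"type": "Filter", "params": {}},
     "view": {"type": "Renderer", "params": {"mode": "isosurface", "size": 512}}}
d, skipped = apply_by_analogy(a, b, c)
```

Returning the unmapped changes separately reflects the point that the result may not be a valid workflow and may need manual fixes.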
The Analogy Operation
Allows workflows to be refined without requiring users to directly modify the specification
Basis for scalable updates
Analogies are not foolproof
–  If it works, great. If it doesn't, the result may still help
–  User can edit and fix the new version
Improve by
–  Using domain knowledge
–  Learning from user feedback
The Need for Guidance in Workflow Design
VisComplete: A Workflow Recommendation System
Mine provenance collection: Identify graph fragments that co-occur in a collection of workflows
Predict sets of likely workflow additions to a given partial workflow
Similar to a Web browser suggesting URL completions
[Koop et al., IEEE Vis 2008]
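The mine-then-predict idea can be illustrated with a much simpler stand-in than the published system: count which module type tends to follow which across a collection of workflows, then rank the successors of the module the user just added. The sketch below uses single edges rather than larger graph fragments, and the module names and the `mine` and `suggest` helpers are hypothetical.

```python
from collections import Counter, defaultdict


def mine(workflows):
    """Count how often each module type is directly followed by another."""
    successors = defaultdict(Counter)
    for connections in workflows:
        for src, dst in connections:
            successors[src][dst] += 1
    return successors


def suggest(successors, module_type, k=4):
    """Return up to `k` likely next modules, most frequent first."""
    counts = successors.get(module_type, Counter())
    return [name for name, _ in counts.most_common(k)]


# Hypothetical collection: each workflow is a list of (source, target) types.
collection = [
    [("Reader", "Isosurface"), ("Isosurface", "Renderer")],
    [("Reader", "Isosurface"), ("Isosurface", "Smooth"), ("Smooth", "Renderer")],
    [("Reader", "VolumeMapper"), ("VolumeMapper", "Renderer")],
    [("Reader", "Isosurface"), ("Isosurface", "Renderer")],
]
model = mine(collection)
after_reader = suggest(model, "Reader")
```

As with URL completion, the suggestions are only ranked guesses drawn from what others have built before, and the user stays free to ignore them.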
VisComplete: Demo
[Koop et al., IEEE Vis 2008]
http://www.cs.utah.edu/~juliana/videos/viscomplete_h_264.mov
VisComplete: Results Summary
Eliminates over 50% of actions
Selected completions are almost always in the first four suggestions
A database of simple pipelines can aid users constructing more complex pipelines
See [Koop et al., TVCG 2008] for details on how the path database is constructed and on the completion algorithm
Emerging Applications and Research Directions
Scientific Publications and Provenance
[Figure: first page of Coyle, Edward F. Improved muscular efficiency displayed as Tour de France champion matures. J Appl Physiol 98: 2191–2196, 2005. First published March 17, 2005; doi:10.1152/japplphysiol.00216.2005. Downloaded from jap.physiology.org on February 15, 2009]
Abstract (from the figure): This case describes the physiological maturation from ages 21 to 28 yr of the bicyclist who has now become the six-time consecutive Grand Champion of the Tour de France, at ages 27–32 yr. Maximal oxygen uptake (V̇O2 max) in the trained state remained at ~6 l/min, lean body weight remained at ~70 kg, and maximal heart rate declined from 207 to 200 beats/min. Blood lactate threshold was typical of competitive cyclists in that it occurred at 76–85% V̇O2 max, yet maximal blood lactate concentration was remarkably low in the trained state. It appears that an 8% improvement in muscular efficiency and thus power production when cycling at a given oxygen uptake (V̇O2) is the characteristic that improved most as this athlete matured from ages 21 to 28 yr. It is noteworthy that at age 25 yr, this champion developed advanced cancer, requiring surgeries and chemotherapy. During the months leading up to each of his Tour de France victories, he reduced body weight and body fat by 4–7 kg (i.e., ~7%). Therefore, over the 7-yr period, an improvement in muscular efficiency and reduced body fat contributed equally to a remarkable 18% improvement in his steady-state power per kilogram body weight when cycling at a given V̇O2 (e.g., 5 l/min). It is hypothesized that the improved muscular efficiency probably reflects changes in muscle myosin type stimulated from years of training intensely for 3–6 h on most days.
Methods excerpt (from the figure): Mechanical efficiency and the blood lactate threshold (LT) were determined as the subject bicycled a stationary ergometer for 25 min, with work rate increasing progressively every 5 min over a range of 50, 60, 70, 80, and 90% V̇O2 max.
Fig. 1 caption (from the figure): Mechanical efficiency when bicycling expressed as “gross efficiency” and “delta efficiency” over the 7-yr period in this individual. WC, World Bicycle Road Racing Championships, 1st and 4th place, respectively. Tour de France 1st, Grand Champion of the Tour de France in 1999–2004.
Scientific Publications and Provenance
[Figure: the same first page of Coyle, J Appl Physiol 98: 2191–2196, 2005, overlaid with the two quotes below from http://jap.physiology.org/cgi/content/full/105/3/1020]
“…raw data from the January 1993 test that revealed several additional deviations from the published methodology. Coyle used a 20-min ergometer protocol (not 25 min), including 2- and 3-min stages where respiratory exchange ratios (RER) exceeded 1.00. An RER >1.00 invalidates use of the Lusk equations (5) to estimate energy expenditure.”
“…all of the published delta efficiency values are wrong. … there exists no credible evidence to support Coyle's conclusion that Armstrong's muscle efficiency improved.”
Provenance-Rich Publications
 
 
Bridge the gap between the scientific process and
publications
Results that can be reproduced and validated
–  Papers with deep captions
–  Encouraged by ACM SIGMOD and a number of journals
 
 
Describe more of the discovery process: people only
describe successes; can we learn from mistakes?
Dynamic (interactive) publications
–  Evolve over time
–  Blog/wiki-like => Science 2.0
–  E.g., http://project.liquidpub.org
 
Need tools to support this!
The Provenance-Enabled Paper
http://www.cs.utah.edu/~juliana/videos/vistrails_pdf.mov
Science 2.0
 
Web 2.0 technologies have opened up new opportunities
to improve collaboration and information sharing in
science [Shneiderman, Science 2008; Waldrop, Scientific
American 2008]
Science 2.0: Challenges
 
 
 
 
 
Some skepticism: Can we trust that the information is
accurate? If unpublished results are posted on a wiki, can
we prevent others from stealing that work?
Need provenance: determine authorship, enforce
intellectual property rights, validate the integrity of
artifacts and assess their quality, and reproduce them
Social Data Analysis: Share data and processes
Adds to information overload!
Finding and making sense of information
–  Google is not enough!
–  Need focused search and structured queries
–  Integrate data on the fly – DataSpaces
 
Preserving data
Provenance and Teaching (1)
 
Leverage provenance to improve the way we teach
CS and Science
–  http://www.vistrails.org/index.php/SciVisFall2008
–  Lecture provenance: students can reproduce results
Provenance and Teaching (2)
 
Homework provenance provides insights regarding
–  Task complexity and nature: number of actions; structural vs.
parameter changes; task duration
–  Student confusion: large branching factor = lots of trial-and-error
steps
 
Very detailed (and honest!) feedback: instructors can
leverage this information
[Lins et al., SSDBM 2008]
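The branching-factor signal mentioned above can be sketched roughly as follows. This is only an illustration: the tree encoding (a dict mapping each version id to its parent id) and the function name `branching_factor` are assumptions made here, not the VisTrails data format or the analysis used in the cited work.

```python
from collections import defaultdict

def branching_factor(parent_of):
    """Average number of children over versions that have at least one child.

    parent_of maps each version id to its parent id (None for the root).
    This encoding is a hypothetical one chosen for the example.
    """
    children = defaultdict(list)
    for version, parent in parent_of.items():
        if parent is not None:
            children[parent].append(version)
    if not children:
        return 0.0
    return sum(len(c) for c in children.values()) / len(children)

# Hypothetical homework history: the student returned to version 1 three
# times, which shows up as three branches under that node.
history = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 4}
print(branching_factor(history))
```

A higher value suggests more backtracking and retries; a purely linear history gives 1.0.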
Provenance and Teaching (3)
 
Homework provenance helps students and instructors to
collaborate
–  Student is stuck, sends their provenance
–  Instructor understands the student's problem, provides hints; the student
can see what the instructor did!
–  They can also collaborate in real time [Ellkvist et al., IPAW 2008]
Using Provenance to
Teach Electronic Media
[Langefeld and Kessler,
Submitted 2009]
“[...] The students have gotten to the point where
they demand the VisTrails files for every
demonstration just after I complete [it]”
“[...] students used [a vistrail instead of a reference
model] 62% of the time”
Students who used provenance produced higher-quality models
Provenance Interoperability
 
Complex tasks may require
–  Multiple workflow systems---each has its own provenance
model, e.g., Swift to run simulation on the Grid, VisTrails to
visualize the results
–  Tools that cannot be wrapped and controlled by a workflow
system
–  Human interaction
 
But need to integrate their provenance
–  To obtain complete provenance of a data item
–  To re-use powerful provenance exploration tools such as
VisTrails
 
 
Provenance as an interoperability layer across
different tools and workflow systems
Challenge: information integration!
Provenance Analytics: Opportunities
 
 
Volume of collected provenance is growing
Workflow and provenance repositories
–  myExperiment (EU), Provenance Repository (Indiana),
ManyEyes (IBM), Yahoo! Pipes
 
Opportunity for knowledge discovery, sharing and
re-use
–  Discover workflow patterns => a recommendation system
that suggests alternatives to users as they construct a
workflow
–  Discover workflow refinement patterns => automatically
extract analogies from shared repositories
–  Cluster (organize) workflow collections => simplify query
and search over repositories
–  Infer workflow specification from execution log [Aalst et al.,
TKDE 2004]
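To make the recommendation idea concrete, here is a minimal sketch that mines one very simple pattern (which module most often follows another) from a collection of workflows. It is a plain frequency heuristic assumed for illustration, not the technique used by any of the systems named above; the module names and the functions `build_successor_counts` and `suggest_next` are hypothetical.

```python
from collections import Counter, defaultdict

def build_successor_counts(workflows):
    """Count how often module b directly follows module a across workflows.

    Each workflow is a list of (source_module, target_module) edges.
    """
    counts = defaultdict(Counter)
    for edges in workflows:
        for src, dst in edges:
            counts[src][dst] += 1
    return counts

def suggest_next(counts, module, k=2):
    """Return up to k of the most frequent successors of the given module."""
    return [name for name, _ in counts[module].most_common(k)]

# Hypothetical repository of three small workflows.
repository = [
    [("ReadFile", "Isosurface"), ("Isosurface", "Render")],
    [("ReadFile", "Isosurface"), ("Isosurface", "Smooth"), ("Smooth", "Render")],
    [("ReadFile", "Histogram")],
]
counts = build_successor_counts(repository)
print(suggest_next(counts, "ReadFile"))
```

A user who has just added `ReadFile` would be offered the successors seen most often in the shared repository; a real system would need to consider larger graph fragments, not only single edges.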
Provenance Analytics: Challenges
 
Lots of data, complex data: graphs + metadata
–  Modules, parameters, parameter values, data products
 
 
Do existing approaches to graph mining scale?
Case study in clustering: [Santos, IPAW 2008]
–  Explore different workflow representations: Graphs versus
bag of words
–  Examine trade-off between efficiency and cluster quality
–  Bag of words surprisingly effective, and much more
efficient
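The bag-of-words representation can be sketched as follows: each workflow is reduced to a multiset of its module names, ignoring how the modules are connected, and two workflows are compared by cosine similarity. This is a minimal sketch under assumptions; the exact features and similarity measure of the cited case study are not given here, and the module names below are invented.

```python
import math
from collections import Counter

def bag_of_words(workflow_modules):
    """Represent a workflow as a multiset of module names (structure is dropped)."""
    return Counter(workflow_modules)

def cosine_similarity(a, b):
    """Cosine similarity between two module-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical workflows, listed by the modules they contain.
w1 = bag_of_words(["ReadFile", "Isosurface", "Render"])
w2 = bag_of_words(["ReadFile", "Isosurface", "Smooth", "Render"])
w3 = bag_of_words(["HTTPFetch", "ParseXML", "Plot"])

print(cosine_similarity(w1, w2))  # high: the workflows share most modules
print(cosine_similarity(w1, w3))  # zero: no modules in common
```

Because each comparison is a linear pass over module counts rather than a graph-matching problem, pairwise similarities are cheap to compute and can be fed to any standard clustering algorithm.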
Efficient Storage and Querying
 
Provenance information can be large
–  Mimi Project (U. Michigan): provenance = 10 * data
 
Need to worry about efficiency and usability
–  Reduce redundancies in provenance data [Chapman et al.,
SIGMOD 2008]
–  Develop efficient techniques for evaluating provenance queries,
which are frequently recursive [Heinis and Alonso, SIGMOD 2008]
–  Need to reduce provenance overload for users [Cohen-Boulakia,
Davidson et al., VLDB 2007, ICDE 2008]
 
Need indexes
–  E.g., to search over a large provenance store, or to apply
analogies to a large number of workflows
–  Provenance and workflows are graphs
–  Can graph indexing techniques be applied to this problem? e.g.,
[Shasha et al., SIGMOD 2002; Yan et al., SIGMOD 2004]
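To illustrate why provenance queries are frequently recursive, the sketch below stores derivation edges in a relational table and asks for everything a data product was derived from, directly or indirectly. The schema (`derived_from`), the file names, and the `lineage` helper are assumptions made for this example; it uses the recursive common table expression supported by SQLite through Python's standard `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical provenance store: one row per "product was derived from source".
conn.execute("CREATE TABLE derived_from (product TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO derived_from VALUES (?, ?)",
    [
        ("plot.png", "mesh.vtk"),
        ("mesh.vtk", "sim_output.dat"),
        ("sim_output.dat", "params.cfg"),
        ("other.png", "other.dat"),
    ],
)

def lineage(conn, product):
    """Return all direct and indirect sources of a data product, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(name) AS (
            SELECT source FROM derived_from WHERE product = ?
            UNION
            SELECT d.source FROM derived_from AS d
            JOIN ancestors AS a ON d.product = a.name
        )
        SELECT name FROM ancestors
        """,
        (product,),
    )
    return sorted(r[0] for r in rows)

print(lineage(conn, "plot.png"))
```

The query walks the derivation chain one step per iteration, so its cost grows with the depth of the lineage; this is the kind of workload that motivates the indexing question raised above. `UNION` (rather than `UNION ALL`) removes duplicates, which also keeps the recursion from looping if the table contains a cycle.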
Conclusions and Future Work
 
Provenance management is essential for exploratory
computational tasks
–  Provenance can be used to support reflective reasoning
–  Intuitive interfaces for simplifying the construction and
refinement of workflows
 
Science 2.0: Sharing provenance at a large scale
creates new opportunities [Freire and Silva, CHI SDA, 2008]
–  Workflow/provenance repositories; provenance-enabled
publications
–  Expose scientists to different techniques and tools
–  Scientists can learn by example; expedite their scientific
training; and potentially reduce their time to insight
 
Provenance + Workflows + Sharing have the
potential to revolutionize science!
Conclusions and Future Work
 
 
 
 
Appropriate support for exploratory tasks is essential
for a wider adoption and more effective use of
scientific workflow systems
Sharing workflows and provenance at a large scale
creates new opportunities [Freire and Silva, CHI SDA, 2008]
Provenance + Workflows + Sharing have the
potential to revolutionize science!
Many challenging data management problems, with great potential for practical impact
Acknowledgments
 
 
Thanks to VisTrails group
This work is partially supported by the National
Science Foundation, the Department of Energy, an
IBM Faculty Award, and a University of Utah Seed
Grant.
Danke
Obrigada
Thank you