Provenance in Science:
Challenges and Opportunities
Juliana Freire
Claudio Silva
http://www.cs.utah.edu/~juliana
What is Provenance?
 
 
Oxford English Dictionary: (i) the fact of coming
from some particular source or quarter; origin,
derivation. (ii) the history or pedigree of a work of
art, manuscript, rare book, etc.; concretely, a
record of the ultimate derivation and passage of an
item through its various owners.
Apple Dictionary: Origin, source, place of origin;
birthplace, fount, roots, pedigree, derivation, root,
etymology; formal radix.
Provenance in Science–Freire & Silva
Moorea 2009
Provenance in Art
Rembrandt van Rijn
Self-Portrait, 1659
Andrew W. Mellon Collection
1937.1.72
Provenance in Science
Provenance is as important as
(or more important than!)
the result
  Not a new issue
  Lab notebooks have
been used for a long time
  What is new?
When
 
–  Large volumes of data
–  Complex analyses—
computational processes
 
Writing notes is no
longer an option…
[Figure: lab notebook page on DNA recombination by Lederberg, showing observed data and an annotation]
Digital Data and Provenance
[Figure: example database tables with columns Emp (John, Susan), Dept (D01, D02), and Mgr (Mary, Ken)]
Uses of Computational Provenance
anon4876_zspace_20060331.jpg
anon4877_zspace_20060331.jpg
anon4877_lesion_20060401.jpg
Reproducibility
Data quality
Attribution
Informational
How were these images created?
Was any pre-processing applied to the raw data?
What’s the difference?
Who created them?
Are they really from the
same patient?
Provenance and Workflows
Input: anon4877_CT_scan
[Figure: two workflows, each linked to the input by a usedBy edge and to its output by a generated edge; module labels are not recoverable from the extraction]
Output: anon4877_zspace_20060331.jpg
Output: anon4877_lesion_20060401.jpg
Motivating Applications
The CMOP Project (Baptista et al.)
Animations by Claudio Silva
 
 
 
 
Center for Coastal Margin
Observation & Prediction
(CMOP)
Enable river-to-ocean
observation of physical and
ecological processes
Create accurate computational
models
Further understanding of these
processes to manage, operate,
and sustain coastal resources
and ecosystems effectively
http://www.ccalmr.ogi.edu/CORIE
CMOP: Making Sense of Data
 
 
 
 
Observation and modeling of multiple systems at
multiple scales
Very large number of data products, sensor
measurements, and results from numerical models
Cover more than 10 years of experiments: occupy
over 30 TB of storage
To analyze data, need to integrate data and tools
from different disciplines
–  Biologists, chemists, oceanographers, computer scientists
CMOP: Issues
 
 
Heterogeneous tools and data
Custom-built scripts by several staff members
–  Scripts to run simulations; to select data; to create
visualizations
 
Creating new data products is labor-intensive,
time-consuming, and error-prone
–  Finding and running scripts is a task that can only be
performed by their creators
–  Multiple people must collaborate
 
Data products are generated in an ad-hoc manner
–  Data provenance is not captured in a persistent way
–  Often captured only in the captions of figures
 
Hard to reproduce results and to refine data
products and visualizations
A CMOP Workflow: Visualizing Salmon
http://projectcroos.com/
CROOS
http://www.stccmop.org/
CMOP
Utah
 
Workflows are used for managing processes and
data, and to support collaboration
A CMOP Workflow: Visualizing Salmon
Enable new
discoveries!
CMOP: Why workflows+provenance?
 
Structured workflows can be more easily shared
than Perl or Python scripts
–  Workflow repository can be queried allowing users to find
useful workflows
–  Simplify the creation of new data products through re-use
 
 
Data provenance is systematically captured and
persisted in a database
Results can be reproduced
Working Memory (Preston & Silva)
 
Many psychological disorders are (partially) rooted
in deficiency in working memory performance
–  Schizophrenia
–  Obsessive-Compulsive Disorder
–  Bi-polar Disorder
–  Depression
Studies show working memory
resides in the pre-frontal cortex
 
Is it possible to improve working
memory?
Working Memory and rTMS: A Study
 
 
Show that directed Repetitive Transcranial Magnetic
Stimulation (rTMS) can change the performance of
working memory
Develop a new treatment for working memory
deficiencies
–  Non-invasive
–  Possible treatment for people unresponsive to
pharmacotherapy
rTMS Study: Making Sense of Data
 
Data is multi-modal
–  EEG Data – 64 sensors
–  MEG Data – 322 sensors
–  MRI Data – 256x256x209 volume
–  Genetic Data

Data is large
–  Approx 400 MB per subject
–  1400 participants
 
Complex analyses
–  Signal processing techniques
(EEG and MEG data)
–  Segmentation and projection
(MRI)
rTMS Study: Why use workflows?
1400 participants!
  Need detailed provenance
of results
 
–  Reproducibility and
verification
–  What if a scanner is found
to be defective?
 
 
Ability to re-use analyses
Data products are much
larger than raw data
rTMS Study: Preliminary Results
 
rTMS modulates working memory spectral dynamics
to improve performance
Issues in Provenance Management
Managing Provenance
 
 
 
Provenance Models: What to capture?
Capture Mechanisms: How to capture?
Storage and Querying
Provenance Models
[Clifford et al., CCPE 2008]
 
Retrospective provenance:
Execution log
–  Invocation records of run time environments and resources
used: site, host, executable, execution time, file stats ...
 
Prospective provenance:
Workflow definition
–  Recipes for how to produce data
 
Causality graph or derivation lineage
–  Relationships among data, programs and computations
 
Annotations: user-defined provenance
–  Metadata annotations about procedures and data
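The four kinds of provenance listed above can be sketched as plain data structures. This is a minimal illustration, not the model of any particular system: the class names `ModuleSpec` and `Invocation` and the helper `causality_edges` are hypothetical, and the sample values are taken from the visualization example used elsewhere in these slides.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ModuleSpec:
    """Prospective provenance: one step of the workflow definition (the recipe)."""
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class Invocation:
    """Retrospective provenance: one record of the execution log."""
    module: str
    inputs: list
    outputs: list
    user: str
    start: datetime
    end: datetime
    annotations: list = field(default_factory=list)  # user-defined provenance


def causality_edges(log):
    """Derive (source, relation, target) edges of the causality graph from a log."""
    edges = []
    for inv in log:
        edges += [(item, "usedBy", inv.module) for item in inv.inputs]
        edges += [(inv.module, "generated", item) for item in inv.outputs]
    return edges


# Prospective: what the workflow is supposed to do.
workflow = [
    ModuleSpec("vtkStructuredPointsReader"),
    ModuleSpec("vtkContourFilter", {"contourValue": 57}),
]

# Retrospective: what actually happened when it ran.
log = [
    Invocation("vtkStructuredPointsReader", ["head.120.vtk"], ["preader"], "juliana",
               datetime(2006, 8, 19, 13, 2, 45), datetime(2006, 8, 19, 13, 3, 22)),
    Invocation("vtkContourFilter", ["preader"], ["contour"], "juliana",
               datetime(2006, 8, 19, 13, 4, 25), datetime(2006, 8, 19, 13, 5, 12)),
]

edges = causality_edges(log)
```

Keeping the recipe and the log as separate structures means the same workflow definition can be linked to many execution records.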
Visualization Workflow: An Example
head_iso.jpg
head_hist.jpg
Prospective versus Retrospective
Visualization Workflow
Generate
histogram
Visualize
isosurface
vtkStructuredPointsReader
Input: /home/juliana/examples/data/head.120.vtk
Output: preader
Start: 2006-08-19 13:02:45
End: 2006-08-19 13:03:22
User: juliana
vtkContourFilter
Input: preader
Param: contourValue = 57
Output: contour
Start: 2006-08-19 13:04:25
End: 2006-08-19 13:05:12
User: juliana
…
Annotations
[Figure: causality graph with annotations]
head.120.vtk -> usedBy -> vtkStructuredPointsReader -> generated -> preader
preader -> usedBy -> plot.wf -> generated -> head-hist.png
preader -> usedBy -> vis.wf -> generated -> head-vis.png
Annotations: "Visible human dataset!"; "Plot histogram for scalar values!"; "Good isosurface showing bone structure!"
Provenance Models: Issues
 
 
 
 
 
What to capture?
A little provenance is better than no provenance…
But if you don’t capture what you will need, tough
luck!
How much is too much?
If and when can information be discarded?
Application dependent!
Capture Mechanisms
 
Embedded in execution environment (centralized):
–  Logging in a workflow system (VisTrails, Kepler, Taverna, REDUX)

Operating system (PASS, ES3)
–  File read/write

Instrument tools/services (PASOA, Karma) (distributed)
–  Application specific
 
Issues:
–  Provenance granularity
–  Capture overheads
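As a rough illustration of capture embedded in the execution environment, the sketch below wraps each processing step in a decorator that appends an invocation record to a log. The names `capture`, `PROVENANCE_LOG`, and the toy `contour` step are hypothetical and do not reflect how VisTrails, Kepler, Taverna, or REDUX implement logging.

```python
import functools
import os
import time

PROVENANCE_LOG = []  # hypothetical in-memory store; a real system would persist this


def capture(func):
    """Record one provenance entry for every call to the wrapped step."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "module": func.__name__,
            "args": repr(args),       # stored as text so large inputs are not kept alive
            "params": dict(kwargs),
            "user": os.environ.get("USER", "unknown"),
            "start": start,
            "end": time.time(),
        })
        return result
    return wrapper


@capture
def contour(values, contour_value):
    """Toy stand-in for an analysis step: keep values at or above a threshold."""
    return [v for v in values if v >= contour_value]


result = contour([10, 57, 90], contour_value=57)
```

Both issues named above show up even in this toy: wrapping whole steps fixes the granularity at one record per call, and every call pays for building and storing a record.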
no reliance on workflow engine
Querying Provenance: Examples
 
 
 
 
 
 
 
What was the process used to create a data
product? Which algorithms were used?
Who created it?
Which data sets were used as input to create a data
product?
What were the intermediate results that contributed
to the derivation of a data product?
Which data products were derived from a given data
set?
Which data products were derived using a particular
algorithm?
What are the differences between two data
products?
Querying Provenance
head.120.vtk
 
Recursive queries
–  Derivation lineage
–  Data dependencies
Find the process that led to
resampled-head-vis.png
Which data sets contributed
to resampled-head-vis.png
 
Graph patterns
Find all invocations of
vtkContourFilter with
isosurface value = 57 that
are preceded by resampling
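A lineage query of this kind amounts to a transitive closure over the causality graph. The sketch below assumes the graph is stored as a dictionary mapping each item to its direct predecessors; the intermediate names (`resampled`, `vtkImageResample`) and the function names are illustrative assumptions, since the slide shows only the endpoints of the derivation.

```python
# Each key maps to the items it was directly derived from (hypothetical graph).
UPSTREAM = {
    "resampled-head-vis.png": ["vtkContourFilter"],
    "vtkContourFilter": ["resampled"],
    "resampled": ["vtkImageResample"],
    "vtkImageResample": ["preader"],
    "preader": ["vtkStructuredPointsReader"],
    "vtkStructuredPointsReader": ["head.120.vtk"],
}


def lineage(node, upstream=UPSTREAM):
    """Return every ancestor of node, following derivation edges transitively."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


def source_data(node, upstream=UPSTREAM):
    """Ancestors with no predecessors of their own: the contributing raw data sets."""
    return [n for n in lineage(node, upstream) if n not in upstream]


def preceded_by(node, ancestor, upstream=UPSTREAM):
    """Simple graph-pattern check: does ancestor occur upstream of node?"""
    return ancestor in lineage(node, upstream)
```

The traversal is written with an explicit stack rather than recursion, which is equivalent for this purpose and avoids recursion limits on long derivation chains.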
Querying Provenance
 
Workflow difference
A user has created an isosurface visualization of the visible
human dataset. Her colleague modified the workflow to use
volume rendering instead. Find the differences between the
two workflows.
Differences
Volume rendering
Isosurface
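One simple way to compute such a difference is to compare the modules and parameter settings of the two workflows. The sketch below assumes each workflow is a dictionary from module name to parameters; the module lists are illustrative guesses, not the workflows shown on the slide, and a real comparison would also match connections and repeated modules.

```python
def workflow_diff(first, second):
    """Report modules removed, added, or changed between two workflows."""
    removed = sorted(set(first) - set(second))
    added = sorted(set(second) - set(first))
    changed = {
        name: (first[name], second[name])
        for name in sorted(set(first) & set(second))
        if first[name] != second[name]
    }
    return {"removed": removed, "added": added, "changed": changed}


# Hypothetical isosurface workflow and its volume-rendering variant.
isosurface_wf = {
    "vtkStructuredPointsReader": {},
    "vtkContourFilter": {"contourValue": 57},
    "vtkRenderer": {},
}
volume_wf = {
    "vtkStructuredPointsReader": {},
    "vtkVolumeRayCastMapper": {},
    "vtkRenderer": {},
}

diff = workflow_diff(isosurface_wf, volume_wf)
```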
Querying Provenance
 
Parameter settings
Find all different isosurface values used to generate visualizations
of the visible human head
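Assuming the execution log is available as a list of records, this query reduces to a filter followed by a distinct. The record layout and the function name below are assumptions for illustration; only `vtkContourFilter`, `contourValue = 57`, and `head.120.vtk` come from the slides, and the remaining values are made up.

```python
# Hypothetical execution log; one record per module invocation.
LOG = [
    {"module": "vtkContourFilter", "input": "head.120.vtk", "params": {"contourValue": 57}},
    {"module": "vtkContourFilter", "input": "head.120.vtk", "params": {"contourValue": 30}},
    {"module": "vtkContourFilter", "input": "head.120.vtk", "params": {"contourValue": 57}},
    {"module": "vtkContourFilter", "input": "other.vtk", "params": {"contourValue": 12}},
]


def distinct_param_values(log, module, param, dataset):
    """Collect the distinct values of one parameter used on one data set."""
    return sorted({
        record["params"][param]
        for record in log
        if record["module"] == module
        and record["input"] == dataset
        and param in record["params"]
    })


values = distinct_param_values(LOG, "vtkContourFilter", "contourValue", "head.120.vtk")
```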
Querying Provenance
 
Annotation
A user has annotated output
images using free text. Find all
images whose annotations
mention “bone” that were
output by a vtkContourFilter
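This query combines a free-text match on annotations with a structural condition on the process that produced the image. A minimal sketch, assuming each data product records its annotations and the modules in its derivation; the record layout and the `Histogram` module name are hypothetical.

```python
# Hypothetical catalogue of data products with their annotations and derivations.
PRODUCTS = [
    {
        "file": "head-hist.png",
        "modules": ["vtkStructuredPointsReader", "Histogram"],
        "annotations": ["Plot histogram for scalar values!"],
    },
    {
        "file": "head-vis.png",
        "modules": ["vtkStructuredPointsReader", "vtkContourFilter"],
        "annotations": ["Good isosurface showing bone structure!"],
    },
]


def find_annotated(products, keyword, module):
    """Files produced via the given module whose annotations mention the keyword."""
    keyword = keyword.lower()
    return [
        product["file"]
        for product in products
        if module in product["modules"]
        and any(keyword in note.lower() for note in product["annotations"])
    ]


matches = find_annotated(PRODUCTS, "bone", "vtkContourFilter")
```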
[Figure: head-hist.png and head-vis.png, with the annotation "Good isosurface showing bone structure!"]
VisTrails:
A Provenance Management
System
VisTrails: Managing Provenance
 
 
Comprehensive provenance infrastructure for
computational tasks
Focus on exploratory tasks such as simulations,
visualization and data mining
[Figure: exploration loop with the labels Data, Process, Data Product, Specification, Data Manipulation, Perception & Cognition, Knowledge, Exploration, and User. Figure modified from J. van Wijk, IEEE Vis 2005]
VisTrails: Managing Provenance
 
 
 
Comprehensive provenance infrastructure for
computational tasks
Focus on exploratory tasks such as visualization and
data mining
Transparently tracks provenance of the discovery
process
–  The trail followed as users generate and test hypotheses
 
 
Use provenance to streamline exploration
Focus on usability: build tools for scientists
–  VisTrails manages the data, metadata and the exploration
process, so scientists can focus on science!
 
 
Infrastructure can be combined with, and enhance,
scientific workflow and visualization systems
VisTrails is open source: http://www.vistrails.org
Demo
Keeping Exploration Trails
[Figure: Trail, Workflows, and Data Products panels; module labels are not recoverable from the extraction]
Provenance Beyond Reproducibility
 
 
Support for reflective reasoning
Ability to compare data products
“Reflective reasoning requires the ability to store temporary
results, to make inferences from stored knowledge, and to follow
chains of reasoning backward and forward, sometimes
backtracking when a promising line of thought proves to be
unfruitful. …the process is slow and laborious”
Donald A. Norman
[Freire et al., IPAW 2006]
Provenance Beyond Reproducibility
 
 
 
Support for reflective reasoning
Ability to compare data products
Explore parameter spaces and compare results
[Freire et al., IPAW 2006]
Provenance Beyond Reproducibility
 
 
 
 
Support for reflective reasoning
Ability to compare data products
Explore parameter spaces and compare results
Support for collaboration
[Ellkvist et al., IPAW 2008]
Provenance Beyond Reproducibility
 
 
 
 
 
Support for reflective reasoning
Ability to compare data products
Explore parameter spaces and compare results
Support for collaboration
Streamline data analysis and visualization
Refining Analyses by Analogy
 
Leverage the wisdom of the crowds in shared
provenance
–  Some refinements are common, e.g., change the
rendering technique, publish image on the Web
 
Apply refinements by analogy, automatically
[Scheidegger et al, IEEE TVCG 2007]
Generating Visualizations by Analogy
[Scheidegger et al, IEEE TVCG 2007]
http://www.cs.utah.edu/~juliana/videos/Analogies.m4v
The Need for Guidance in
Workflow Design
VisComplete: A Workflow
Recommendation System
 
 
 
Mine provenance collection: Identify fragments that
co-occur in a collection of workflows
Predict sets of likely workflow additions to a given
partial workflow
Similar to a Web browser suggesting URL
completions
[Koop et al., IEEE Vis 2008]
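The idea can be illustrated with a much-simplified sketch: count which module most often follows a given module across a collection of workflows, then suggest the most frequent successors. This is not the VisComplete algorithm, which mines larger graph fragments; the function names and the sample collection are assumptions for illustration.

```python
from collections import Counter, defaultdict


def mine_successors(workflows):
    """Count, for every module, which modules follow it across the collection."""
    table = defaultdict(Counter)
    for edges in workflows:
        for source, target in edges:
            table[source][target] += 1
    return table


def suggest(table, module, k=2):
    """Return up to k of the most frequent successors of the given module."""
    return [name for name, _count in table[module].most_common(k)]


# Hypothetical collection: each workflow is a list of (source, target) connections.
collection = [
    [("vtkStructuredPointsReader", "vtkContourFilter"),
     ("vtkContourFilter", "vtkPolyDataMapper")],
    [("vtkStructuredPointsReader", "vtkContourFilter"),
     ("vtkContourFilter", "vtkPolyDataMapper")],
    [("vtkStructuredPointsReader", "vtkContourFilter"),
     ("vtkContourFilter", "vtkPolyDataNormals"),
     ("vtkPolyDataNormals", "vtkPolyDataMapper")],
]

table = mine_successors(collection)
suggestions = suggest(table, "vtkContourFilter")
```

Ranking by frequency is what makes this resemble URL completion: the more often a continuation appears in the collection, the earlier it is offered.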
VisComplete: A Workflow
Recommendation System
 
 
 
Mine provenance collection: Identify graph
fragments that co-occur in a collection of workflows
Predict sets of likely workflow additions to a given
partial workflow
Similar to a Web browser suggesting URL
completions
VisComplete: Demo
[Koop et al., IEEE Vis 2008]
http://www.cs.utah.edu/~juliana/videos/viscomplete_h_264.mov
VisTrails: Some Applications
Cosmology-LANL
Environmental Simulations-STC CMOP
Psychiatry-U of Utah
High Energy Physics-Cornell
Emerging Applications
Research Directions
Provenance Enabling Tools
[Callahan et al., IPAW 2008]
Scientific Publications and Provenance
[Figure: first page of Coyle, Edward F. "Improved muscular efficiency displayed as Tour de France champion matures." J Appl Physiol 98: 2191–2196, 2005. doi:10.1152/japplphysiol.00216.2005, including Fig. 1 (mechanical efficiency when bicycling, expressed as "gross efficiency" and "delta efficiency", over the 7-yr period)]
Scientific Publications and Provenance
[Figure: first page of Coyle, Edward F. "Improved muscular efficiency displayed as Tour de France champion matures." J Appl Physiol 98: 2191–2196, 2005, overlaid with the quoted excerpts below]
“raw data from the January 1993 test that revealed several additional deviations from the published methodology. Coyle used a 20-min ergometer protocol (not 25 min), including 2- and 3-min stages where respiratory exchange ratios (RER) exceeded 1.00. An RER >1.00 invalidates use of the Lusk equations (5) to estimate energy expenditure.”
“…all of the published delta efficiency values are wrong. … there exists no credible evidence to support Coyle's conclusion that Armstrong's muscle efficiency improved.”
http://jap.physiology.org/cgi/content/full/105/3/1020
Provenance-Rich Publications
 
 
Bridge the gap between the scientific process and
publications
Results that can be reproduced and validated
–  Papers with deep captions
–  Encouraged by ACM SIGMOD and a number of journals
 
 
Describe more of the discovery process: people only
describe successes; can we learn from mistakes?
Dynamic (interactive) publications
–  Evolve over time
–  Blog/wiki-like => Science 2.0
–  E.g., http://project.liquidpub.org
 
Need tools to support this!
The Provenance-Enabled Paper
http://www.cs.utah.edu/~juliana/videos/vistrails_pdf.mov
Science Mashups
Provenance and Teaching (1)
 
Leverage provenance to improve the way we teach
CS and Science
–  http://www.vistrails.org/index.php/SciVisFall2008
–  Also used at UNC, Linkoping, UTEP
–  Lecture provenance: student can reproduce results
Provenance and Teaching (2)
 
Homework provenance provides insights regarding
–  Task complexity and nature: number of actions; structural vs.
parameter changes; task duration
–  Student confusion: large branching factor = lots of trial-and-error
steps
 
Very detailed (and honest!) feedback: instructors can
leverage this information
[Lins et al., SSDBM 2008]
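The branching factor mentioned above can be computed directly from a student's version tree. The sketch below assumes the tree is stored as a mapping from each version to its parent and defines the branching factor as the average number of children per non-leaf version; both the representation and that definition are assumptions for illustration, not taken from the cited paper.

```python
from collections import Counter


def branching_factor(parent_of):
    """Average number of children per non-leaf version in a version tree."""
    children = Counter(parent for parent in parent_of.values() if parent is not None)
    if not children:
        return 0.0
    return sum(children.values()) / len(children)


# Hypothetical version trees: version id -> parent id (None marks the root).
steady_student = {1: None, 2: 1, 3: 2, 4: 3}                 # one linear path
exploring_student = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}  # repeated backtracking

steady = branching_factor(steady_student)
exploring = branching_factor(exploring_student)
```

A value near 1 indicates a mostly linear sequence of steps, while larger values indicate that the student returned to earlier versions and tried alternatives.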
Provenance and Teaching (3)
 
Homework provenance helps students and instructors to
collaborate
–  Student is stuck, sends their provenance
–  Instructor understands the student's problem and provides hints; the student
can see what the instructor did!
–  They can also collaborate in real time [Ellkvist et al., IPAW 2008]
Using Provenance to
Teach Electronic Media
[Langefeld and Kessler,
Submitted 2009]
“[...] The students have gotten to the point where
they demand the VisTrails files for every
demonstration just after I complete [it]”
“[...] students used [a vistrail instead of a reference
model] 62% of the time”
Using Provenance to
Teach Electronic Media
http://www.cs.utah.edu/~juliana/videos/maya_playback_slow.mov
Using Provenance to
Teach Electronic Media
[Langefeld and Kessler,
Submitted 2009]
“Students who used provenance produced higher-quality models”
Science 2.0
 
Web 2.0 technologies have opened up new opportunities
to improve collaboration and information sharing in
science [Shneiderman, Science 2008; Waldrop, Scientific
American 2008]
Science 2.0: Challenges
 
 
 
 
 
Open Science and skepticism: Can we trust that the
information is accurate? If unpublished results are posted
on a wiki, can we prevent others from stealing that work?
Need provenance: determine authorship, enforce
intellectual property rights, validate the integrity of
artifacts, assess their quality, and reproduce them
Social Data Analysis: Share data and processes
Add to information overload!
Finding and making sense of information
–  Google is not enough!
–  Need focused search and structured queries
–  Integrate data on the fly
 
Preserving data
Conclusions and Future Work
 
Provenance management is essential for
computational science
–  Provenance can be used to support reflective reasoning
–  Intuitive interfaces for simplifying the construction and
refinement of workflows
 
Science 2.0: Sharing provenance at a large scale
creates new opportunities [Freire and Silva, CHI SDA, 2008]
–  Workflow/provenance repositories; provenance-enabled
publications
–  Expose scientists to different techniques and tools
–  Scientists can learn by example; expedite their scientific
training; and potentially reduce their time to insight
 
Provenance + Workflows + Sharing have the
potential to revolutionize science!
Conclusions and Future Work
 
 
 
 
Provenance management is essential for
computational science
Science 2.0: Sharing provenance at a large scale
creates new opportunities [Freire and Silva, CHI SDA, 2008]
Provenance + Workflows + Sharing have the
potential to revolutionize science
Need infrastructure and tools
–  Many challenges and several open computer science
questions
Acknowledgments
 
 
Thanks to VisTrails group
This work is partially supported by the National
Science Foundation, the Department of Energy, an
IBM Faculty Award, and a University of Utah Seed
Grant.
Thank you