Provenance in Science: Challenges and Opportunities
Juliana Freire, Claudio Silva
http://www.cs.utah.edu/~juliana

What is Provenance?
Oxford English Dictionary: (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.
Apple Dictionary: origin, source, place of origin; birthplace, fount, roots, pedigree, derivation, root, etymology; formal radix.
Provenance in Science–Freire & Silva Moorea 2009

Provenance in Art
[Image: Rembrandt van Rijn, Self-Portrait, 1659, Andrew W. Mellon Collection 1937.1.72]

Provenance in Science
Provenance is as important as the result (or more!)
Not a new issue: lab notebooks have been used for a long time
[Image: lab notebook page by Lederberg on DNA recombination, showing observed data and annotations]
What is new?
– Large volumes of data
– Complex analyses carried out by computational processes
Writing notes by hand is no longer an option…

Digital Data and Provenance
Emp    Dept   Mgr
John   D01    Mary
Susan  D02    Ken

Uses of Computational Provenance
[Images: anon4876_zspace_20060331.jpg, anon4877_zspace_20060331.jpg, anon4877_lesion_20060401.jpg]
Reproducibility, data quality, attribution, informational
– How were these images created?
– Was any pre-processing applied to the raw data?
– What's the difference?
– Who created them?
– Are they really from the same patient?
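
Several of the questions above (How were these images created? Are they really from the same patient?) are lineage queries: they can be answered by walking a causality graph backward from an image to its raw inputs. The following Python sketch assumes the usedBy/generated edge vocabulary that appears on the next slide. The run names (zspace_run_1, lesion_run_1, zspace_run_2) and the input anon4876_CT_scan are hypothetical, since the slides do not name them; this is an illustration, not how any particular provenance system stores lineage.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Minimal causality graph with two edge types: a data item is *usedBy*
    a process, and a process *generated* a data item."""

    def __init__(self):
        self.processes = set()
        self.inputs_of = defaultdict(set)  # process -> data items it used
        self.producer_of = {}              # data item -> process that generated it

    def used_by(self, data, process):
        self.processes.add(process)
        self.inputs_of[process].add(data)

    def generated(self, process, data):
        self.processes.add(process)
        self.producer_of[data] = process

    def lineage(self, data):
        """Walk backward from `data`, collecting every upstream process and data item."""
        upstream, stack = set(), [data]
        while stack:
            process = self.producer_of.get(stack.pop())
            if process is None or process in upstream:
                continue
            upstream.add(process)
            for item in self.inputs_of.get(process, ()):
                if item not in upstream:
                    upstream.add(item)
                    stack.append(item)
        return upstream

    def raw_inputs(self, data):
        """Upstream data items that no process generated, i.e. the raw data."""
        return {n for n in self.lineage(data)
                if n not in self.processes and n not in self.producer_of}


g = ProvenanceGraph()
# File names come from the slides; run names and anon4876_CT_scan are hypothetical.
g.used_by("anon4877_CT_scan", "zspace_run_1")
g.generated("zspace_run_1", "anon4877_zspace_20060331.jpg")
g.used_by("anon4877_CT_scan", "lesion_run_1")
g.generated("lesion_run_1", "anon4877_lesion_20060401.jpg")
g.used_by("anon4876_CT_scan", "zspace_run_2")
g.generated("zspace_run_2", "anon4876_zspace_20060331.jpg")

a = g.raw_inputs("anon4877_zspace_20060331.jpg")
b = g.raw_inputs("anon4877_lesion_20060401.jpg")
c = g.raw_inputs("anon4876_zspace_20060331.jpg")
print(a == b)  # True: both images derive from the same scan
print(a == c)  # False: different raw input
```

Comparing recorded raw inputs, rather than trusting the shared anon4877 prefix in the file names, is what lets provenance answer the same-patient question; the answer is only as reliable as the captured edges.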

Provenance and Workflows
[Figure: provenance graph in which the input anon4877_CT_scan is usedBy two workflows, which generated the outputs anon4877_zspace_20060331.jpg and anon4877_lesion_20060401.jpg]

Motivating Applications

The CMOP Project (Baptista et al.)
[Animations by Claudio Silva]
Center for Coastal Margin Observation & Prediction (CMOP)
– Enable river-to-ocean observation of physical and ecological processes
– Create accurate computational models
– Further the understanding of these processes to manage, operate, and sustain coastal resources and ecosystems effectively
http://www.ccalmr.ogi.edu/CORIE

CMOP: Making Sense of Data
Observation and modeling of multiple systems at multiple scales
Very large number of data products, sensor measurements, and results from numerical models
– Covering more than 10 years of experiments and occupying over 30 TB of storage
Analyzing the data requires integrating data and tools from different disciplines
– Biologists, chemists, oceanographers, computer scientists

CMOP: Issues
Heterogeneous tools and data
Custom-built scripts written by several staff members
– Scripts to run simulations, select data, and create visualizations
Creating new data products is labor-intensive, time-consuming, and error-prone
– Finding and running scripts is a task that only their creators can perform
– Multiple people must collaborate
Data products are generated in an ad hoc manner
– Data provenance is not captured persistently
– It is often recorded only in figure captions
Results are hard to reproduce, and data products and visualizations are hard to refine

A CMOP Workflow: Visualizing Salmon
CROOS (http://projectcroos.com/), CMOP (http://www.stccmop.org/), Utah
Workflows are used to manage processes and data and to support collaboration
Enable new discoveries!

CMOP: Why Workflows + Provenance?
Structured workflows are easier to share than Perl or Python scripts
– A workflow repository can be queried, allowing users to find useful workflows
– Re-use simplifies the creation of new data products
Data provenance is systematically captured and persisted in a database
Results can be reproduced

Working Memory (Preston & Silva)
Many psychological disorders are (partially) rooted in deficient working memory performance
– Schizophrenia
– Obsessive-compulsive disorder
– Bipolar disorder
– Depression
Studies show that working memory resides in the prefrontal cortex
Is it possible to improve working memory?

Working Memory and rTMS: A Study
Goal: show that directed repetitive transcranial magnetic stimulation (rTMS) can change working memory performance
Develop a new treatment for working memory deficiencies
– Non-invasive
– A possible treatment for people unresponsive to pharmacotherapy

rTMS Study: Making Sense of Data
Data is multi-modal
– EEG data: 64 sensors
– MEG data: 322 sensors
– MRI data: 256 × 256 × 209 volume
– Genetic data
Data is large
– Approx. 400 MB per subject
– 1400 participants
Complex analyses
– Signal processing techniques (EEG and MEG data)
– Segmentation and projection (MRI)

rTMS Study: Why Use Workflows?
1400 participants!
Need detailed provenance of results
– Reproducibility and verification
– What if a scanner is found to be defective?
Ability to re-use analyses
Data products are much larger than the raw data

rTMS Study: Preliminary Results
rTMS modulates working memory spectral dynamics to improve performance

Issues in Provenance Management

Managing Provenance
Provenance models: what to capture?
Capture mechanisms: how to capture?
Storage and querying

Provenance Models [Clifford et al., CCPE 2008]
Retrospective provenance: execution log
– Invocation records of run-time environments and resources used: site, host, executable, execution time, file stats ...
Prospective provenance: workflow definition
– Recipes for how to produce data
Causality graph or derivation lineage
– Relationships among data, programs, and computations
Annotations: user-defined provenance
– Metadata annotations about procedures and data

Visualization Workflow: An Example
[Images: head_iso.jpg, head_hist.jpg]

Prospective versus Retrospective
Prospective: the visualization workflow (generate histogram; visualize isosurface)
Retrospective: the execution log
  vtkStructuredPointsReader
    Input: /home/juliana/examples/data/head.120.vtk
    Output: preader
    Start: 2006-08-19 13:02:45   End: 2006-08-19 13:03:22
    User: juliana
  vtkContourFilter
    Input: preader
    Param: contourValue = 57
    Output: contour
    Start: 2006-08-19 13:04:25   End: 2006-08-19 13:05:12
    User: juliana
  …

Annotations
[Figure: causality graph. head.120.vtk is usedBy vtkStructuredPointsReader, which generated preader; preader is usedBy plot.wf and vis.wf, which generated head-hist.png and head-vis.png. Free-text annotations: “Visible human dataset” (head.120.vtk), “Plot histogram for scalar values”, “Good isosurface showing bone structure” (head-vis.png)]

Provenance Models: Issues
What to capture?
– A little provenance is better than no provenance…
– …but if you don't capture what you will need, tough luck!
How much is too much?
If and when can information be discarded?
Application dependent!

Capture Mechanisms
Embedded in the execution environment:
– Logging in a workflow system (VisTrails, Kepler, Taverna, REDUX)
Operating system (PASS, ES3)
– File reads/writes
Instrumented tools/services (PASOA, Karma)
– Application-specific
[Slide labels: centralized; distributed; no reliance on workflow engine]
Issues:
– Provenance granularity
– Capture overheads

Querying Provenance: Examples
What was the process used to create a data product?
Which algorithms were used?
Who created it?
Which data sets were used as input to create a data product?
What were the intermediate results that contributed to the derivation of a data product?
Which data products were derived from a given data set?
Which data products were derived using a particular algorithm?
What are the differences between two data products?

Querying Provenance
Recursive queries
– Derivation lineage: find the process that led to resampled-head-vis.png
– Data dependencies: which data sets contributed to resampled-head-vis.png?
[Figure: derivation from head.120.vtk to resampled-head-vis.png]
Graph patterns
– Find all invocations of vtkContourFilter with isosurface value = 57 that are preceded by resampling

Querying Provenance
Workflow difference
A user has created an isosurface visualization of the visible human dataset. Her colleague modified the workflow to use volume rendering instead. Find the differences between the two workflows.
[Figure: differences between the isosurface and volume rendering workflows]

Querying Provenance
Parameter settings
Find all the different isosurface values used to generate visualizations of the visible human head

Querying Provenance
Annotations
A user has annotated output images using free text. Find all images whose annotations mention “bone” and that were output by a vtkContourFilter.
[Figure: head-hist.png and head-vis.png; head-vis.png is annotated “Good isosurface showing bone structure”]

VisTrails: A Provenance Management System

VisTrails: Managing Provenance
Comprehensive provenance infrastructure for computational tasks
Focus on exploratory tasks such as simulations, visualization, and data mining
[Figure: model of the exploration process (data, process, data product, specification, data manipulation, perception & cognition, knowledge, exploration, user); modified from J. van Wijk, IEEE Vis 2005]
Transparently tracks the provenance of the discovery process
– The trail followed as users generate and test hypotheses
Uses provenance to streamline exploration
Focus on usability: build tools for scientists
– VisTrails manages the data, metadata, and exploration process, so scientists can focus on science!
The infrastructure can be combined with, and enhance, existing scientific workflow and visualization systems
VisTrails is open source: http://www.vistrails.org

Demo

Keeping Exploration Trails
[Figure: the exploration trail, its workflows, and the data products they generate]

Provenance Beyond Reproducibility
Support for reflective reasoning
Ability to compare data products
Explore parameter spaces and compare results
“Reflective reasoning requires the ability to store temporary results, to make inferences from stored knowledge, and to follow chains of reasoning backward and forward, sometimes backtracking when a promising line of thought proves to be unfruitful. …the process is slow and laborious” (Donald A. Norman)
[Freire et al., IPAW 2006]
Support for collaboration [Ellkvist et al., IPAW 2008]
Streamline data analysis and visualization

Refining Analyses by Analogy
Leverage the wisdom of the crowds in shared provenance
– Some refinements are common, e.g., changing the rendering technique or publishing an image on the Web
Apply refinements by analogy, automatically [Scheidegger et al., IEEE TVCG 2007]

Generating Visualizations by Analogy [Scheidegger et al., IEEE TVCG 2007]
Video: http://www.cs.utah.edu/~juliana/videos/Analogies.m4v

The Need for Guidance in Workflow Design

VisComplete: A Workflow Recommendation System
Mine the provenance collection:
– Identify graph fragments that co-occur in a collection of workflows
– Predict sets of likely workflow additions to a given partial workflow
Similar to a Web browser suggesting URL completions [Koop et al., IEEE Vis 2008]

VisComplete: Demo [Koop et al., IEEE Vis 2008]
Video: http://www.cs.utah.edu/~juliana/videos/viscomplete_h_264.mov

VisTrails: Some Applications
– Cosmology: LANL
– Environmental simulations: STC CMOP
– Psychiatry: U of Utah
– High energy physics: Cornell

Emerging Applications
Research Directions

Provenance-Enabling Tools [Callahan et al., IPAW 2008]

Scientific Publications and Provenance
[Image: first page of Coyle, Edward F., “Improved muscular efficiency displayed as Tour de France champion matures,” J Appl Physiol 98: 2191–2196, 2005, first published March 17, 2005, doi:10.1152/japplphysiol.00216.2005. The paper reports that an 8% improvement in muscular efficiency was the characteristic that improved most as a six-time Tour de France champion matured from ages 21 to 28 yr.]
Provenance of that result, from a later critique (http://jap.physiology.org/cgi/content/full/105/3/1020):
“raw data from the January 1993 test that revealed several additional deviations from the published methodology. Coyle used 20-min ergometer protocol (not 25 min), including 2- and 3-min stages where respiratory exchange ratios (RER) exceeded 1.00. An RER >1.00 invalidates use of the Lusk equations (5) to estimate energy expenditure.”
“…all of the published delta efficiency values are wrong. … there exists no credible evidence to support Coyle's conclusion that Armstrong's muscle efficiency improved.”

Provenance-Rich Publications
Bridge the gap between the scientific process and publications
Results that can be reproduced and validated
– Papers with deep captions
– Encouraged by ACM SIGMOD and a number of journals
Describe more of the discovery process: people describe only their successes; can we learn from mistakes?
Dynamic (interactive) publications
– Evolve over time
– Blog/wiki-like => Science 2.0
– E.g., http://project.liquidpub.org
Need tools to support this!
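
To make the idea of a deep caption concrete, the following Python sketch records, for one figure, the workflow and parameters that produced it plus a SHA-256 checksum of the published output, so a reader can re-run the workflow and check the result. The DeepCaption fields, the render stand-in, and the version number 3 are all hypothetical; head-vis.png, vis.wf, and contourValue = 57 are borrowed from the earlier visualization example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DeepCaption:
    """Hypothetical deep-caption record: enough provenance to re-run a figure."""
    figure: str
    workflow: str                  # workflow that produced the figure
    version: int                   # hypothetical version id in a provenance store
    parameters: dict = field(default_factory=dict)
    output_sha256: str = ""        # checksum of the published data product


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate(caption: DeepCaption, rerun_output: bytes) -> bool:
    """True if re-executing the workflow reproduces the published bytes exactly."""
    return sha256(rerun_output) == caption.output_sha256


def render(parameters: dict) -> bytes:
    # Hypothetical stand-in for re-executing vis.wf; deterministic by construction.
    return f"isosurface of head.120.vtk at contourValue={parameters['contourValue']}".encode()


published = render({"contourValue": 57})
caption = DeepCaption("head-vis.png", "vis.wf", 3, {"contourValue": 57}, sha256(published))

print(validate(caption, render(caption.parameters)))    # True: same parameters
print(validate(caption, render({"contourValue": 60})))  # False: different isovalue
```

Exact checksum comparison assumes bit-for-bit deterministic output; real renderers can differ across platforms and library versions, in which case a tolerance-based image comparison would be the more realistic validation step.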

The Provenance-Enabled Paper
Video: http://www.cs.utah.edu/~juliana/videos/vistrails_pdf.mov

Science Mashups

Provenance and Teaching (1)
Leverage provenance to improve the way we teach CS and science
– http://www.vistrails.org/index.php/SciVisFall2008
– Also used at UNC, Linköping, UTEP
– Lecture provenance: students can reproduce results

Provenance and Teaching (2)
Homework provenance provides insights into
– Task complexity and nature: number of actions; structural vs. parameter changes; task duration
– Student confusion: a large branching factor indicates many trial-and-error steps
Very detailed (and honest!) feedback: instructors can leverage this information [Lins et al., SSDBM 2008]

Provenance and Teaching (3)
Homework provenance helps students and instructors collaborate
– A student who is stuck sends their provenance
– The instructor understands the student's problem and provides hints, and the student can see what the instructor did!
– They can also collaborate in real time [Ellkvist et al., IPAW 2008]

Using Provenance to Teach Electronic Media [Langefeld and Kessler, Submitted 2009]
“[...] The students have gotten to the point where they demand the VisTrails files for every demonstration just after I complete [it]”
“[...] students used [a vistrail instead of a reference model] 62% of the time”
“Students who used provenance produced higher-quality models”
Video: http://www.cs.utah.edu/~juliana/videos/maya_playback_slow.mov

Science 2.0
Web 2.0 technologies have opened up new opportunities to improve collaboration and information sharing in science [Shneiderman, Science 2008; Waldrop, Scientific American 2008]

Science 2.0: Challenges
Open science and skepticism: can we trust that the information is accurate?
If unpublished results are posted on a wiki, can we prevent others from stealing that work?
Provenance is needed to determine authorship, enforce intellectual property rights, validate the integrity of artifacts and assess their quality, and reproduce artifacts
Social data analysis: share data and processes
– Adds to information overload!
Finding and making sense of information
– Google is not enough!
– Need focused search and structured queries
– Integrate data on the fly
Preserving data

Conclusions and Future Work
Provenance management is essential for computational science
– Provenance can be used to support reflective reasoning
– Provenance enables intuitive interfaces that simplify the construction and refinement of workflows
Science 2.0: sharing provenance at a large scale creates new opportunities [Freire and Silva, CHI SDA 2008]
– Workflow/provenance repositories; provenance-enabled publications
– Expose scientists to different techniques and tools
– Scientists can learn by example, expedite their scientific training, and potentially reduce their time to insight
Provenance + workflows + sharing have the potential to revolutionize science!
Need infrastructure and tools
– Many challenges and several open computer science questions

Acknowledgments
Thanks to the VisTrails group
This work is partially supported by the National Science Foundation, the Department of Energy, an IBM Faculty Award, and a University of Utah Seed Grant.

Thank you