Provenance Management: Challenges and Opportunities
Juliana Freire
http://www.cs.utah.edu/~juliana

What is Provenance?
Oxford English Dictionary: (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.
Apple Dictionary: Origin, source, place of origin; birthplace, fount, roots, pedigree, derivation, root, etymology; formal radix.

Provenance Management – Juliana Freire BTW 2009 – Münster, Germany

Provenance in Art
Rembrandt van Rijn, Self-Portrait, 1659
Andrew W. Mellon Collection 1937.1.72

Provenance in Business
Corporate and accounting scandals, e.g., Enron, Tyco, WorldCom
Decline of public trust in accounting and reporting practices
Sarbanes-Oxley Act of 2002 requires detailed provenance (auditing), e.g., http://en.wikipedia.org/wiki/Sarbanes-Oxley_Act
– Understand the flow of transactions … to identify points at which a misstatement could arise

Provenance in Health Care
Need to keep track of patients’ provenance
Verify compliance with protocols
Mine/analyze this information to make informed decisions
– E.g., assess the effectiveness of procedures and drugs for large populations
– “Evidence-based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of the individual patient.” [D. Sackett, 1996] http://www.hsl.unc.edu/Services/Tutorials/EBM/

Provenance in Science
Provenance is as important as the result (or more!)
Not a new issue
Lab notebooks have been used for a long time
What is new?
– Large volumes of data
– Complex analyses
Writing notes is no longer an option…
Computational provenance
[Figure: lab notebook page on DNA recombination, by Lederberg, with labels When, Observed data, Annotation]

Provenance for Digital Data
[Figure: example data. Emp: John, Susan; Dept: D01, D02; Mgr: Mary, Ken]

Uses of Computational Provenance
[Figure: three images, anon4876_zspace_20060331.jpg, anon4877_zspace_20060331.jpg, anon4877_lesion_20060401.jpg]
Reproducibility, data quality, attribution, informational
How were these images created?
Was any pre-processing applied to the raw data?
What’s the difference?
Who created them?
Are they really from the same patient?

Computational Provenance
[Figure: causality graph with two usedBy edges from anon4877_CT_scan into workflow modules and two generated edges to anon4877_zspace_20060331.jpg and anon4877_lesion_20060401.jpg]

Provenance Management: Desiderata
Need
reliable, accurate and transparent mechanisms to capture provenance information
Need an appropriate model to represent information
Need efficient storage and querying
Need to integrate provenance from multiple sources
Need usable tools

Outline
Provenance management for data exploration and visualization
Querying and re-using provenance
Emerging applications and research directions

Provenance Management for Data Exploration and Visualization

Data Exploration and Visualization
A picture is worth 1,000 words…
And it might take 1,000 steps to create!
[Figure: the visualization process, with labels Data, Process, Data Product, Specification, Data Manipulation, Perception & Cognition, Knowledge, Exploration, User; modified from J. van Wijk, IEEE Vis 2005]

Data Exploration and Visualization
[Figure: raw data (CT scan) and the workflows applied to it]
Files (workflow specifications)
anon4877_voxel_scale_1_zspace_20060331.srn
anon4877_textureshading_20060331.srn
anon4877_textureshading_plane0_20060331.srn
anon4877_goodxferfunction_20060331.srn
anon4877_lesion_20060331.srn
Notes
Initial visualization with z-scaling corrected!
Added texture and shading!
Added plane to visualize internal structure!
Found good transfer function!
Identified lesion tissue!

Data Exploration and Visualization: Issues
Provenance is maintained manually through file-naming conventions and detailed notes
– A time-consuming, error-prone process
Hard to understand the exploratory process and relationships among workflows
Hard to further explore the data, e.g., locate relevant data products/workflows and modify them
Hard to collaborate, and work is likely to be lost if the creator leaves

Exploration and Creativity Support
Reflective reasoning is key in exploratory processes
“Reflective reasoning requires the ability to store temporary results, to make inferences from stored knowledge, and to follow chains of reasoning backward and forward, sometimes backtracking when a promising line of thought proves to be unfruitful. …the process is slow and laborious” Donald A. Norman
Need external aids: tools to facilitate this process
– Creativity support tools [Shneiderman, CACM 2002]
Need aid from people
– Support for collaboration

VisTrails: Managing Provenance
Support for exploratory tasks such as visualization and data mining
– Task specification iteratively refined as users generate and test hypotheses
Comprehensive provenance infrastructure for computational tasks
– Data + workflow provenance
– Treat workflow as a 1st-class data product
VisTrails manages the data, metadata and the exploration process, so scientists can focus on science!
Infrastructure can be combined with and enhance existing scientific workflow and visualization systems
Focus on usability: build tools for scientists
http://www.vistrails.org

Keeping Exploration Trails
[Figure: panels labeled Trail, Workflows, Data Products]

Keeping Exploration Trails
Trail notes, with the user who recorded each:
– Initial visualization with z-scaling corrected! (juliana)
– Added texture and shading! (eranders)
– Added plane to visualize internal structure! (eranders)
– Found good transfer function! (eranders)
– Identified lesion tissue! (stevec)

Short Demo: The Change-Based Provenance Model

Change-Based Provenance
Records user actions
Provenance = changes to computational tasks
– Add a module, add a connection, change a parameter value
Extensible change algebra
– E.g., addModule, deleteConnection, addConnection, setParameter
A vistrail node vt corresponds to the workflow that is constructed by the sequence of actions from the root to vt
vt = x_n ◦ x_{n-1} ◦ … ◦ x_1 ◦ Ø
[Figure: a vistrail with actions x_1, x_2, x_3]
[Freire et al., IPAW 2006]

Exploring the Change Space
Scripting workflows: parameter explorations are simple to specify and apply
Exploration of parameter space for a workflow vt
setParameter(id_n, value_n) ◦ … ◦ setParameter(id_1, value_1) ◦ vt
Exploration of multiple workflow specifications
addModule(id_i, …) ◦ deleteModule(id_i) ◦ v_1
…
addModule(id_i, …) ◦ deleteModule(id_i) ◦ v_n
Results can be conveniently compared in the VisTrails spreadsheet
Can create animations too!
Caching to avoid redundant computations [Bavoil et al., IEEE Vis 2005]

Computing Workflow Differences
No need to compute subgraph isomorphism!
A vistrail is a rooted tree: all nodes have a common ancestor, so diffs are well-defined and simple to compute
vt_1 = x_i ◦ x_{i-1} ◦ … ◦ x_1 ◦ Ø
vt_2 = x_j ◦ x_{j-1} ◦ … ◦ x_1 ◦ Ø
vt_1 - vt_2 = {x_i, x_{i-1}, …, x_1, Ø} - {x_j, x_{j-1}, …, x_1, Ø}
Different semantics:
– Exact, based on ids
– Approximate, based on module signatures

Collaborative Exploration
Collaboration is key to data exploration
– Translational, integrative approaches to science
Store provenance information in a database
Synchronize concurrent updates through locking
– Real-time collaboration [Ellkvist et al., IPAW 2008]
Asynchronous access: similar to version control systems
– Check out, work offline, synchronize
– Users exchange patches
Synchronization is simple because provenance is monotonic
No need for a central repository: support for distributed collaboration
– For details see Callahan et al., SCI Institute Technical Report No. UUSCI-2006-016, 2006

Provenance Enabling Tools
[Callahan et al., IPAW 2008]

Change-Based Provenance: Summary
General: Works with any system that has undo/redo!
Concise representation
Uniformly captures data and workflow provenance
– Data provenance: where does a specific data product come from?
– Workflow evolution: how has workflow structure changed over time?
Results can be reproduced
Detailed information about the exploration process
Provenance beyond reproducibility:
– Support for reflective reasoning: scientists can return to any point in the exploration space and compare results side by side
– Support for collaboration

Querying and Re-Using Provenance

Querying and Re-Using Provenance
Storing detailed information is important, but not sufficient
Need appropriate user interface and operations to leverage information
Understand and re-use the history

Querying Provenance: Examples
Find examples:
– Find workflows that process a particular type of file
– Find workflows that output a particular data product
– Find workflows that contain a given module or sequence of modules
Explore provenance:
– What was the process used to create a data product? Which algorithms were used?
– Who created it?
– Which data sets were used as input to create a data product? What were the intermediate results?
– Which data products were derived from a given data set?
– What are the differences between two data products?

Querying Provenance
Provenance and workflows are represented as DAGs, so queries are hard to express as text
[Figure: query from REDUX, Microsoft]
Sample query from Provenance Challenge:
– Find all invocations of procedure align_warp using parameter “model” set to “12” that ran on a Monday.
New provenance query language
  wf{*}: x where x.module = AlignWarp and x.parameter('model') = '12' and (log{x}: y where y.dayOfWeek = 'Monday')
[Annotations on the query: Workflow Evolution, Workflow Execution]
For details see http://twiki.gridprovenance.org/bin/view/Challenge/VisTrails
Much simpler than other approaches, which use SQL, SPARQL, Prolog…
But who is going to write those queries?
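
To make the sample Provenance Challenge query concrete, here is a minimal Python sketch of the same selection: filter modules by name and parameter value, then keep only those with a log entry on a given weekday. This is not the VisTrails query engine. The `Module` and `Execution` records, their field names, the `find_invocations` helper, and the sample data are all hypothetical, chosen only to mirror the `wf{*}` and `log{x}` parts of the query.

```python
from dataclasses import dataclass
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


@dataclass(frozen=True)
class Module:
    """A module in a workflow specification (hypothetical record)."""
    module_id: int
    name: str
    parameters: tuple = ()  # pairs of (parameter name, value)

    def parameter(self, key):
        # Returns None when the parameter is not set on this module.
        return dict(self.parameters).get(key)


@dataclass(frozen=True)
class Execution:
    """One log entry: which module ran, and when (hypothetical record)."""
    module_id: int
    started: datetime

    @property
    def day_of_week(self):
        # datetime.weekday() returns 0 for Monday.
        return DAYS[self.started.weekday()]


def find_invocations(modules, log, name, param, value, day):
    """Pair each matching module with its log entries that ran on `day`."""
    matching = {m.module_id: m for m in modules
                if m.name == name and m.parameter(param) == value}
    return [(matching[e.module_id], e) for e in log
            if e.module_id in matching and e.day_of_week == day]


# Assumed sample data, for illustration only.
modules = [
    Module(1, "AlignWarp", (("model", "12"),)),
    Module(2, "AlignWarp", (("model", "9"),)),
    Module(3, "Reslice"),
]
log = [
    Execution(1, datetime(2009, 3, 2, 10, 0)),  # a Monday
    Execution(1, datetime(2009, 3, 3, 10, 0)),  # a Tuesday
    Execution(2, datetime(2009, 3, 2, 11, 0)),  # a Monday, but model = 9
]

hits = find_invocations(modules, log, "AlignWarp", "model", "12", "Monday")
for module, execution in hits:
    print(module.module_id, execution.day_of_week, execution.started.date())
# prints: 1 Monday 2009-03-02
```

Even in this small form the query needs a join between the specification and the execution log, which is the part that is awkward to write by hand and motivates the visual, by-example interface.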

Querying Provenance by Example
Specify queries visually as graphs
Querying provenance by example [Scheidegger et al., TVCG 2007; Beeri et al., VLDB 2006]
– WYSIWYQ: What You See Is What You Query
– The query interface is the same one used to create workflows

Refining Workflows
Complex analyses are hard to put together
– Programming expertise
– Domain knowledge
– Familiarity with different tools
Steep learning curve

Refining Workflows by Analogy
Leverage the wisdom of the crowds in shared provenance
– Some workflow refinements are common, e.g., change the rendering technique, publish image on the Web
Apply refinements by analogy, automatically [Scheidegger et al., IEEE TVCG 2007]
[Figure: Source and Target workflows]

Refining Workflows by Analogy: Example
http://www.cs.utah.edu/~juliana/videos/Analogies.m4v

Refining Workflows by Analogy
[Figure: … is to … as … is to ?]

Refining Workflows by Analogy
A is to B as C is to D
1. Compute difference: ∆(A,B)
– Just like a patch!
– But… D = ∆(A,B) ◦ C may not be a valid workflow
2. Find correspondences between A and C: map(A,C)
– Diffuse similarity scores across the product graph A×C using eigenvalue decompositions
3. Compute mapped difference: ∆_AC(A,B) = map(A,C) ∆(A,B)
4. D = ∆_AC(A,B) ◦ C
Details in [Scheidegger et al., IEEE TVCG 2007]

The Analogy Operation
Allows workflows to be refined without requiring users to directly modify the specification
Basis for scalable updates
Analogies are not foolproof
– If it works, great.
If it doesn't, it may help
– User can edit and fix the new version
Improve by
– Using domain knowledge
– Learning from user feedback

The Need for Guidance in Workflow Design

VisComplete: A Workflow Recommendation System
Mine provenance collection: identify graph fragments that co-occur in a collection of workflows
Predict sets of likely workflow additions to a given partial workflow
Similar to a Web browser suggesting URL completions
[Koop et al., IEEE Vis 2008]

VisComplete: Demo [Koop et al., IEEE Vis 2008]
http://www.cs.utah.edu/~juliana/videos/viscomplete_h_264.mov

VisComplete: Results Summary
Eliminates over 50% of actions
Selected completions are almost always in the first four suggestions
A database of simple pipelines can aid users constructing more complex pipelines
See [Koop et al., TVCG 2008] for details on how the path database is constructed and on the completion algorithm

Emerging Applications and Research Directions

Scientific Publications and Provenance
[Figure: first page of Coyle, Edward F. Improved muscular efficiency displayed as Tour de France champion matures. J Appl Physiol 98: 2191–2196, 2005. doi:10.1152/japplphysiol.00216.2005]

Scientific Publications and Provenance
[Figure: the same page, overlaid with the following quotes]
“raw data from the January 1993 test that revealed several additional deviations from the published methodology. Coyle used a 20-min ergometer protocol (not 25 min), including 2- and 3-min stages where respiratory exchange ratios (RER) exceeded 1.00. An RER >1.00 invalidates use of the Lusk equations (5) to estimate energy expenditure.”
“…all of the published delta efficiency values are wrong. … there exists no credible evidence to support Coyle's conclusion that Armstrong's muscle efficiency improved.”
http://jap.physiology.org/cgi/content/full/105/3/1020

Provenance-Rich Publications
Bridge the gap between the scientific process and publications
Results that can be reproduced and validated
– Papers with deep captions
– Encouraged by ACM SIGMOD and a number of journals
Describe more of the discovery process: people only describe successes, can we learn from mistakes?
Dynamic (interactive) publications
– Evolve over time
– Blog/wiki-like => Science 2.0
– E.g., http://project.liquidpub.org
Need tools to support this!

The Provenance-Enabled Paper
http://www.cs.utah.edu/~juliana/videos/vistrails_pdf.mov

Science 2.0
Web 2.0 technologies have opened up new opportunities to improve collaboration and information sharing in science [Shneiderman, Science 2008; Waldrop, Scientific American 2008]

Science 2.0: Challenges
Some skepticism: Can we trust that the information is accurate? If unpublished results are posted on a wiki, can we prevent others from stealing that work?
Need provenance: determine authorship, enforce intellectual property rights, validate the integrity of artifacts, assess their quality, and reproduce the artifact
Social Data Analysis: Share data and processes
Adds to information overload!
Finding and making sense of information
– Google is not enough!
– Need focused search and structured queries
– Integrate data on the fly
– DataSpaces
Preserving data

Provenance and Teaching (1)
Leverage provenance to improve the way we teach CS and Science
– http://www.vistrails.org/index.php/SciVisFall2008
– Lecture provenance: students can reproduce results

Provenance and Teaching (2)
Homework provenance provides insights regarding
– Task complexity and nature: number of actions; structural vs. parameter changes; task duration
– Student confusion: large branching factor = lots of trial-and-error steps
Very detailed (and honest!)
feedback: instructors can leverage this information [Lins et al., SSDBM 2008]

Provenance and Teaching (3)
Homework provenance helps students and instructors to collaborate
– Student is stuck, sends his provenance
– Instructor understands the student's problem, provides hints; the student can see what the instructor did!
– They can also collaborate in real time [Ellkvist et al., IPAW 2008]

Using Provenance to Teach Electronic Media
[Langefeld and Kessler, Submitted 2009]
“[...] The students have gotten to the point where they demand the VisTrails files for every demonstration just after I complete [it]”
“[...] students used [a vistrail instead of a reference model] 62% of the time”
Students who used provenance produced higher-quality models

Provenance Interoperability
Complex tasks may require
– Multiple workflow systems, each with its own provenance model, e.g., Swift to run simulations on the Grid, VisTrails to visualize the results
– Tools which cannot be wrapped and controlled by a workflow system
– Human interaction
But need to integrate their provenance
– To obtain complete provenance of a data item
– To re-use powerful provenance exploration tools such as VisTrails
Provenance as an interoperability layer across different tools and workflow systems
Challenge: information integration!

Provenance Analytics: Opportunities
Volume of collected provenance is growing
Workflow and provenance repositories
– myExperiments (EU), Provenance Repository (Indiana), ManyEyes (IBM), Yahoo!
Pipes
Opportunity for knowledge discovery, sharing and re-use
– Discover workflow patterns => a recommendation system that suggests alternatives to users as they construct a workflow
– Discover workflow refinement patterns => automatically extract analogies from shared repositories
– Cluster (organize) workflow collections => simplify query and search over repositories
– Infer workflow specification from execution log [Aalst et al., TKDE 2004]

Provenance Analytics: Challenges
Lots of data, complex data: graphs + metadata
– Modules, parameters, parameter values, data products
Do existing approaches to graph mining scale?
Case study in clustering: [Santos, IPAW 2008]
– Explore different workflow representations: graphs versus bag of words
– Examine trade-off between efficiency and cluster quality
– Bag of words surprisingly effective, and much more efficient

Efficient Storage and Querying
Provenance information can be large
– Mimi Project (U. Michigan): provenance = 10 * data
Need to worry about efficiency and usability
– Reduce redundancies in provenance data [Chapman et al., SIGMOD 2008]
– Develop efficient techniques for evaluating provenance queries, which are frequently recursive [Heinis and Alonso, SIGMOD 2008]
– Need to reduce provenance overload for users [Cohen-Boulakia, Davidson et al., VLDB 2007, ICDE 2008]
Need indexes
– E.g., to search over a large provenance store, or to apply analogies to a large number of workflows
– Provenance and workflows are graphs
– Can graph indexing techniques be applied to this problem? e.g., [Shasha et al.
SIGMOD 2002; Yan et al., SIGMOD 2004]

Conclusions and Future Work
Provenance management is essential for exploratory computational tasks
– Provenance can be used to support reflective reasoning
– Intuitive interfaces for simplifying the construction and refinement of workflows
Science 2.0: Sharing provenance at a large scale creates new opportunities [Freire and Silva, CHI SDA, 2008]
– Workflow/provenance repositories; provenance-enabled publications
– Expose scientists to different techniques and tools
– Scientists can learn by example, expedite their scientific training, and potentially reduce their time to insight
Provenance + Workflows + Sharing have the potential to revolutionize science!

Conclusions and Future Work
Appropriate support for exploratory tasks is essential for wider adoption and more effective use of scientific workflow systems
Sharing workflows and provenance at a large scale creates new opportunities [Freire and Silva, CHI SDA, 2008]
Provenance + Workflows + Sharing have the potential to revolutionize science!
Many challenging data management problems: great potential for practical impact

Acknowledgments
Thanks to the VisTrails group
This work is partially supported by the National Science Foundation, the Department of Energy, an IBM Faculty Award, and a University of Utah Seed Grant.

Danke
Obrigada
Thank you