Pegasus: Planning for Execution in Grids
Ewa Deelman
Information Sciences Institute, University of Southern California

Pegasus Acknowledgements
- Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Mei-Hui Su, Karan Vahi (ISI)
- James Blythe, Yolanda Gil (ISI)
- http://pegasus.isi.edu
- Research funded as part of the NSF GriPhyN, NVO and SCEC projects.

Outline
- General scientific workflow issues on the Grid
- Mapping complex applications onto the Grid: Pegasus
- Pegasus application portal
- LIGO: gravitational-wave physics
- Montage: astronomy
- Incremental workflow refinement
- Futures

Grid Applications
- Increasing in the level of complexity
- Use of individual application components
- Reuse of individual intermediate data products (files)
- Description of data products using metadata attributes
- Execution environment is complex and very dynamic:
  - Resources come and go
  - Data is replicated
  - Components can be found at various locations or staged in on demand
- Separation between the application description and the actual execution description

Application Development and Execution Process
[Figure: starting in the application domain, component selection and abstract workflow generation (e.g., a logical FFT applied to filea) yield an abstract workflow; resource selection, data replica selection, and transformation instance selection then drive concrete workflow generation (e.g., a data transfer of filea from host1://home/filea to host2://home/file1 followed by running /usr/local/bin/fft /home/file1 on host2); the concrete workflow runs in the execution environment, with failure recovery methods such as retrying, picking different resources, or specifying a different workflow.]

Why Automate Workflow Generation?
- Usability: limit the Grid knowledge the user needs
- Complexity: the user needs to make choices
  - Alternative application components, alternative files, alternative locations
  - The user may reach a dead end
  - Many different interdependencies may occur among components
- Solution cost: evaluate the costs of alternative solutions (performance, reliability, resource usage) using information from services such as the Monitoring and Discovery Service and the Replica Location Service
- Global cost: minimizing cost within a community or a virtual organization requires reasoning about individual users' choices in light of other users' choices

Executable Workflow Construction
- Chimera builds an abstract workflow based on VDL descriptions
- Pegasus takes the abstract workflow and produces an executable (concrete) workflow for the Grid
- Condor's DAGMan executes the workflow
[Figure: Chimera -> abstract workflow -> Pegasus -> concrete workflow -> DAGMan -> jobs.]

Pegasus: Planning for Execution in Grids
- Maps from abstract to concrete workflow
- Automatically locates physical locations for both components (transformations) and data
- Uses algorithmic and AI-based techniques
- Uses Globus RLS and the Transformation Catalog
- Finds appropriate resources to execute on via Globus MDS
- Reuses existing data products where applicable
- Publishes newly derived data products to the Chimera virtual data catalog
(Chimera is developed at ANL by I. Foster, M. Wilde, and J. Voeckler.)
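To make the separation between the abstract and the concrete workflow more tangible, the hedged Python sketch below replays the FFT example from the Application Development and Execution Process figure: one logical job is turned into transfer, execute, and register steps by consulting toy stand-ins for the replica catalog (RLS), the Transformation Catalog, and the resource information (MDS). The dictionaries, the chosen site, and the output name fileb are made up for illustration and are not the actual Globus or Pegasus interfaces.

```python
# Hedged sketch: turning one abstract job into concrete steps.
# The "catalogs" below are toy stand-ins for RLS / Transformation Catalog / MDS.

# Abstract job: a logical transformation applied to logical files.
abstract_job = {"transformation": "fft", "inputs": ["filea"], "outputs": ["fileb"]}

# Stub replica catalog: logical file name -> known physical locations.
replica_catalog = {"filea": ["host1://home/filea"]}

# Stub transformation catalog: (transformation, site) -> executable path.
transformation_catalog = {("fft", "host2"): "/usr/local/bin/fft"}

# Assumed resource choice (in reality informed by a query to MDS).
chosen_site = "host2"

def concretize(job, site):
    """Produce a list of concrete steps for one abstract job."""
    steps = []
    for lfn in job["inputs"]:
        source = replica_catalog[lfn][0]  # pick an existing replica
        steps.append(f"transfer {source} -> {site}://home/{lfn}")
    executable = transformation_catalog[(job["transformation"], site)]
    args = " ".join(f"/home/{lfn}" for lfn in job["inputs"])
    steps.append(f"run {executable} {args} on {site}")
    for lfn in job["outputs"]:
        steps.append(f"register {site}://home/{lfn} for {lfn} in the replica catalog")
    return steps

for step in concretize(abstract_job, chosen_site):
    print(step)
```

The real catalogs and site-selection logic are far richer; the sketch only shows the shape of the mapping: locate the data, locate the executable, pick a resource, and emit data transfer, execution, and registration steps.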
[Figure: Chimera/Pegasus system architecture. VDL descriptions feed Chimera, which produces the abstract workflow; a Request Manager coordinates workflow planning (including workflow reduction and a replica and resource selector), data management, data publication, and a submission and monitoring system; the concrete workflow is handed to the workflow executor (DAGMan), which runs tasks on the Grid, where raw data arrives from the detector. Dynamic information and models are drawn from the Globus Monitoring and Discovery Service, the Transformation Catalog, and the Globus Replica Location Service.]

Example Workflow Reduction
- Original abstract workflow: task d1 reads file a and produces file b; task d2 reads b and produces file c.
- If b already exists (as determined by a query to the RLS), d1 need not run and the workflow can be reduced to b -> d2 -> c.

Mapping from Abstract to Concrete
- Starting from the reduced workflow b -> d2 -> c, query the RLS, MDS, and TC, and schedule the computation and data movement:
  - Move b from A to B
  - Execute d2 at B
  - Move c from B to U
  - Register c in the RLS
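The hedged Python sketch below walks through the two steps just described on the same example: first prune tasks whose outputs are already registered, then emit the concrete transfer, execute, and register steps. The site names and the simple "first replica wins" policy are illustrative assumptions, not Pegasus's actual selection logic.

```python
# Hedged sketch of workflow reduction and abstract-to-concrete mapping.

# Abstract workflow: task name -> (input files, output files).
abstract_workflow = {
    "d1": (["a"], ["b"]),
    "d2": (["b"], ["c"]),
}

# Stub replica catalog (stand-in for an RLS query): file -> sites holding it.
replica_catalog = {"a": ["A"], "b": ["A"]}  # b already exists at site A

execution_site = "B"  # assumed choice (in reality informed by MDS and the TC)
user_site = "U"       # where the final results should end up

def reduce_workflow(workflow, replicas):
    """Drop tasks whose outputs all have registered replicas already."""
    return {
        task: (ins, outs)
        for task, (ins, outs) in workflow.items()
        if not all(f in replicas for f in outs)
    }

def map_to_concrete(workflow, replicas, site, dest):
    """Emit transfer / execute / register steps for the remaining tasks."""
    steps = []
    for task, (ins, outs) in workflow.items():
        for f in ins:
            if f in replicas and site not in replicas[f]:
                steps.append(f"move {f} from {replicas[f][0]} to {site}")
        steps.append(f"execute {task} at {site}")
        for f in outs:
            steps.append(f"move {f} from {site} to {dest}")
            steps.append(f"register {f} in the RLS")
    return steps

reduced = reduce_workflow(abstract_workflow, replica_catalog)  # only d2 remains
for step in map_to_concrete(reduced, replica_catalog, execution_site, user_site):
    print(step)
```

Run on this example, the sketch prints exactly the four actions on the slide: move b from A to B, execute d2 at B, move c from B to U, and register c in the RLS. In the real system the resulting concrete workflow is what gets handed to DAGMan, as in the portal view that follows.

Simplified View of SC 2003 Portal
[Figure: the user authenticates via MyProxy to a portal with LIGO-specific and Montage-specific interfaces; the portal exchanges metadata with a Metadata Catalog Service and VDL/abstract workflows with Chimera; Pegasus, drawing information from Globus MDS, the Globus RLS, and the Transformation Catalog, produces the concrete workflow; DAGMan submits the jobs to the Grid, and execution records, data, and metadata flow back to the user.]

LIGO Scientific Collaboration
- Continuous gravitational waves are expected to be produced by a variety of celestial objects
- Only a small fraction of potential sources are known
- Need to perform blind searches, scanning the regions of the sky where we have no a priori information about the presence of a source
  - Wide-area, wide-frequency searches
- The search is performed for potential sources of continuous periodic waves near the Galactic Center (the galactic core)
- The search is very compute- and data-intensive
- The LSC used the occasion of SC2003 to initiate a month-long production run with science data collected during 8 weeks in the Spring of 2003
- Additional resources used: Grid3 iVDGL resources

LIGO Acknowledgements
- Bruce Allen, Scott Koranda, Brian Moe, Xavier Siemens (University of Wisconsin-Milwaukee, USA)
- Stuart Anderson, Kent Blackburn, Albert Lazzarini, Dan Kozak, Hari Pulapaka, Peter Shawhan (Caltech, USA)
- Steffen Grunewald, Yousuke Itoh, Maria Alessandra Papa (Albert Einstein Institute, Germany)
- Many others involved in the testbed
- www.ligo.caltech.edu
- www.lsc-group.phys.uwm.edu/lscdatagrid/
- http://pandora.aei.mpg.de/merlin/
- LIGO Laboratory operates under NSF cooperative agreement PHY-0107417

Montage (NASA and NVO)
- Deliver science-grade custom mosaics on demand
- Produce mosaics from a wide range of data sources (possibly in different spectra)
- User-specified parameters of projection, coordinates, size, rotation and spatial sampling
[Figure: mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.]

Small Montage Workflow
[Figure: a small Montage workflow of ~1200 nodes.]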
Montage Acknowledgments
- Bruce Berriman, John Good, Anastasia Laity (Caltech/IPAC)
- Joseph C. Jacob, Daniel S. Katz (JPL)
- http://montage.ipac.caltech.edu/
- Testbed for Montage: Condor pools at USC/ISI and UW Madison, and TeraGrid resources at NCSA, PSC, and SDSC
- Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology

Other Applications Using Chimera and Pegasus
- Other GriPhyN applications:
  - High-energy physics: ATLAS, CMS (many)
  - Astronomy: SDSS (Fermilab, ANL)
- Astronomy: galaxy morphology (NCSA, JHU, Fermi, many others; NVO-funded)
- Biology: BLAST (ANL, PDQ-funded)
- Neuroscience: tomography (SDSC, NIH-funded)

Current System
[Figure: in the current system, the entire original abstract workflow is given to Pegasus, which produces the concrete workflow, and DAGMan then executes it: Pegasus(Abstract Workflow) -> Concrete Workflow -> DAGMan(CW) -> workflow execution.]

Workflow Refinement and Execution
[Figure: starting from the user's request, workflow refinement proceeds over time through levels of abstraction: application-level knowledge, then logical tasks, then tasks bound to resources and sent for execution. Relevant components include policy information, a task matchmaker, and workflow repair. At any point in time the full abstract workflow is split into a portion not yet executed, a portion under partial execution, and a portion already executed.]

Incremental Refinement
- Partition the abstract workflow into partial workflows, e.g., PW A, PW B, PW C (a particular partitioning), yielding a new abstract workflow whose nodes are the partial workflows.

Meta-DAGMan
- For each partition X, Pegasus(X) generates the concrete workflow and the submit files for X (= Su(X)), and DAGMan(Su(X)) executes that concrete workflow.
- For the partitioning above this gives Pegasus(A) -> Su(A) -> DAGMan(Su(A)), Pegasus(B) -> Su(B) -> DAGMan(Su(B)), and Pegasus(C) -> Su(C) -> DAGMan(Su(C)).
- (An illustrative sketch of one possible realization appears after the Future Directions slide below.)

Future Directions
- Incorporate AI-planning technologies in production software (the Virtual Data Toolkit)
- Investigate various scheduling techniques
- Investigate fault tolerance issues:
  - Selecting resources based on their reliability
  - Responding to failures

http://pegasus.isi.edu
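As noted under Meta-DAGMan above, each partition is first planned by Pegasus and then run by DAGMan. The Python sketch below shows one possible way such a meta-workflow could be written out for DAGMan: the partition names, their ordering, the pegasus_plan.sh wrapper, and the X.sub submit files are all illustrative assumptions, while JOB, SCRIPT PRE, and PARENT ... CHILD are standard DAGMan directives.

```python
# Hedged sketch: emit a meta-DAG in which each partition X is planned by a
# PRE script (standing in for "Pegasus(X) -> Su(X)") and then executed by a
# node whose submit file is assumed to run DAGMan on the sub-workflow Su(X).

partitions = ["A", "B", "C"]             # assumed partitioning (PW A, PW B, PW C)
dependencies = [("A", "B"), ("A", "C")]  # assumed ordering between partitions

lines = []
for x in partitions:
    # X.sub is assumed to submit DAGMan on the concrete sub-workflow Su(X).
    lines.append(f"JOB {x} {x}.sub")
    # pegasus_plan.sh is a hypothetical wrapper that runs Pegasus on partition X
    # and writes out Su(X) plus its submit files just before the node runs.
    lines.append(f"SCRIPT PRE {x} pegasus_plan.sh {x}")
for parent, child in dependencies:
    lines.append(f"PARENT {parent} CHILD {child}")

with open("meta.dag", "w") as dag:
    dag.write("\n".join(lines) + "\n")

print("\n".join(lines))
```

Because each partition's planning runs in a PRE script immediately before that partition executes, mapping decisions can use replica and resource information that is current at that point in the run, which is the motivation for the incremental refinement described above.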