Organization of the Euclid Data Processing: dealing with complexity

Fabio Pasian (INAF – O.A.Trieste) and Christophe Dabin, Marc Sauvage, Oriana Mansutti, Claudio Vuerli, Anna Gregorio
on behalf of the Euclid SGS development team

The presented document is Proprietary information of the Euclid Consortium. This document shall be used and disclosed by the receiving Party and its related entities (e.g. contractors and subcontractors) only for the purposes of fulfilling the receiving Party's responsibilities under the Euclid Project and that identified and marked technical data shall not be disclosed or retransferred to any other entity without prior written permission of the document preparer.

ADASS XXIV, Calgary, 5-9 Oct 2014 Fabio Pasian – Euclid Data processing

The Euclid Mission
M2 mission in the framework of the ESA Cosmic Vision Programme
Euclid mission objective is to map the geometry and understand the nature of the dark Universe (dark energy and dark matter)
Actors in the mission: ESA and the Euclid Consortium (institutes from 13 European countries and USA, funded by their own national Space Agencies)
For more information see:
http://sci.esa.int/science-e/www/area/index.cfm?fareaid=102
http://www.euclid-ec.org

The Euclid Consortium
• The Euclid Consortium is in charge of:
  – building and operating the instruments (VIS and NISP)
  – developing and running the data processing within a unified Science Ground Segment (SGS)
  – performing the science analysis on the Euclid data products
• The Euclid Consortium is composed of 1300+ members
  – 350+ Consortium members participating in SGS (active: ~150)

Euclid at a Glance

The Ground Segment at a glance
[Diagram labels: scientific community, Euclid]
ESA/SOC and the EC SGS have developed, and are committed to maintain, a
tight collaboration in order to design and develop a single, truly integrated SGS.
EA is built jointly by EC and SOC, and is managed by SOC.
«Internal» and «public» EA functions – the latter allows access to a subset of EA data
[Diagram labels: VObs, SOC, Ground Station, ESAC, DDS, MOC, ESOC, External data (KiDS, DES, ...), EA, ECSGS, Project Office, System Team, SDC (×9)]
This is an institutional view of the GS

The Ground Segment as seen in the high-level Euclid documents
[Diagram labels: Ground Station, SOC, MOC, MOGS, LE1, Euclid Consortium (ECSGS), SGS]

The Ground Segment as seen from the data processing point of view
The coloured boxes correspond to the Processing Functions, which are a product of the Euclid SGS
[Diagram labels: SOC, Ground Station, MOC, OPS, MOGS, SGS; levels: Level 1, Level E, Level 2, Level S, Level 3; Processing Functions: LE1, VIS, SIM, NIR, SIR, MER, EXT, SPE, SHE, PHZ, LE3; cross-checks: VIS/NIR/SIR/EXT, SIR, MER]
This is a functional view of the SGS

SWGs, OUs and SDCs
• Science Working Groups
  – external to the SGS
  – turning science objectives into requirements placed on the pipeline products and performances
  – verifying that the requirements are met (define V&V procedures)
• Organisation Units
  – providing the algorithmic definition of the processing to be implemented by the SDCs and validating the implementation
• Science Data Centres
  – implementing the data processing pipelines as specified by OUs
  – procuring local h/w and s/w resources
  – different activities:
    • SDC-DEV (development – i.e.
transforming algorithms into robust code)
    • SDC-PROD (integration on local infrastructure, production runs of pipeline)
• individual Euclid scientists may belong to more than one of the above groups

Development–Verification&Validation for every Processing Function
[Diagram: boxes SWGs, OU, SDC-DEV and several SDC-PROD; arrows labelled requirements, validation (on results), algorithms and test data, code validation, pipeline code and test data, pipelines verification]
1. in most cases, no interfaces but joint development
2. only for validation against high-level requirements
3. common integration platform

Development–Verification&Validation
1. Set of documents being prepared jointly between OUs and SDCs (by product – Processing Function – and not by organisation):
   a. PF Requirements Specification Document
   b. PF Validation Plan
   c. Development Plans (organised by SDC)
2. Validation by SWGs of the high-level data processing requirements
   a. high-level data processing requirements attributed to PFs
   b. the SGS will be considered as validated if every high-level data processing requirement is validated
   c.
the SGS is incorporating into the top-level IV&V plans the inputs provided by the SWG coordinators on the principles of validation, as well as the recommendations and typologies of validation tests – this top-level document will be co-signed by the SGS and SWG coordinators
• Responding to recommendations from the SGS-PRR:
  – Simplification/reduction of interfaces
  – «Best Practices» document issued to help OUs/SDCs

Pillars of SGS development
SGS-Level Services = shared tools and systems for SGS software development
• Standards and guidelines
• Development platform
• Integration platform
• Data model
• Software infrastructure
The System Team provides these to make the integration and operation of the Processing Functions as simple as possible
[ Current status is with respect to ADASS XXIII ]

Standards and guidelines
Standards and guidelines help developers take the right decisions
• Show how/where to improve code to meet the demanding requirements of the Euclid data processing
• Encourage the use of best practices
• Provide tools to help developers improve their code
Current status:
• Standards being developed based on previous project experience and adapted to the Euclid context

Development and integration platform
The SGS uses a single development platform specifying
• Operating system, Programming language, Support libraries
CODEEN is the Euclid collaborative development and continuous integration platform
• The cost of fixing bugs increases as the system integration approaches completion
• Usage mandatory for main processing software
Current status:
• Python adopted as the second language allowed for pipeline development in addition to C++ (Linux + C++ & Python)
• Drivers: More
flexibility about who can contribute to development, long-term direction of astronomical programming
• The System Team will ensure that we get all of the benefits and avoid the (known!) pitfalls

Data model
Explicit data model built by OUs to describe the output of their processing functions (therefore input to other Processing Functions in most cases)
• Many projects have an implicit data model, using conventions and shared code data structures
• Change management of implicit data models is difficult, particularly for long-living projects where knowledge can be lost
Current status:
• Data Model Workshops held with great participation from OUs and System Team
• First iterations of the DM very promising – real data products starting to be defined
• Challenge now is to increase the coverage to all products and maintain a flexible process to allow the DM to evolve in a controlled way along with the Processing Functions.
• CCB started, to accept new items and to evaluate change requests

Software Infrastructure
Three main systems
• Data Management System (EAS) – Shared set of tools for managing the Euclid dataset: data discovery and exchange, data processing support, quality and lineage tracking
• Abstraction Layer (IAL) – Processing management at the SDC computing facilities
• Processing Orchestration (COORS) – Coordinating processing activities across all the SDCs
Current status:
• Prototypes exist for the core EAS system and IAL
• Integration of these systems through EC SGS Challenges demonstrates real progress towards a working data processing system

ST Challenge #3
Final goal of challenges: deploying pipelines transparently on all SDCs
Technical objectives:
• Demonstrate the capability to deploy IAL VM images into SDCs
• Demonstrate the capability to deploy, in the context of each SDC, the TIPS, NIP and VIS simulators as Euclid pipeline objects
• Demonstrate the capability of IAL, in the context of each SDC, to:
  – fetch, on the basis of the metadata provided by the EAS prototype (in SDC-NL), the pipeline input data into the local SDC storage area
  – launch simulator jobs across clusters (when available in SDCs) or dedicated nodes, in accordance with PPOs defined remotely (through Jenkins) or locally (by each SDC leader) – orchestration mock-up
  – produce and store output data into the local SDC storage area
  – send the appropriate metadata to the EAS prototype in SDC-NL
Schedule:
• Baseline availability for deployment into SDCs: end of December 2013
• By mid-February 2014, all SDCs had successfully fulfilled the challenge

Thank you for your attention
fabio.pasian@inaf.it

Acknowledgments
Thanks to ESA and to the Euclid Consortium, and in particular:
ESA: John Hoar (ESAC), Guillermo Buenadicha (ESAC), René Laureijs (ESTEC), Giuseppe Racca (ESTEC), Pedro Osuna (ESAC), Bruno Altieri (ESAC), Michael Schmidt (ESOC), Cyril Colombo (ESTEC), Ralf Kohley (ESAC), ...
EC: Yannick Mellier (IAP), Andrea Zacchei (INAF), Keith Noddle (UoE), Maurice Poncet (CNES), Rees Williams (RuG), Christian Neissner (PIC), Johannes Koppenhöfer (MPG), Pierre Dubath (Unige), Elina Keihänen (UHelsinki), Marco Frailis (INAF), Jean-Marc Delouis (IAP), Jean-Jacques Metge (CNES), Christian Surace (LAM), Nikos Apostolakos (Geneva), Laurent Vibert (IAS), Martin Melchior (FHNW), Stefan Müller (FHNW), Marco Soldati (FHNW), Andrey Belikov (RuG), Edwin Valentijn (RuG), Harry Teplitz (IPAC), OUs staff, SDCs staff, …
The SGS PRR Panel and Board
And many other people involved in the project
This is a REAL team effort
... and thank you for your attention
fabio.pasian@inaf.it

[Organisation chart: ECSGS Management]
ECSGS Management and Project Office: ECSGS Manager F. Pasian; ECSGS Deputy C. Dabin; ECSGS Scientist M. Sauvage; PA/QA Lead C. Vuerli; Config. Lead O. Mansutti; Proj.Ctr. Support D. Fierro; IOT Coordination A. Gregorio; System Team Lead C. Dabin
System Team areas: Archive Metadata, Archive Data, Orchestration, Common Tools, Architecture Performance, Abstraction Layer (IAL), Monitoring & Control, Data Modeling, Data Quality, LE1 common infrastructure
SDCs: SDC-CH, SDC-DE, SDC-ES, SDC-FI, SDC-FR, SDC-IT, SDC-NL, SDC-UK, SDC-US
OUs: OU-VIS, OU-NIR, OU-EXT, OU-SIR, OU-SIM, OU-MER, OU-SPE, OU-SHE, OU-PHZ, OU-LE3
Names shown for the SDCs, OUs and System Team areas: P. Dubath, J. Koppenhoefer, F. Raison, H. Mc Cracken, N. Shane, A. Grazian, R. Bouwens, C. Dabin, C. Neissner, N. Tonello, E. Keihanen, H. Kurki-Suonio, J. Mohr, G. Verdoes-Kleijn, M. Scodeggio, C. Surace, A. Belikov, A. Zacchei, M. Frailis, S. Serrano, A. Ealet, A. Fontana, P. Osuna, M. Poncet, J-J. Metge, K. Noddle, M. Holliman, O. Le Fèvre, M. Mignoli, A. Taylor, M. Melchior, L. Vibert, O. R. Williams, J. Rector, M. Brescia, H. Teplitz, S. Paltani, F. Castander, M. Kuemmel, M. Douspis, F. Courbin, T. Schrabback, J-L. Starck, F. Abdalla, E. Branchini

Processing Functions
• Processing Functions
  – are a product of the Euclid SGS (to be eventually delivered to ESA at the end of the mission)
  – correspond to the processing steps which are performed within a «Euclid pipeline»
  – are algorithmically devised by the relevant OU and engineered by software development teams (SDC-DEV)
  – can in principle be run yielding the same results on any SDC site of the SGS (SDC-PROD, different HW environments)
• In most cases, Processing Functions are developed jointly by OU members and their local SDC-DEV teams
  – formal OU-SDC interfaces not needed in most cases
  – easier to directly develop pipeline-quality code
  – SGS System Team provides tools/standards/support (SDC Leads are members of the ST)

The Ground Segment at a glance
An organization based on the decomposition into Organization Units (OUs), each corresponding to a subset of the overall Euclid data processing.
[Diagram: EC-SGS Project Office, the Organization Units listed below, and SDC (×9)]
• OU-SIM – Simulation
• OU-VIS – VIS Imaging
• OU-NIR – NIR Imaging
• OU-SIR – NIR Spectroscopy
• OU-EXT – External Data
• OU-MER – Euclidisation
• OU-SPE – Spectroscopic Measurements
• OU-PHZ – Photometric Redshift
• OU-SHE – Morphology & Shear
• OU-LE3 – Level 3
OUs are transnational
Legend: OU coordinator, OU Deputy Coordinator
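
Illustration for ST Challenge #3
The four IAL steps listed among the ST Challenge #3 objectives (fetch inputs located through EAS metadata, launch a simulator job as requested by a PPO, store the output locally, send the metadata back to EAS) can be pictured with a minimal Python sketch. Every name in it (MetadataArchive, ProcessingOrder, execute_order, toy_simulator, the storage dictionaries) is hypothetical and does not reflect the actual EAS or IAL interfaces; a plain function call stands in for a cluster job.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataArchive:
    """Toy stand-in for the EAS prototype: product name -> metadata record."""
    records: dict = field(default_factory=dict)

    def query(self, product):
        return self.records[product]

    def ingest(self, product, metadata):
        self.records[product] = metadata


@dataclass
class ProcessingOrder:
    """Toy stand-in for a PPO: which pipeline to run, on what, producing what."""
    pipeline: str
    inputs: list
    output: str


def execute_order(order, archive, remote_store, local_store, pipelines, sdc):
    """Hypothetical abstraction-layer routine run at one SDC."""
    # 1. Fetch the inputs into the local storage area, located via archive metadata.
    for product in order.inputs:
        location = archive.query(product)["location"]
        local_store[product] = remote_store[location]
    # 2. Launch the job (a plain function call instead of a cluster submission).
    result = pipelines[order.pipeline]([local_store[p] for p in order.inputs])
    # 3. Store the output in the local storage area.
    local_store[order.output] = result
    # 4. Send the output metadata, including its lineage, back to the archive.
    archive.ingest(order.output, {
        "location": f"{sdc}/{order.output}",
        "producer": order.pipeline,
        "lineage": list(order.inputs),
    })
    return result


# Illustrative data only: one input product registered at SDC-NL, processed at SDC-IT.
archive = MetadataArchive({"catalogue": {"location": "SDC-NL/catalogue"}})
remote_store = {"SDC-NL/catalogue": [1.0, 2.0, 3.0]}
local_store = {}
pipelines = {"toy_simulator": lambda inputs: [2 * x for x in inputs[0]]}
order = ProcessingOrder("toy_simulator", ["catalogue"], "simulated_image")
output = execute_order(order, archive, remote_store, local_store, pipelines, "SDC-IT")
```

In this sketch the archive only ever holds metadata, while the data stay in the storage dictionaries; the lineage entry written in step 4 is one simple way to picture the "quality and lineage tracking" role the slides assign to EAS.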