Organization of the Euclid Data Processing

Organization of the Euclid Data
Processing: dealing with complexity
Fabio Pasian
(INAF – O.A.Trieste)
and Christophe Dabin, Marc Sauvage,
Oriana Mansutti, Claudio Vuerli, Anna Gregorio
on behalf of the Euclid SGS development team
The presented document is Proprietary information of the Euclid Consortium. This document shall be used and disclosed by the receiving Party and its related entities (e.g.
contractors and subcontractors) only for the purposes of fulfilling the receiving Party's responsibilities under the Euclid Project and that identified and marked technical data shall
not be disclosed or retransferred to any other entity without prior written permission of the document preparer.
ADASS XXIV, Calgary, 5-9 Oct 2014
Fabio Pasian – Euclid Data processing
The Euclid Mission
M2 mission in the framework of the ESA Cosmic Vision Programme
The objective of the Euclid mission is to map the geometry and understand the nature of the
dark Universe (dark energy and dark matter)
Actors in the mission: ESA and the Euclid Consortium (institutes from 13 European
countries and the USA, funded by their own national Space Agencies)
For more information see:
http://sci.esa.int/science-e/www/area/index.cfm?fareaid=102
http://www.euclid-ec.org
The Euclid Consortium
• The Euclid Consortium is in charge of:
– building and operating the instruments (VIS and NISP)
– developing and running the data processing within a unified
Science Ground Segment (SGS)
– performing the science analysis on the Euclid data products
• The Euclid Consortium is composed of 1300+ members
– 350+ Consortium members participating in SGS (active: ~150)
Euclid at a Glance
The Ground Segment at a glance
ESA/SOC and the EC SGS have developed, and are committed to maintaining, a tight
collaboration in order to design and develop a single, truly integrated SGS.
The EA is built jointly by the EC and SOC, and is managed by SOC. It has «internal» and
«public» functions; the latter allow access to a subset of EA data.
[Diagram labels: scientific community, VObs, Euclid, Ground Station, MOC, ESOC, DDS, SOC, ESAC, EA, External data (KiDS, DES, ...), ECSGS, Project Office, System Team, nine SDC boxes]
This is an institutional view of the GS
The Ground Segment as seen in the high-level Euclid documents
[Diagram labels: Ground Station, MOC, SOC, MOGS, LE1, Euclid Consortium (ECSGS), SGS]
The Ground Segment as seen from the data processing point of view
The coloured boxes correspond to the Processing Functions, which are a product of
the Euclid SGS
[Diagram labels: Ground Station, MOC, SOC, OPS, MOGS, SGS. Processing Function boxes: LE1, VIS, NIR, SIR, EXT, SIM, MER, SPE, SHE, PHZ, LE3. Data levels: Level 1, Level 2, Level 3, Level E, Level S. Cross-checks: VIS/NIR/SIR/EXT, SIR, MER]
This is a functional view of the SGS
SWGs, OUs and SDCs
• Science Working Groups
– external to the SGS
– turning science objectives into requirements placed on the pipeline
products and performances
– verifying that the requirements are met (define V&V procedures)
• Organisation Units
– providing the algorithmic definition of the processing to be
implemented by the SDCs and validating the implementation
• Science Data Centres
– implementing the data processing pipelines as specified by OUs
– procuring local h/w and s/w resources
– different activities:
  • SDC-DEV (development – i.e. transforming algorithms into robust code)
  • SDC-PROD (integration on local infrastructure, production runs of pipeline)
• individual Euclid scientists may belong to more than one of the above groups
Development–Verification&Validation
For every Processing Function:
[Diagram: flow between SWGs, OU, SDC-DEV and several SDC-PROD sites. Arrow labels: requirements; algorithms, test data; code validation; pipeline code, test data; pipelines verification; validation (on results)]
Notes on the diagram:
1. in most cases, no interfaces but joint development
2. only for validation against high-level requirements
3. common integration platform
Development–Verification&Validation
1. Set of documents being prepared jointly by OUs and SDCs
(by product, i.e. by Processing Function, and not by organisation):
a. PF Requirements Specification Document
b. PF Validation Plan
c. Development Plans (organised by SDC)
2. Validation by SWGs of the high-level data processing requirements
a. high-level data processing requirements attributed to PFs
b. the SGS will be considered as validated if every high-level data
processing requirement is validated
c. the SGS is including in the top-level IV&V plans the inputs
provided by the SWG coordinators regarding the principles of
validation, as well as their recommendations and the types of
validation tests; this top-level document will be co-signed by
the SGS and SWG coordinators
• Responding to recommendations from the SGS-PRR:
– Simplification/reduction of interfaces
– «Best Practices» document issued to help OUs/SDCs
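The roll-up rule in item 2b can be illustrated with a minimal Python sketch. The requirement identifiers, their attribution to Processing Functions and the names `Requirement`, `unvalidated` and `sgs_is_validated` are hypothetical, chosen only for illustration; they do not come from the Euclid documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """A high-level data processing requirement attributed to one PF."""
    req_id: str               # hypothetical identifier
    processing_function: str  # e.g. "VIS", "NIR"
    validated: bool


def unvalidated(requirements):
    """Return the requirements that still lack validation."""
    return [r for r in requirements if not r.validated]


def sgs_is_validated(requirements):
    """The SGS counts as validated only if every requirement is validated.

    An empty requirement list is treated as 'not validated' (an assumption).
    """
    return len(requirements) > 0 and not unvalidated(requirements)


# Illustrative data only
reqs = [
    Requirement("REQ-001", "VIS", True),
    Requirement("REQ-002", "NIR", True),
    Requirement("REQ-003", "SHE", False),
]
print(sgs_is_validated(reqs))                 # False
print([r.req_id for r in unvalidated(reqs)])  # ['REQ-003']
```

Listing the unvalidated requirements alongside the overall verdict shows which Processing Function still blocks validation.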
Pillars of SGS development
SGS-Level Services = shared tools and systems for SGS software
development
• Standards and guidelines
• Development platform
• Integration platform
• Data model
• Software infrastructure
The System Team provides these to make the integration and operation
of the Processing Functions as simple as possible
[Current status is relative to ADASS XXIII]
Standards and guidelines
Standards and guidelines help developers take the right decisions
• Show how/where to improve code to meet the demanding
requirements of the Euclid data processing
• Encourage the use of best practices
• Provide tools to help developers improve their code
Current status:
• Standards being developed based on previous project experience and
adapted to the Euclid context
Development and integration platform
The SGS uses a single development platform specifying
• Operating system, Programming language, Support libraries
CODEEN is the Euclid collaborative development and continuous
integration platform
• The cost of fixing bugs increases as the system integration
approaches completion
• Usage mandatory for main processing software
Current status:
• Python adopted as the second language allowed for pipeline
development in addition to C++ (Linux + C++ & Python)
• Drivers: more flexibility about who can contribute to development,
long-term direction of astronomical programming
• The System Team will ensure that we get all of the benefits and
avoid the (known!) pitfalls
Data model
Explicit data model built by OUs to describe the output of their
processing functions (therefore input to other Processing Functions in
most cases)
• Many projects have an implicit data model, using conventions and
shared code data structures
• Change management of implicit data models is difficult, particularly
for long-living projects where knowledge can be lost
Current status:
• Data Model Workshops held with great participation from OUs and
System Team
• First iterations of the DM very promising – real data products starting
to be defined
• Challenge now is to increase the coverage to all products and
maintain a flexible process to allow the DM to evolve in a controlled
way along with the Processing Functions
• CCB started, to accept new items and to evaluate change requests
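To make the contrast with an implicit data model concrete, the following Python sketch declares a data product explicitly and checks records against that declaration. The product name `VisCalibratedFrame`, its fields and the `validate` helper are hypothetical; the actual Euclid data model is defined by the OUs and is not reproduced here.

```python
# Hypothetical explicit model: product type -> {field name: required type}
DATA_MODEL = {
    "VisCalibratedFrame": {
        "product_id": str,
        "exposure_time_s": float,
        "filter_name": str,
        "data_file": str,
    },
}


def validate(product_type, record):
    """Return a list of problems found when checking a record against the model."""
    if product_type not in DATA_MODEL:
        return [f"unknown product type: {product_type}"]
    schema = DATA_MODEL[product_type]
    problems = []
    for name, expected in schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    for name in record:
        if name not in schema:
            problems.append(f"undeclared field: {name}")
    return problems


# Illustrative records only
good = {"product_id": "P1", "exposure_time_s": 100.0,
        "filter_name": "VIS", "data_file": "p1.fits"}
bad = {"product_id": "P2", "exposure_time_s": "100", "comment": "ad hoc"}

print(validate("VisCalibratedFrame", good))  # []
print(validate("VisCalibratedFrame", bad))
```

Because the declaration lives in one place, a change request (such as adding a field) becomes a reviewable edit to `DATA_MODEL` rather than a convention spread across several code bases.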
Software Infrastructure
Three main systems
• Data Management System (EAS) - Shared set of tools for managing
the Euclid dataset: data discovery and exchange, data processing
support, quality and lineage tracking
• Abstraction Layer (IAL) – Processing management at the SDC
computing facilities
• Processing Orchestration (COORS) – Coordinating processing activities
across all the SDCs
Current status:
• Prototypes exist for the core EAS system and IAL
• Integration of these systems through EC SGS Challenges demonstrates
real progress towards a working data processing system
ST Challenge #3
Final goal of the challenges: transparently deploying pipelines on all SDCs
Technical objectives:
• Demonstrate the capability to deploy IAL VM images into SDCs
• Demonstrate the capability to deploy, in the context of each SDC, the
TIPS, NIP and VIS simulators as Euclid pipeline objects
• Demonstrate the capability of IAL, in the context of each SDC, to:
– fetch, on the basis of the metadata provided by the EAS prototype (in
SDC-NL), the pipeline input data into the local SDC storage area
– launch simulator jobs across clusters (when available in SDCs) or
dedicated nodes, in accordance with PPOs defined remotely (through
Jenkins) or locally (by each SDC leader); orchestration mock-up
– produce and store output data in the local SDC storage area
– send the appropriate metadata to the EAS prototype in SDC-NL
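The fetch, launch, store and report steps listed above can be sketched as a single function. This is a minimal in-memory mock-up written for illustration: the classes `MockArchive` and `PPO`, the function `run_ppo`, the toy pipeline and all field names are hypothetical and do not reflect the actual EAS or IAL interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class PPO:
    """Hypothetical processing order: which pipeline to run on which inputs."""
    order_id: str
    pipeline: str
    input_ids: list


@dataclass
class MockArchive:
    """Stand-in for a metadata service: maps product id -> metadata dict."""
    metadata: dict = field(default_factory=dict)

    def query(self, product_id):
        return self.metadata[product_id]

    def ingest(self, product_id, meta):
        self.metadata[product_id] = meta


def run_ppo(ppo, archive, remote_store, local_store, pipelines):
    # 1. Fetch: use archive metadata to locate inputs and copy them locally.
    for pid in ppo.input_ids:
        location = archive.query(pid)["location"]
        local_store[pid] = remote_store[location]
    # 2. Launch: run the requested pipeline on the local copies.
    inputs = [local_store[pid] for pid in ppo.input_ids]
    output = pipelines[ppo.pipeline](inputs)
    # 3. Store: keep the output in the local storage area.
    out_id = f"{ppo.order_id}-out"
    local_store[out_id] = output
    # 4. Report: send the metadata of the new product back to the archive.
    archive.ingest(out_id, {"location": f"local/{out_id}",
                            "derived_from": list(ppo.input_ids),
                            "pipeline": ppo.pipeline})
    return out_id


# Illustrative data only
archive = MockArchive({"raw-1": {"location": "remote/raw-1"},
                       "raw-2": {"location": "remote/raw-2"}})
remote_store = {"remote/raw-1": [1, 2, 3], "remote/raw-2": [4, 5]}
local_store = {}
pipelines = {"toy-sum": lambda frames: sum(sum(f) for f in frames)}

out_id = run_ppo(PPO("ppo-001", "toy-sum", ["raw-1", "raw-2"]),
                 archive, remote_store, local_store, pipelines)
print(out_id, local_store[out_id], archive.query(out_id)["derived_from"])
# ppo-001-out 15 ['raw-1', 'raw-2']
```

Recording `derived_from` in the returned metadata is one simple way to keep the lineage information that the archive is meant to track.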
Schedule:
• Baseline availability for deployment into SDCs: end of December 2013
• By mid-February 2014, all SDCs had successfully fulfilled the challenge
Thank you for your attention
fabio.pasian@inaf.it
Acknowledgments
• Thanks to ESA and to the Euclid Consortium, and in particular:
• ESA: John Hoar (ESAC), Guillermo Buenadicha (ESAC), René Laureijs (ESTEC),
Giuseppe Racca (ESTEC), Pedro Osuna (ESAC), Bruno Altieri (ESAC), Michael
Schmidt (ESOC), Cyril Colombo (ESTEC), Ralf Kohley (ESAC), ...
• EC: Yannick Mellier (IAP), Andrea Zacchei (INAF), Keith Noddle (UoE), Maurice
Poncet (CNES), Rees Williams (RuG), Christian Neissner (PIC), Johannes
Koppenhöfer (MPG), Pierre Dubath (Unige), Elina Keihänen (UHelsinki), Marco
Frailis (INAF), Jean-Marc Delouis (IAP), Jean-Jacques Metge (CNES), Christian
Surace (LAM), Nikos Apostolakos (Geneva), Laurent Vibert (IAS), Martin Melchior
(FHNW), Stefan Müller (FHNW), Marco Soldati (FHNW), Andrey Belikov (RuG),
Edwin Valentijn (RuG), Harry Teplitz (IPAC), OUs staff, SDCs staff, …
• The SGS PRR Panel and Board
• And many other people involved in the project
• This is a REAL team effort


Project Office
• ECSGS Manager: F. Pasian
• ECSGS Deputy: C. Dabin
• ECSGS Scientist: M. Sauvage
• PA/QA Lead: C. Vuerli
• Config. Lead: O. Mansutti
• Proj.Ctr. Support: D. Fierro
• IOT Coordination: A. Gregorio
• System Team L.: C. Dabin
[Organisation chart. Box labels: ECSGS Management; SDCs: SDC-CH, SDC-DE, SDC-ES, SDC-FI, SDC-FR, SDC-IT, SDC-NL, SDC-UK, SDC-US; OUs: OU-VIS, OU-NIR, OU-EXT, OU-SIR, OU-SIM, OU-MER, OU-SPE, OU-SHE, OU-PHZ, OU-LE3; Archive Metadata, Archive Data, Orchestration, Common Tools, Architecture, Performance, Abstraction Layer (IAL), Monitoring & Control, Data Modeling, Data Quality, LE1 common infrastructure, Organisation Group. People named in the boxes: P. Dubath, J. Koppenhoefer, F. Raison, H. Mc Cracken, N. Shane, A. Grazian, R. Bouwens, C. Dabin, C. Neissner, N. Tonello, E. Keihanen, H. Kurki-Suonio, J. Mohr, G. Verdoes-Kleijn, M. Scodeggio, C. Surace, A. Belikov, A. Zacchei, M. Frailis, S. Serrano, A. Ealet, A. Fontana, P. Osuna, M. Poncet, J-J. Metge, K. Noddle, M. Holliman, O. Le Fèvre, M. Mignoli, A. Taylor, M. Melchior, L. Vibert, O. R. Williams, J. Rector, M. Brescia, H. Teplitz, S. Paltani, F. Castander, M. Kuemmel, M. Douspis, F. Courbin, T. Schrabback, J-L. Starck, F. Abdalla, E. Branchini]
Processing Functions
• Processing Functions
– are a product of the Euclid SGS (to be eventually delivered to
ESA at the end of the mission)
– correspond to the processing steps which are performed within
a «Euclid pipeline»
– are algorithmically devised by the relevant OU and engineered by
software development teams (SDC-DEV)
– can in principle be run yielding the same results on any SDC site
of the SGS (SDC-PROD, different HW environments)
• In most cases, Processing Functions are developed jointly by OU
members and their local SDC-DEV teams
– formal OU-SDC interfaces not needed in most cases
– easier to develop pipeline-quality code directly
– SGS System Team provides tools/standards/support (SDC Leads
are members of the ST)
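One way to check the claim that a Processing Function yields the same results on any SDC site is to compare digests of the outputs produced at each site. The sketch below is an assumption about how such a check could look, not a description of the SGS tooling; the site names reuse SDC labels from this presentation, while the functions `digest` and `sites_agree` and the byte strings are purely illustrative.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 hex digest of an output product serialised as bytes."""
    return hashlib.sha256(payload).hexdigest()


def sites_agree(outputs_by_site):
    """Compare every site's output with that of a reference site.

    Returns (True, []) if all outputs are byte-identical, otherwise
    (False, [names of the sites that differ from the reference]).
    The reference is the alphabetically first site (an arbitrary choice).
    """
    digests = {site: digest(data) for site, data in outputs_by_site.items()}
    reference = digests[sorted(digests)[0]]
    differing = sorted(s for s, d in digests.items() if d != reference)
    return (not differing, differing)


# Illustrative data only
outputs = {
    "SDC-IT": b"catalogue v1",
    "SDC-FR": b"catalogue v1",
    "SDC-NL": b"catalogue v1 ",  # trailing space: not byte-identical
}
print(sites_agree(outputs))  # (False, ['SDC-NL'])
```

Byte-for-byte equality is a strict criterion; on different hardware environments, floating-point outputs may call for a tolerance-based comparison instead.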
The Ground Segment at a glance
An organization based on the decomposition into Organization Units (OUs), each
corresponding to a subset of the overall Euclid data processing.
• OU-SIM: simulation
• OU-VIS: VIS imaging
• OU-NIR: NIR imaging
• OU-SIR: NIR spectroscopy
• OU-EXT: external data
• OU-MER: Euclidisation
• OU-SPE: spectroscopic measurements
• OU-SHE: morphology & shear
• OU-PHZ: photometric redshifts
• OU-LE3: Level 3
OUs are transnational
[Diagram: the Ground Segment layout of the institutional view (scientific community, VObs, Euclid, Ground Station, MOC, ESOC, DDS, SOC, ESAC, EA, External data (KiDS, DES, ...)), with the EC-SGS Project Office, the SDC boxes and the OUs. Legend: OU coordinator, OU Deputy Coordinator]