Earth Observation Payload Data Long Term Archiving
The ESA Multi-Mission Facility Infrastructure

Gian Maria Pinna (ESA)
Eberhard Mikusch, Manfred Bollner (DLR)
Bernard Pruin (Werum Software & Systems AG)

PV-2005, 21-23 November 2005

Multi-Mission Facility Infrastructure (MMFI) Rationale

In 2003 the Agency approved a strategy for evolving the ground segments of its missions (those already handled and those still to be developed) into an open multi-mission architecture, in accordance with the Oxygen concept for harmonized ESA E.O. Ground Segment implementation.

Basic implementation principles:
- Adoption of a common, mission-independent architecture for the generic ground segment of all missions
- Decomposition of the facility architecture into functional block elements
- Identification of mission-specific and common elements
- Harmonization and standardization of interfaces
- Maximum re-use of already proven elements
- Standardization of products and formats across missions
- Adoption of the strategy for all future ESA-handled missions
- Extension of the concept to European national missions
- Harmonization and rationalization of archives

Payload Data Ground Segment Decomposition

[Figure: specific-to-mission elements (Processors, Acquisition, Q/C, etc.) for Missions A, B, C and D, together with examples of multimission common elements: Monitoring & Control, Archives, User Services I/F, Data Management, Networks, Products Packaging.]

Payload Data Ground Segment Logical Model (OAIS-based)

[Figure: PDGS Administration, PDGS Data Management, PDGS Consumer Access, PDGS Ingestion, PDGS Storage and PDGS Order Processing, connected to Data Producers, Data Consumers and Interactive User Services. Labelled flows: Products, Data, Metadata & browses, Queries & orders, Orders & metadata, Archived products, Output products.]

PDGS Services Model

A PDGS for a generic mission is composed of:
- a Multi-Mission Central Infrastructure component, consisting of all elements required to provide User Services (cataloguing, user access, data ordering, etc.) and Quality Assurance services (payload data quality control, sensor performance assessment, etc.)
- a distributed Multi-Mission Facility Ground Segment (FGS) component, consisting of all elements necessary for acquisition, ingestion, long-term archiving, order processing and data dissemination to end users.
[Figure: the Multi-Mission Central Infrastructure (Data Management, User Access, QA) and several instances of FGS Services (Producer & Product Registration, Acquisition, Ingestion, Long Term Archive, Order Processing, Data Dissemination), linked by Data and Data Circulation flows.]

FEOMI

The FEOMI project (Facility Evolution into an Open Multimission Infrastructure) aims at porting ESA's FGSs to an MMFI-based architecture by:
- Analyzing the existing operational requirements
- Consolidating the MMFI architecture so that it satisfies all identified operational requirements
- Implementing the specific configuration in the Core MMFI to support all operational missions
- Developing new elements, or adopting and adapting existing ones, to build the new MMFI in line with the basic concepts and logical model
- Developing a generic infrastructure for the use of the MMFI by the FGSs of future missions

FGS Logical Model Definition

The FEOMI Requirements Analysis phase highlighted the need for:
- explicit interoperability model elements for data and metadata exchange, to handle the dynamics of the distributed archive, with federation and co-operation between sites for data exchange and synchronization with the central catalogue for metadata and browse images
- the mapping of the model elements onto components that were already in operation in the ESA FGSs, could be created by configuring COTS products, or had to be custom built

Facility Ground Segment Logical Model

[Figure: the Facility Ground Segment functions Ingest, Cataloging, Access, Production Request Handling, Exchange, Archiving, Processing, Dissemination and Monitoring & Control, exchanging SIP, ISIP, DI, AIP, IDIP and DIP packages among themselves, with other FGSs and with the Multi-Mission Central Infrastructure.]

Legend:
SIP: Submission Information Package
ISIP: Internal Submission Information Package
DI: Descriptive Information
AIP: Archival Information Package
IDIP: Internal Dissemination Information Package
DIP: Dissemination Information Package

FGS Integration

Following the logical model defined by the FEOMI project, the elements required to build the Facility Ground Segment for a specific mission are:
- The so-called Core MMFI, i.e. a set of unconfigured multimission elements and services deployed in a suitable infrastructure.
- Several optional Mission Specific Elements (MSEs), which account for the specificities of the mission's sensor data; as seen before, these are typically processing systems.
- The specific configuration of the MMFI elements to provide the services required by the mission, e.g. the ingestion and cataloguing of the specific datasets generated in the FGS.

Facility Ground Segment Decomposition

[Figure: the Core MMFI placed between Data Producers, Data Consumers, MM Central Services and the Mission-Specific Elements. Labelled elements: Data Library, Local Inventory, AMS, ULS, SatStore, Online Archive, Request Handling, POH, PSM CAR, PSM PR, PSM Product Distribution, PSM Processing System, GFE, DRS, E-PFD, PFD Central Server, PFD Network Server, NRT, Site Cache, Circulation Cache In, Circulation Cache Out, Ingestion, Processing, Dissemination, Monitoring and Control (Monitoring, Logging & Alarm, Operating Tool).]

MMFI configurations for missions

MMFI Architecture

[Figure: the Multi-Mission Central Infrastructure (MMMC, MMOHS) above the MMFI elements. Labelled elements: Data Library (Local Inventory, AMS, ULS), Request Handling (POH, CAR IF, PR IF), Online Archive, PSM Processing with Processing Systems (IPF, other processors), GFE, Product Distributor, PFD, NRT, DDS, Site Cache, Circulation Cache In, Circulation Cache Out, Ingestion, Processing, Dissemination, Monitoring and Control (Monitoring & Alarm, Logging, Operating Tool).]

MMFI Ingestion
- Controlled by the "Generic Front End" (GFE)
- Configurable workflow engine
- Registration of data producers and product types
- Ingestion of products with validation, metadata extraction, browse generation, etc.
- Generation of master catalogue (MMMC) update files
- Standard set of re-usable plug-ins relevant to ingestion
- Standard interface for the implementation of new plug-ins
- Access to data (including binary data) via the Data Request Server (DRS)
- Metadata extraction and browse generation during ingestion based on the DRS

MMFI Data Library
- Based on the "Archive Management System" (AMS) and the "Local Inventory" (LI)
- The AMS manages the actual long-term archiving of the data products
  - Abstraction layer toward the underlying storage technology
  - A single storage technology adopted by ESA for its EO missions, in order to achieve maximum harmonization and standardization
  - Automation of operations
  - On-line access via client-server services
  - Internal storage organization transparent to clients
  - Management of client data access rights
  - Product subsetting to handle HBR data with EAST and DRB
- The LI is the local catalogue indexing all products archived in the long-term archive at the centre
  - Indexing and management of the on-line archive
  - UpLoad Server (ULS) for metadata synchronization with the MMMC

MMFI Requests Handling
- Based on the "Product Order Handling" (POH) system
- Handles the on-demand production and dissemination requests received from ESA's Multi-Mission Order Handling System (MMOHS)
- Organizes the required workflow based on the product type and output medium requested in the order
- Schedules data validation and retrieval from the Data Library, processing, QC verification and dissemination to users
- Reports the Production Request statuses to the MMOHS
- The POH is supported by a set of auxiliary components that interface with other MMFI elements and provide specific functionality for workflow management

MMFI Dissemination
"Product Formatting and Delivery" (PFD)
- Dissemination workflow management handling different dissemination channels in a configurable manner
- Optional reformatting and compression (different methods available, including wavelet/JPEG2000)
- Formatters with specific or generic plug-ins (DRB available for enhanced flexibility)
- Concurrent dissemination orders with priority handling
- Media generation on tape drives or CD/DVD, using small tape libraries for tapes and a Rimage CD/DVD Producer
- Generation of a unique medium id with barcode printout, back inlay and delivery note for the end user
- PFD Network Server add-on for electronic delivery, with dynamic generation of a random account and notification to the user by e-mail; the URL for the user is reported back to the POH/MMOHS

MMFI Dissemination (2)
"Product Distributor" (PD)
- Based on the PSM (Processing System Management) workflow engine
- Systematic dissemination management (handling of subscriptions and standing requests)
- Systematic product dissemination (by timers and triggers)
- Dissemination via PFD
- Dissemination via DDS and other satellite multicast systems
- Direct dissemination to ftp accounts and file locations
- Product circulation to ftp accounts and file locations
- Product circulation management with state-based circulation control
- Dissemination and circulation reporting

MMFI Processing
PSM (Processing System Management)
- Its main function is to abstract the processing systems from the higher-level elements of the MMFI
- Allows mission- and sensor-specific processing facilities to be integrated with minimal effort, and offers a choice of protocols for defining the interface between the MMFI and the processing facility
- Also used in other elements: thanks to its workflow model and its interface with the LI, it can perform complex scheduling of other elements:
  - PSM-CAR (Check And Release)
  - PSM-PR (Product Retrieval)
  - PSM-PD (Product Distributor)
- Powerful subscription mechanism to the LI, based on OQL, to be notified when data appear in the archive (systematic processing, reprocessing, circulation, etc.)
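The subscription mechanism between the PSM and the LI can be illustrated with a small sketch. It is not the MMFI interface: `LocalInventory`, `ProductRecord`, the field names and the product type `SAR_RAW` are hypothetical, and a plain Python predicate stands in for the OQL query that a real subscription would carry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ProductRecord:
    """Hypothetical inventory entry; field names are illustrative only."""
    product_id: str
    product_type: str
    level: int


Query = Callable[[ProductRecord], bool]
Callback = Callable[[ProductRecord], None]


@dataclass
class LocalInventory:
    """Toy stand-in for the LI: stores records and notifies matching subscribers."""
    records: Dict[str, ProductRecord] = field(default_factory=dict)
    subscriptions: List[Tuple[Query, Callback]] = field(default_factory=list)

    def subscribe(self, query: Query, callback: Callback) -> None:
        # Assumption: a predicate replaces the OQL query text.
        self.subscriptions.append((query, callback))

    def put_item(self, record: ProductRecord) -> None:
        # Register the record, then notify every subscriber whose query matches.
        self.records[record.product_id] = record
        for query, callback in self.subscriptions:
            if query(record):
                callback(record)


# A processing manager subscribes to level 0 products of one type and
# turns each notification into a processing request.
processing_requests: List[str] = []

li = LocalInventory()
li.subscribe(
    lambda r: r.level == 0 and r.product_type == "SAR_RAW",
    lambda r: processing_requests.append(r.product_id),
)
li.put_item(ProductRecord("P001", "SAR_RAW", 0))  # matches the subscription
li.put_item(ProductRecord("P002", "SAR_IMG", 1))  # registered, no notification
```

The same pattern would serve reprocessing or circulation by registering a different query and callback.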
Systematic data-driven reprocessing

[Sequence diagram. Participants: gfe:GFE, drs:DRS, li:LI, ams:AMS, ps:PSM_B, ips:Processor, mmmc:MMMC. Preceding interaction suppressed.]
1: subscribe()
2: putItem(x)
3: notify()
4: putProcessingRequest
5: getItem()
6: productRetrieval()
7: start()
8: done
9: smartPolling()
10: productInspection()
11: browseGeneration()
12: metadataExtraction()
13: archiveKey()
14: productArchiving()
15: putItem(register)
16: exchange(add)

MMFI Use Cases - Circulation

[Sequence diagram. Participants: gfe1:GFE, li1:LI, ams1:AMS, pd1:PD, gfe2:GFE, li2:LI, ams2:AMS, and pr:ISIP, which is created in step 7. Preceding interaction suppressed.]
1: subscribe()
o1: subscribe()
2: putItem(x)
3: notify()
4: putCirculationRequest()
5: getItem()
6: productRetrieval()
7: create() pr:ISIP
8: smartPolling()
9: ingest
10: productArchiving()
11: putItem(register)
o2: notify()
o3: circulationAcknowledge()
o4: destroyItem()

MMFI Advanced Features

The MMFI implements various features covering preservation and value-adding concepts:
- Support for operational scenarios for preservation strategies, e.g. periodic migration of digital products to new information technology
- Encapsulation in self-describing items, as defined by the information packages of the OAIS model. Metadata are maintained and means for ensuring data consistency are supplied
- Support for automated production by means of sophisticated data access and processing management
- Modular design, open architecture and streamlined interfaces that make it easier to substitute one or more of its elements, should the need arise in the future, to ensure the long-term preservation of its data holdings and of its services
- Special attention was paid in the architecture to a well-balanced assignment of functionality to components.
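The periodic migration scenario listed among the preservation strategies can be sketched as a copy between two storage back ends hidden behind one interface, with each item verified after the copy. This is only an illustration under stated assumptions: the `Storage` interface, `InMemoryStorage` and `migrate` are hypothetical names, and SHA-256 is an assumed choice of consistency check, not something the MMFI is documented to use.

```python
import hashlib
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical abstraction over a storage technology."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def keys(self) -> List[str]: ...


class InMemoryStorage:
    """Trivial back end used here in place of a real archive technology."""
    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]

    def keys(self) -> List[str]:
        return sorted(self._items)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def migrate(source: Storage, target: Storage) -> Dict[str, str]:
    """Copy every item to the target and verify it by checksum.

    Returns a mapping of item key to the verified checksum.
    """
    verified: Dict[str, str] = {}
    for key in source.keys():
        data = source.get(key)
        expected = sha256(data)
        target.put(key, data)
        # Read the item back from the new technology before trusting it.
        if sha256(target.get(key)) != expected:
            raise IOError(f"checksum mismatch after migrating {key}")
        verified[key] = expected
    return verified


old_store = InMemoryStorage()
old_store.put("AIP-0001", b"level-0 payload")
old_store.put("AIP-0002", b"level-1 payload")
new_store = InMemoryStorage()
report = migrate(old_store, new_store)
```

Because clients only see the `Storage` interface, replacing the back end does not change their code, which is the point of shielding applications from the physical data sets.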
MMFI System Design Features
- Archiving technology migration, by shielding the physical data sets from the applications through several software layers, each of which can absorb lower-level technology changes:
  - Hierarchical Storage Management (HSM) incorporated in the AMS. A potential change of the HSM technology would be fully resolved within the AMS.
  - Limited number of components interfacing directly with the AMS, so that a change of the AMS or of its interfaces could also be handled with minimal impact on the archive.
- Technological evolution and scalability through modularity, achieved by an architecture largely built from autonomous, networked components that can be combined by configuration:
  - Simplified replacement of some components by other implementations
  - Handling of increased load by instantiating more components in optimised configurations (e.g. increasing the number of processing nodes)
- Processor integration, to cope with changes to the Data Processors used for adding value to the EO data:
  - The PSM framework supports the integration of processors by natively supporting a variety of processor interfaces and by allowing processor adapters to be integrated
  - The flexibility of this approach makes it possible to substitute processors and processor interfaces without undue effort and without affecting other parts of the participating workflows
- Product data model migration, to cope with the evolution of processing algorithms that requires changes in the data models of the products to be archived:
  - Configurable product object models within the Data Library that can be extended

MMFI Functional Features
- Cataloguing and archiving: basic feature to consistently manage data and metadata over the long term. In addition, advanced access capabilities are available to search and retrieve products for automated production
- Automated product ingestion: during automated product ingestion, data products are checked for consistency before archival. Metadata are extracted and, where applicable, browse images are generated for catalogue applications
- Order-driven processing and delivery: this is the classic dissemination workflow initiated by a user order. Optionally, a value-adding processing step may occur before delivery
- Systematic data-driven processing: the capability to automatically initiate processing workflows for higher-level products upon reception of a lower-level product
- Systematic re-processing: used to generate a new revision of a product collection after a processing algorithm or configuration update. It is a processing scheme that takes its input data from the archive and returns its output to the archive
- Systematic dissemination: subscription-type systematic dissemination is similar to systematic data-driven processing, with the difference that newly arrived products are not processed but delivered to one or more customers
- Online archive access: allows product data to be retrieved directly with a file-based transfer protocol
- Data circulation: distributes data between centres for data migration purposes, including auxiliary products for remote processing

MMFI Systematic Workflows

[Figure: systematic workflows linking Acquisition, the Data Library (Local Inventory, AMS), the Online Archive, PSM Processing and PSM Re-Processing with their Processing Systems (IPF, other processors), the Product Distributor, PFD, the Circulation Cache Out and the User. Labelled flows: Level 0 Notification, Level x Notification, Pointer, Level 0 and Level x products, product versions V1.0 and V2.0, Processing, Re-Processing, Dissemination, User Access.]
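Systematic re-processing, as described in the functional features, amounts to selecting archived products generated by an outdated processor version and requesting them again from their archived inputs. The sketch below shows one way to make that selection. `ArchivedProduct`, its fields and `select_for_reprocessing` are hypothetical names, and the (major, minor) version tuples are an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ArchivedProduct:
    """Hypothetical inventory entry for an archived product."""
    product_id: str
    level: int
    source_id: str                      # lower-level input, empty for level 0
    processor_version: Tuple[int, int]  # assumed (major, minor) convention


def select_for_reprocessing(
    inventory: Iterable[ArchivedProduct],
    level: int,
    current_version: Tuple[int, int],
) -> List[str]:
    """Return the inputs whose derived product at `level` is outdated."""
    outdated = {
        p.source_id
        for p in inventory
        if p.level == level and p.processor_version < current_version
    }
    return sorted(outdated)


inventory = [
    ArchivedProduct("L0-A", 0, "", (1, 0)),
    ArchivedProduct("L0-B", 0, "", (1, 0)),
    ArchivedProduct("L1-A", 1, "L0-A", (1, 0)),  # made with the old processor
    ArchivedProduct("L1-B", 1, "L0-B", (2, 0)),  # already at the new version
]

# After a processor update to version 2.0, only L0-A needs to be processed again.
requests = select_for_reprocessing(inventory, level=1, current_version=(2, 0))
```

Each returned identifier would then feed a processing request, with the new revision archived next to the old one.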