From HARM to xxx

advertisement
From HARM to xxx
The ESA’s Proposal for a
Standard Archive Format for Europe
Gian Maria Pinna, Vincenzo Beruti
ESA
Stephane Mbaye, Mathias Moucha
Gael Consultant
Valter Spaventa, Davide Castellazzi ACS S.p.A.
PV-2005
21-23 November 2005
LTA Strategy and Rationale



The mandate for ESA’s EO missions is to
maintain the archive for at least 10 years
after the end of the mission
The management of its vast amount of
heterogeneous datasets poses problems for
their long-term preservation
The ESA’s EO Ground Segment Department
has recognized since many years the need to
establish a clear strategy for the management
of the long-term archives.
LTA Strategy and Rationale

This strategy has as its main goals:






archive maintenance in order to ensure data
integrity
archive and data management rationalization
data conversion to new technologies in order to
reduce cost of operations
enhancement of data access
standardization of the format in which the datasets
are preserved
Achievements aim at simplifying the exchange
and interoperability of data between ESA and
external operators
Long-Term Preservation Challenge
The data to be preserved for the long term require a
special attention, which reflects in costly operations for
their exploitation and maintenance:





datasets have to be regularly converted into new media
technology, to prevent the problems created by their
obsolescence
being the long term archive normally based on datasets with a
very low processing level, higher level products have to be
generated by processing systems
in the case of distribution of the data holdings directly from the
archive, it is normal practice that they are converted or
reformatted into a format more oriented to the end-user
utilization
it is common to have to extract from the archive a portion of the
dataset (subsetting) to create a “child” product
more and more the data are distributed to the end-users, and
exchanged among data holders, in electronic format over network
infrastructures (private Intranets, public Internet, academic
networks, etc.)
Cost of LTA Operations
One of the reasons that contribute to the high operations
and maintenance costs of the long term archives is the
excessive proliferation of diverse and heterogeneous data
formats, caused by mainly three reasons:



the lack of an agreed standard in the Earth Observation
community, reason for which the formats tended to be specific for
the sensor(s) each missions carried on board
legacies from old ground segments architectures, which tended not
to reuse previously developed elements
the non-mature status until recently of the information
technologies and standards used to describe and package the data,
preventing the creation of a unique format able to satisfy at the
same time the requirements for the long-term data preservation
and their handling in the processing centres
HARM
(Historical Archive Rationalization and Management)
HARM aims at achieving the following main goals:





analysis and definition of a new ESA standard long term archive
data format  SAFE
conversion of archived data from the original format into the new
format and transcription onto the ESA’s automated long term
archive, based on the ESA’s MMFI (Multi Mission Facility
Infrastructure) architecture
rationalization of the archives by centralization of historical data
(non-active mission) and archive of on-going missions into
specialized centres
optimization of the archives, reducing operator intervention for
system maintenance and improvement of archive capacity through
removal of redundant data (whenever applicable and due to orbit
overlap, duplicated transcriptions etc.)
generation of the related metadata and inventory information, to
be ingested in the centralized ESA catalogue
HARM activities
The operational plan for the HARM project is based on
four logical phases, some of them overlapping




loading of the data from loose media into the AMS in the original
data format, in order to avoid waiting for the SAFE format
consolidation, minimize the processing applied and maximize the
usage of existing hardware (media drivers and stackers) in order
to reduce as much as possible the operator’s intervention
conversion of the datasets into the SAFE format, using the AMS
library (fully automated operations). This phase can be
contemporary and concurrent with phase 1
concentration of homogeneous data (same mission, sensor and
mode) into specific locations, in SAFE format
orbits stitching at the target archives of the datasets already
fully converted in SAFE format, with regeneration of browse and
metadata for the ESA’s central catalogue.
ESA’s goal is to make SAFE:




Compliant with ISO 14721:2003 OAIS (Open Archival
Information System) reference model
Compliant with emerging CCSDS/ISO XFDU (XML
Formatted Data Units) packaging format
Designed to act as a standard format for archiving
and conveying data within ESA Earth Observation
archiving facilities
Although the primarily goal of SAFE is to handle EO
data with processing levels close to the usually called
“level 0”, no a-priori limitations exist for the
packaging of higher level products
SAFE Information Model
Product Information
Acquisition
Period
0..1
Platform/Sensor
Identification
1
Product History
1..*
SAFE Product
Product Data/Metadata
Metadata
Orbital Information
0..1
EO Data
*
1..*
Geolocation Information
0..1
Quality/Fixity Information
0..1
Representation Information
1
SAFE Product Information



Acquisition period: the acquisition period metadata
that provide the time extents of all the data
contained in the SAFE product
Platform/Sensor identification: the platform
identifies the system (satellite/aircraft) that
acquired the EO data wrapped by the SAFE product
Product History: a processing log collecting the
historical information dedicated to the maintenance
and the traceability of the product
SAFE Product Data/Metadata




Orbital information: the reference to the trajectory of the
platform that acquired the data. This information may locate one
or several orbit paths, the corresponding cycle, track, etc.
Geolocation information: the information locating the product
footprint on the Earth’s surface, either as a series of tie points or
by reference to a world reference system of the acquiring
platform. The Geolocation information, may also attach additional
information to each localized element, including cloud coverage
vote, meteorological information, etc.
Quality/Fixity information: information about the quality of an EO
dataset. SAFE makes use of techniques (i.e. XPath) that allows the
precise location of the corrupted or missing elements up to the bit
level
Representation information: any data contained in a SAFE product
shall be accompanied with its representation information formally
and numerically exploitable
SAFE Logical Model



A SAFE product is a logical tree of “Content
Units” forming the so-called “Information
Package Map”
The root Content Unit has predefined
associations to the information applicable to
the overall product, i.e. at least the
“Acquisition Period”, the “Platform/Sensor
Identification” and the “Product History”
The structure of the children Content Units is
less constrained and depends mainly of the
logical view of the wrapped data. In most
cases, one Content Unit matches one EO
dataset and its accompanying metadata.
Several Content Units may, however, share the
same metadata.
SAFE Logical Model
- Acquisition Period
- Platform/Sensor Identification
- Product History
SAFE Product
Content Unit
SAFE Product
Metadata Object
1
*
*
1..*
SAFE
Content Unit
1..*
*
Data Object
0..1
-
Orbital Information
Geolocation Information
Quality Information
Quality/Fixity Information
Representation Information
SAFE Physical Model
Contains:
- Information Package Map (Content Units)
- Metadata Objects
- Acquisition Period
- Platform/Sensor Identification
- Product History
- Orbital Information
- Geolocation Information
- Quality Information
- Quality/Fixity Information
Manifest
XML file
1
URL
SAFE
Product
URL
*
Binary or
XML files
Referenced/external
Metadata or Data Objects
*
XML Schema
files
Representation information is always
provided as external XML Schema documents
SAFE Physical Components



Manifest file: an XML document conforming the XFDU Manifest
file. It contains the definition of the Information Package Map,
the wrapped Metadata Objects, the wrapped Data Objects and
references to the external files containing the Metadata and Data
Objects
Binary or XML files: the data or metadata object contents.
Currently, only two types of files have been identified, i.e. binary
matching MIME octetstream definition and XML documents. Each
of these files shall be accompanied with one or more XML Schema
document controlling its content
XML Schema files: the representation information of the data
held by a SAFE product. In order to represent the binary
information, SAFE also defines specific markups that annotate the
XML Schema documents to provide information on the physical
structure (SDF markups). Thanks to these specific annotations,
the contents of the binary files are described up to the bit level
with a common technique as for XML documents
Preservation of Binary Objects


Annotated XML Schemas
Structured Data File (SDF) markups
<xsd:complexType>
<xsd:sequence>
<xsd:element name="record" type="xs:int"
minOccurs="0" maxOccurs="unbounded">
<xsd:annotation>
<xsd:appinfo>
<sdf:block>
<sdf:occurrence query="../header/numRec + 1"/>
<sdf:length unit=‘bit’>12</sdf:length>
</sdf:block>
</xsd:appinfo>
</xsd:annotation>
</xsd:element>
</xsd:sequence>
<xsd:complexType/>
SDF Markups (1)
SDF Markups (2)
SAFE Volume Set
Identification
Volume
Title
PGSI-GSEG-EOPG-FS-05-0001
Volume 1
Core Specifications
PGSI-GSEG-EOPG-FS-05-0002
Volume 2
Recommendation for specializations
PGSI-GSEG-EOPG-FS-05-0003
Volume 3 Book 1
ENVISAT ASAR Products
PGSI-GSEG-EOPG-FS-05-0004
Volume 3 Book 2
ENVISAT MERIS Products
PGSI-GSEG-EOPG-FS-05-0005
Volume 3 Book 3
ENVISAT AATSR Products
PGSI-GSEG-EOPG-FS-05-0006
Volume 3 Book 4
ENVISAT DORIS Products
PGSI-GSEG-EOPG-FS-05-0007
Volume 3 Book 5
ENVISAT GOMOS Products
PGSI-GSEG-EOPG-FS-05-0008
Volume 3 Book 6
ENVISAT MIPAS Products
PGSI-GSEG-EOPG-FS-05-0009
Volume 3 Book 7
ENVISAT RA2 Products
PGSI-GSEG-EOPG-FS-05-0010
Volume 3 Book 8
ENVISAT MWR Products
PGSI-GSEG-EOPG-FS-05-0011
Volume 3 Book 9
ENVISAT SCIAMACHY Products
PGSI-GSEG-EOPG-FS-05-0012
Volume 3 Book 10
ENVISAT Auxiliary Data Products
SAFE Volume Set
Identification
Volume
Title
PGSI-GSEG-EOPG-FS-05-0013
Volume 4
ERS Products
PGSI-GSEG-EOPG-FS-05-0014
Volume 5
Landsat Products
PGSI-GSEG-EOPG-FS-05-0015
Volume 6
Terra/Aqua MODIS Products
PGSI-GSEG-EOPG-FS-05-0016
Volume 7
NOAA AVHRR Products
PGSI-GSEG-EOPG-FS-05-0017
Volume 8
SeaStar SeaWiFS Products
PGSI-GSEG-EOPG-FS-05-0018
Volume 9
Nimbus CZCS Products
PGSI-GSEG-EOPG-FS-05-0019
Volume 10
SPOT HRV(IR) Products
PGSI-GSEG-EOPG-FS-05-0020
Volume 11
JERS SAR/OPS Products
PGSI-GSEG-EOPG-FS-05-0021
Volume 12
MOS MESSR/VTIR Products
PGSI-GSEG-EOPG-FS-05-0022
Volume 13
IRS-P3 MOS Products
SAFE I/O API
In addition to the Core SAFE specifications and the Specializations,
the HARM project is implementing a set of software APIs dedicated
to SAFE product users. The main use cases of these software
components are:










Create a SAFE product
Open a SAFE product
Remove a SAFE product
Move a SAFE product considering or not the referenced objects
Validate a SAFE product including or not the external object contents
Add/Remove/Browse Content Units of the Information Package Map
Add/Remove/Retrieve the Metadata Objects
Add/Remove/Retrieve the Data Objects
Browse/Select/Extract information from either Binary or XML objects
Identify/Sort Quality Information to a specific part of the objects
SAFE I/O API

XFDU Java API: this API provides the
general features common to all XFDU
packages. This API has been developed jointly
with the SAFE Java API and is presently used
in the framework of the activity of the
CCSDS Information Packaging and Registries
(IPR) Working Group in support to the
validation of the XFDU recommendation
developed by the group
XFDU C++ Wrapper: a C++ wrapper on top of
the XFDU Java API. Most of the functionality
is preserved from the Java API, apart the capability of browsing the
binary contents


Data Request Broker (DRB): from which SAFE inherits the capability to
access data independently from their formats. DRB supports in particular
the SDF markup language used for annotating the XML Schema
documents acting as representation information of the SAFE product
objects.
SAFE Benefits




SAFE is mainly devoted to the long-term preservation of
data, as its full compliance with the CCSDS/ISO OAIS
RM and XFDU standards reveal
SAFE permits an easier and more effective migration of
data to other future standards
SAFE permits an easier reformatting to other formats,
including end-user formats for products distribution
SAFE will easy the maintenance of the SW that use the
data, even historical, thus decreasing the chances of
obsolescence and improving their usability
SAFE Benefits



SAFE being self describing, it greatly facilitates the
format maintenance
SAFE format documents are built automatically from
the XML Schemas, so the maintenance of the SAFE
format is more robust with respect to “old-fashion”
formats
SW that access SAFE-formatted datasets will benefit
from the usage of the same SAFE XML Schemas for
reading and writing the datasets. This greatly improves
the overall cycle of creation/ingestion/quality control/
transformation/distribution of the datasets.
http://earth.esa.int/SAFE/
(under construction)
Download