
Upcoming Enhancements to the HST Archive
Mark Kyprianou
Operations and Engineering Division
Data System Branch
Enhancements in DMS

JWST requirements
 Define a better archive

HST Mission Office
 Identify weaknesses/areas that could be improved
 Allocate resources to implement enhancements in a few areas

A “win” for all missions
 Common code to support
 Common system to operate
 Better services to our customers

Target areas for HST enhancements:
 Workflow Manager
 Reprocessing
 Online Cache
 Distribution/UI
01/19/2012
HST DMS Enhancements
Workflow Manager
HST Workflow Manager: today it’s OPUS

OPUS Workflow manager in use since 1995

All current HST level-0 -> level-2 data processing is performed
using OPUS pipelines.

Why change? There are significant risks with using OPUS
throughout the remaining HST lifetime

Reliability – OPUS GUIs (OMG, PMG) are “fragile”
- Susceptible to failure with additional network security
- Use Java “thick-client” application technology (NOT web-friendly)
- Have not been rebuilt for Windows platform in many years
HST Workflow Manager - Alternatives

A JWST Workflow Manager trade study has just been completed and
recommends a new workflow technology for JWST: Condor/Open Workflow
Layer (OWL)

Transition HST from OPUS to Condor/OWL?
- Provides a “technology refresh” which should serve HST throughout its
remaining lifetime
  - Condor has a huge world-wide user base and undergoes continuous
  development and improvement
- Allows STScI pipeline operations team to focus on a single workflow
manager system, rather than learn to operate more than one
- Provides significant “upside” with flexibility for taking advantage of
distributed computing resources, both on-site and off
Workflow Framework Trade Study (FOO - Future of OPUS)
https://trac.stsci.edu/trac/DMS/wiki/FutureOfOpus
Reprocessing
Rationale for Reprocessing

Calibration improves over time

Science instrument performance better understood

Data better understood over time

Reflected in improved calibration algorithms and reference files

Additional keywords and improved data formats

Pipeline software error corrections
On the Fly Recalibration: OTFR
 Advantages

The user gets the benefit of the very latest data processing and
calibration enhancements at the time of their archive retrieval.

Less archive storage since calibrated data products not on disk

Unpopular data do not get reprocessed
 Disadvantages

Delay in retrieval while reprocessing, could be substantial if there is
a large retrieval queue

No direct access to data

All data not accessible through VO protocols

Popular data get identically reprocessed many times
Reprocess on Change

Advantages

Rapid data retrieval through direct synchronous access or batch
request

Data accessible through VO protocols

Allows for data mining

 Disadvantages

Requires development of more complex reprocessing software
system
- Logic needed for when to initiate reprocessing and where to start in the
pipeline
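
The logic mentioned above (when to initiate reprocessing, and where in the pipeline to start) might be sketched as follows. This is only an illustration: the stage names and the mapping from change type to earliest affected stage are hypothetical and are not taken from the OPUS or DMS pipelines.

```python
# Hypothetical, ordered pipeline stages (illustrative names only).
PIPELINE_STAGES = [
    "level0_ingest",
    "level1_format",
    "level2_calibrate",
    "catalog_update",
]

# Hypothetical mapping: kind of change -> earliest stage whose output it invalidates.
EARLIEST_AFFECTED_STAGE = {
    "reference_file": "level2_calibrate",
    "calibration_software": "level2_calibrate",
    "header_keywords": "level1_format",
}


def stages_to_rerun(changes):
    """Return the stages to rerun, starting at the earliest affected stage.

    An empty list means that no reprocessing needs to be initiated.
    Change kinds that are not in the mapping are ignored.
    """
    indices = [
        PIPELINE_STAGES.index(EARLIEST_AFFECTED_STAGE[change])
        for change in changes
        if change in EARLIEST_AFFECTED_STAGE
    ]
    if not indices:
        return []
    return PIPELINE_STAGES[min(indices):]
```

Under these assumptions, a reference file update alone restarts at the calibration stage, while a combined keyword and reference file change restarts one stage earlier.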
Reprocessing Concept (1/3)

The Reprocessing System will automatically recalibrate affected
observations when updates to calibration reference files or the
calibration software are approved and released.

The Reprocessing System will monitor changes in calibration
reference files and software.

Other improvements to the quality of data products may trigger
reprocessing.

The Calibration Reference Data System will track changes to the
calibration reference files.
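
One way to picture the step from "a reference file changed" to "these observations need recalibration" is the minimal sketch below. The `Observation` record, its fields, and the file names in the usage note are hypothetical; the actual interfaces of the Reprocessing System and the Calibration Reference Data System are not described in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical record of an archived observation."""
    obs_id: str
    instrument: str
    # Names of the reference files used in the current calibration.
    reference_files: set = field(default_factory=set)


def affected_observations(observations, changed_files):
    """Return the IDs of observations calibrated with any changed reference file."""
    changed = set(changed_files)
    return [obs.obs_id for obs in observations if obs.reference_files & changed]
```

For example, given one observation that used `dark_v1.fits` and another that used only `flat_v3.fits` (both names invented), a change to `dark_v1.fits` selects only the first observation.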
Reprocessing Concept (2/3)

The latest version of all data is stored in the archive.

Reprocessed data products replace their previous version in the
primary archive.

If the Archive User Interface indicates that the requested data do
not have the best available calibration, archive users will be
notified prior to retrieval.

Archive users accept existing calibration or wait until calibration is
updated.
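
The notify-then-choose behaviour described above can be sketched as a small decision function. The status values and the returned action names are assumptions made for this example; the slides do not define them.

```python
from enum import Enum


class CalibrationStatus(Enum):
    """Hypothetical status values reported by the archive interface."""
    BEST_AVAILABLE = "best calibration available"
    IN_REPROCESSING_QUEUE = "data are in reprocessing queue"


def retrieval_action(status, user_will_wait):
    """Decide what happens to a request, given the data status and the user's choice."""
    if status is CalibrationStatus.BEST_AVAILABLE:
        # Nothing to decide: the archived products are already current.
        return "deliver"
    # The user has been notified that better calibration is pending.
    if user_will_wait:
        return "wait_for_reprocessing"
    return "deliver_existing"
```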
Reprocessing Concept (3/3)

The order of data processing will take into account items such as:

Data designated to be processed immediately.

Processing on initial receipt of data from the telescope.

Reprocessing of an observation less than one year from execution
requested by an archive user.

Reprocessing of an observation more than one year from execution
requested by an archive user.

Reprocessing of data less than one year from execution.

Reprocessing of data more than one year from execution.
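
If the items above are read as a strict priority order (an assumption; the slide says only that they are taken into account), the ordering could be sketched with a priority queue. The category names below are hypothetical labels for the six items.

```python
import heapq
import itertools

# Hypothetical priority classes, in the order listed above (lower runs sooner).
PRIORITY = {
    "immediate": 0,            # data designated to be processed immediately
    "initial_receipt": 1,      # first processing after receipt from the telescope
    "user_request_recent": 2,  # user-requested, observation under one year old
    "user_request_older": 3,   # user-requested, observation over one year old
    "bulk_recent": 4,          # other reprocessing, data under one year old
    "bulk_older": 5,           # other reprocessing, data over one year old
}


class ProcessingQueue:
    """Orders work by priority class, first-in first-out within a class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def add(self, obs_id, category):
        heapq.heappush(self._heap, (PRIORITY[category], next(self._counter), obs_id))

    def pop(self):
        """Remove and return the ID of the next observation to process."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

The counter is there so that two requests in the same class come out in arrival order rather than being compared by observation ID.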
Archive User Decision Tree for Direct Download
[Figure: decision tree diagram; the only surviving node label is “User waits”]
Data Storage: Online Cache
Storage Broker Concept

Optimizes management of large scale distributed data storage
resources

Provides a uniform interface to heterogeneous data storage
resources over a network

ingest (adding files to the system)

accessing files

security

Uses common metadata for file storage and location

Utilizes a database schema for mapping of the logical file layer to
the physical disk locations on storage media.

Provides independence from the hardware platforms (mainframes,
intermediate systems, servers, PCs).

Provides transparent use of public network protocols (SFTP,
HTTP, etc.)

Simplifies file exchange between applications and mirror sites
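
The mapping of a logical file layer to physical locations through a database schema, as described above, can be illustrated with a small in-memory SQLite catalog. The table, column, and store names are hypothetical; the actual Storage Broker schema is not given in these slides.

```python
import sqlite3


def create_catalog():
    """Create an in-memory catalog mapping logical file names to physical locations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE file_location (
            logical_name  TEXT NOT NULL,
            store         TEXT NOT NULL,  -- hypothetical store label
            physical_path TEXT NOT NULL,
            PRIMARY KEY (logical_name, store)
        )
        """
    )
    return conn


def ingest(conn, logical_name, store, physical_path):
    """Register (or update) where a logical file lives in a given store."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO file_location VALUES (?, ?, ?)",
            (logical_name, store, physical_path),
        )


def locate(conn, logical_name):
    """Return (store, physical_path) pairs for a logical file, sorted by store."""
    cursor = conn.execute(
        "SELECT store, physical_path FROM file_location "
        "WHERE logical_name = ? ORDER BY store",
        (logical_name,),
    )
    return cursor.fetchall()
```

Callers ask for a file by its logical name only, so the physical path can change (for example, when data move between stores) without affecting them.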
Data Storage Key Features

The Storage Broker (SB) supports:

Internal archive RAID-based disk storage for long-term data
preservation (Primary Data Store).

Online file storage for fast, immediate access.

An offline, offsite data backup of the file storage (Safestore).

The SB provides online access to the latest version of the
processed data.
Distribution and Archive User Interface
Data Distribution Concept

There are two complementary concepts for data distribution.

Batch distribution

Direct distribution

Batch distribution

XML request generated by Archive User Interface and passed to
Distribution.

No further user interaction is needed once the request is submitted.
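
As an illustration of a batch request passed from the Archive User Interface to Distribution, the sketch below builds a small XML document. The element names (`distributionRequest`, `user`, `delivery`, `files`, `file`) and the `delivery` parameter are invented for this example; the real request schema is not shown in these slides.

```python
import xml.etree.ElementTree as ET


def build_batch_request(user_id, file_names, delivery="sftp"):
    """Build a hypothetical XML batch distribution request as a string."""
    root = ET.Element("distributionRequest")
    ET.SubElement(root, "user").text = user_id
    ET.SubElement(root, "delivery").text = delivery
    files = ET.SubElement(root, "files")
    for name in file_names:
        ET.SubElement(files, "file").text = name
    return ET.tostring(root, encoding="unicode")
```

Because the request carries everything Distribution needs, it can be queued and fulfilled without further user interaction, which matches the batch concept above.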

Direct distribution

User has direct access to files through URL.

Supports VO services.

Necessary for data mining.
Archive User Interface Concept

The Archive User Interface (AUI) will provide a means to search for
data, including Program/PI searches and spatial, time, and wavelength
searches.

After users identify data of interest, the AUI will offer a choice
of download method and prompt for authentication / authorization
information for use with proprietary data.

The AUI will provide the status of the requested data (e.g. best
calibration available, or data are in the reprocessing queue) and
permit users to choose whether to wait for new data.

Distribution shall record metrics for user transactions, such as IP
address, user ID, files selected, distribution mode and format, and
download size and time.
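
The metrics listed above map naturally onto a simple record type. The sketch below is one possible shape, with hypothetical field names and units; the slides list what is recorded but not how it is stored.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TransactionMetrics:
    """Hypothetical record of one distribution transaction."""
    ip_address: str
    user_id: str
    files_selected: List[str]
    distribution_mode: str    # e.g. "batch" or "direct"
    distribution_format: str
    download_size_bytes: int
    download_time_seconds: float

    def to_json(self):
        """Serialize the record, for example for appending to a metrics log."""
        return json.dumps(asdict(self), sort_keys=True)
```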