"HST Data Management System"
Dr. Rosa Diaz
Space Telescope Science Institute
March 17, 2010

HST Data Management System Overview:
• What is DMS
• What we get from HST
• Processing HST Data
• The HST Pipeline
• New Infrastructure
• Archive

The DMS System
The Data Management System (DMS) is responsible for part of the development of the data processing and archive systems at STScI. These systems include the Data Archive and Distribution System (DADS), StarView, and the web interface to the Multimission Archive at Space Telescope (MAST).
• Science Data Receipt Pipeline (PACOR)
• CDBS Pipeline
• Pre-Archive Science Pipeline
• OTFR Pipeline
• Science End-to-End testing
• Sample Pipeline

From HST to ground
Tracking and Data Relay Satellite System (TDRSS)
• Average of ~20 TDRSS contacts/day using TDRS East and West
• Average of ~7 SSA returns/day needed for SSR dumps
Path: SI C&DH (SSR) -> TDRSS -> White Sands, New Mexico

From ground to STScI
Path: White Sands, New Mexico -> domestic satellites -> Network Control Center, Goddard Space Flight Center, MD -> ground connection -> STScI
On average, data reach STScI about 6 hours after they were taken.

First stages of data manipulation
Science data arrive at STScI in the form of telemetry packets.
• Science data are converted to FITS files, calibrated, and archived.
• Engineering data are stored in a separate place in the SSR and downlinked at a different time.
• Science data need information from the engineering data and cannot be calibrated until the engineering data are received.
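The dependency in the last bullet (science data wait for engineering data) can be sketched as a simple readiness check. This is only an illustration, not the DMS implementation: the `ScienceExposure` record, the `covered` and `split_ready` helpers, and the idea that "having the engineering data" means time coverage of the exposure are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScienceExposure:
    """Hypothetical record for one science exposure awaiting calibration."""
    exposure_id: str
    start: float  # exposure start, in seconds on an arbitrary clock
    end: float    # exposure end, same clock


def covered(exposure, engineering_intervals):
    """Return True if the received engineering intervals cover the exposure.

    Assumption for this sketch: engineering data arrive as (start, end)
    time intervals, and an exposure is calibratable only when those
    intervals cover its whole time span without a gap.
    """
    position = exposure.start
    for lo, hi in sorted(engineering_intervals):
        if lo > position:
            break  # gap in engineering coverage before the exposure ends
        position = max(position, hi)
        if position >= exposure.end:
            return True
    return position >= exposure.end


def split_ready(exposures, engineering_intervals):
    """Partition exposure IDs into (ready to calibrate, still waiting)."""
    ready, waiting = [], []
    for exp in exposures:
        target = ready if covered(exp, engineering_intervals) else waiting
        target.append(exp.exposure_id)
    return ready, waiting


if __name__ == "__main__":
    exposures = [
        ScienceExposure("a", 100, 200),
        ScienceExposure("b", 400, 500),
    ]
    # Engineering data received so far cover 0 to 300 only.
    print(split_ready(exposures, [(0, 150), (150, 300)]))
```

In this sketch exposure "a" can proceed to calibration, while "b" stays queued until a later engineering downlink covers its time span.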
The Archive Operations/pipeline
[Flow diagram. Labels: User, POD files, Generic Conversion, CALXXX, OTFR, Archive (Yes/No), DB Catalog, Safe Store, Reprocessing, Ingest, Mirror sites]

DMS Operations: Functional Architecture
[Diagram. Labels: OPUS, Condor, NHPPS Workflow]

The software release process
[Flow diagram. Labels: CALXXX development, CALXXX (STSDAS), INS Team Testing, CALXXX issue, Pass CALXXX to OPUS, OPUS Development, OPUS Testing, Regression Test (pass/fail), OPUS release]

New Operations Server Architecture
Note 1: Virtualized development environments successfully deployed.
Note 2: Test/processing hardware is identical to the operations systems and may function as failover; clustering is a future option.
Note 3: FY11 purchase of an additional operations database server for external-service read access.

HST DMS Storage
EMC CX4-480
- HST Primary Archive on SAN CX-4 storage
- New Linux file systems
- New Windows MS SQL database file systems on Fibre Channel drives
- Replacing 1 TB drives with 2 TB drives to reclaim tray space
[Diagram. Labels: Sunfire 15K (dev, test, ops), HLSP, HST, MSR]

Multimission Archive: OTFR & Static Archive

OTFR
Instrument   OTFR
ACS          YES
COS          YES
WFPC2        YES
FOC          YES
FOS          YES
GHRS         YES
HSP          YES
WFPC         YES
NICMOS       YES
STIS         (Pre SM4 / Post SM4)
WFC3         YES

Static Archive
Mission      Description
BEFS         Berkeley Extreme and FUV Spectrometer
Copernicus   (FUV + NUV spectra)
DSS          Digitized Sky Survey
EUVE         Extreme UV Explorer
FUSE         FUV Explorer
GALEX        Galaxy Evol. Explorer
GSC          Guide Star Catalog
HPOL         Spectrometer
HUT          Hopkins UV Telescope
IMAPS        The ISM Abs. Profile Spectrograph
KEPLER
IUE          International UV Explorer
TUES         Tubingen UV Echelle Spectrometer
SDSS         Sloan Digital Sky Survey
UIT          UV Imaging Telescope
VLA          Very Large Array
WUPPE        Wisconsin UV Photopolarimeter Experiment

http://archive.stsci.edu/hst/
http://starview.stsci.edu/web/

HST data volume
• HST orbits the Earth every 96-97 minutes, or about 104 orbits per week.
• Only about 80 of those orbits are used to take science data.
• About 16 GB of data are received per day.
• In January 2010 a total of 497.0 GB were archived (16.03 GB/day) and 3486.9 GB were retrieved from the archive (112.48 GB/day).

Size of HST data per instrument
Instrument   MB/dataset
ACS          140
COS          258
STIS         29.3
WFC3         115

Data volume processed by the STScI Archive
• Calibration pipelines are available for all the instruments, including the legacy instruments.
• Data can be reprocessed when new calibration data become available.
• Other missions supported by the Multimission Archive system are Kepler and GALEX.
[Figure: HST Archive Activity. Series: Total Retrievals, Science Retrievals, Ingest. Vertical axis from 0 to 6000.]
[Figure: Average Number of Requests per week. Average daily values for each week.]

Tiered Storage Solutions
[Diagram. Labels: >500 TB in use; DEPOT; Tier 1; Tier 3; EMD; MAST (GALEX, HLA, GSC, DSS) 62 TB]
TIER 1: High-performance random and sequential I/O
- Database access for catalogs and large-scale indexed datasets
- GSC2, DSS, HLA catalogs and footprints, GALEX
TIER 2: Online mid-level performance for data access with high reliability (HRAS)
- HST DEPOT and OTFR calibration files (50 TB with SM4)
- JWST Primary Archive (100 TB)
TIER 3: Lower cost with load balancing and failover
- HLA and MAST data product files
- High capacity (0.5 petabyte)

Phased transition: Stage 1
Architecture Development: Infrastructure Testing and Validation
[Diagram. Recoverable labels:
- SAN Fibre Channel switch; new SAN with 8 Gb Fibre Channel switch
- Kepler 15K (18 slots, 10 boards): Kepl. Ops (n), Kepl. Test (m), Tbd ( )
- HST 15K (18 slots, 72 CPUs): Pipeline (32), Code Dev (12), DB1 Ops (4), DB2 Ops (4), Test (20), OS Test (4), DB Test (4)
- HST Dev Linux server (Port A/B): DADS/Pipeline
- Windows DB server: Dev DB, Test Databases
- New HST Test Linux compute cluster: Pipeline, DADS
- HST Storage EMC Symmetrix 32 TB, migrating to HST Storage EMC CX-4 48 TB
- Kepler Storage EMC CLARiiON 60 TB
- Storage contents: DB1 Ops, DB2 Ops, Kepl. Ops, Kepl. Test, Code Dev, Test, DB Test, Storage Test, Data Depot, HST Pipeline, STIS, WFPC2
- Groups: HST DMS, Kepler DMC]

Long Term: HST DMS Operations Architecture
[Diagram. Recoverable labels:
- SAN with 8 Gb Fibre Channel switch
- Kepler 15K (18 slots, 10 boards): Kepl. Ops (n), Kepl. Test (m), Tbd ( )
- HST Dev, HST Test, and HST OPS Linux clusters: DADS, Pipeline
- Windows database servers: DB1 Ops (4), DB Test (4); Test Databases, OPS Databases
- HST Storage EMC CX-4 48 TB
- Kepler Storage EMC CLARiiON 60 TB
- HST HLA EMC CLARiiON 40 TB
- MAST/HLA EMC AX4 120 TB
- CentralStore EMC CLARiiON 162 TB (Science General Use)
- Storage contents: DB1 Ops, DB2 Ops, Kepl. Ops, Kepl. Test, HLA, HST, GALEX, DSS, Code Dev, MAST General, Test, DB Test, Data Depot, HST Pipeline, STIS, WFPC2
- HLA servers
- Groups: HST DMS, Kepler DMC, MAST/HLA, STScI general
- Sunfire 15K Systems Cutoff; TIB Server; Other Servers]
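As a cross-check on the "HST data volume" slide, the quoted rates can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are made up for the example, and the only inputs are the numbers stated on the slide plus the 31 days of January.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080 minutes
DAYS_IN_JANUARY = 31


def orbits_per_week(period_minutes):
    """Number of orbits completed in one week for a given orbital period."""
    return MINUTES_PER_WEEK / period_minutes


def daily_rate(total_gb, days):
    """Average data volume per day, in GB."""
    return total_gb / days


if __name__ == "__main__":
    # Orbital period of 96 to 97 minutes, as quoted on the slide.
    for period in (96, 97):
        print(f"{period} min period: {orbits_per_week(period):.1f} orbits/week")

    # Fraction of a ~104-orbit week used for science (about 80 orbits).
    print(f"science fraction: {80 / 104:.0%}")

    # January 2010 totals from the slide.
    archived = daily_rate(497.0, DAYS_IN_JANUARY)
    retrieved = daily_rate(3486.9, DAYS_IN_JANUARY)
    print(f"archived:  {archived:.2f} GB/day")
    print(f"retrieved: {retrieved:.2f} GB/day")
    print(f"retrieved/archived ratio: {retrieved / archived:.1f}")
```

The two daily rates come out at the 16.03 GB/day and 112.48 GB/day quoted on the slide, which means roughly seven times more data left the archive than entered it that month.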