LSST Data Management

Tim Axelrod

Project Scientist - LSST Data Management

Thursday, 28 Oct 2010

Outline of the Presentation

• LSST telescope and survey

• Functions and architecture of the LSST data management system

• Data Challenges

• Implementation of the LSST software stack

• Customization and use of the LSST software stack for other projects

• Availability of the software, and future plans

NGSS Sky Survey Data Management – Edinburgh Oct, 2010 2

LSST - Large Synoptic Survey Telescope

Keck Telescope: primary mirror diameter 10 m; field of view 0.2 degrees

LSST: field of view 3.5 degrees

LSST - Essential Statistics

• Aperture diameter: 8.4m

• Effective aperture: 6.7m

• FOV: 3.5 deg

• Filters: u, g, r, i, z, y

• 3.2 gigapixels

• Readout: 2 sec, 5 electron noise

• Observing mode: pairs of 15 sec exposures, separated by 5 sec slew

• Single exposure depth: ~24.5 mag

• Repeatedly scan 20,000 sq deg

• Site: Cerro Pachon, Chile

• Data flows at 0.5 GB/sec – all night

• 18 TB / night
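The nightly volume follows from the sustained data rate. The sketch below is a back-of-envelope check only: the 10-hour observing night and the decimal units (1 TB = 1000 GB) are assumptions, not figures from the slide.

```python
# Back-of-envelope check of the nightly data volume quoted above.
# Assumption (not from the slides): an observing night lasts about 10 hours.
RATE_GB_PER_SEC = 0.5   # sustained data rate, from the slide
NIGHT_HOURS = 10        # assumed length of an observing night


def nightly_volume_tb(rate_gb_per_sec: float, night_hours: float) -> float:
    """Return the data volume for one night in terabytes (1 TB = 1000 GB)."""
    seconds = night_hours * 3600
    return rate_gb_per_sec * seconds / 1000.0


volume_tb = nightly_volume_tb(RATE_GB_PER_SEC, NIGHT_HOURS)
print(f"{volume_tb:.0f} TB per night")
```

Under these assumptions the result matches the 18 TB / night figure.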

1 arcminute

20 minute exposure on 8 m Subaru telescope

Point spread width 0.52 arcsec (FWHM)

Simulated Results of 10 yr Survey

5.3M Exposures

One Survey – Many Science Programs

• The LSST Observatory will produce a data stream which the Data Management System turns into data products.

– 0.5 GB/sec all night, every night for 10 years

– 104 PB of images at survey end

– 2.5 PB science database at survey end

• Many science programs are supported by the same data products

– Weak lensing

– Supernovae & transient astrophysics

– Milky Way structure

– Solar System inventory

– Many more in individual science collaborations

Data Management System Functions

• Process image stream from camera to generate real-time transient alerts

– Difference image based

• Periodically process entire set of survey data to produce a Data Release

– Self consistent set of data products, all processed with the same algorithms

– Full survey depth; meets SRD requirements

• Periodically produce calibration data products needed by other pipelines

• Make data available to scientists, with enough processing cycles and support to make it useful
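To illustrate what "difference image based" alert generation means, here is a minimal sketch in numpy. It is not the LSST algorithm: it assumes the science and template images are already aligned and PSF-matched, and the function name, the threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def detect_transients(science, template, nsigma=5.0):
    """Return (row, col) indices of pixels that brightened significantly.

    Simplification: the two images are assumed to be already aligned and
    PSF-matched, so a plain pixel-wise subtraction is meaningful.
    """
    diff = science.astype(float) - template.astype(float)
    # Robust noise estimate from the median absolute deviation of the difference.
    mad = np.median(np.abs(diff - np.median(diff)))
    sigma = 1.4826 * mad
    return np.argwhere(diff > nsigma * sigma)


# Synthetic example: a noisy template, a new exposure of the same field,
# and one artificial transient injected at pixel (10, 20).
rng = np.random.default_rng(0)
template = rng.normal(100.0, 5.0, size=(64, 64))
science = template + rng.normal(0.0, 5.0, size=(64, 64))
science[10, 20] += 500.0

candidates = detect_transients(science, template)
print(candidates)
```

Each candidate pixel would then feed an alert; a production system also has to reject artifacts such as cosmic rays and subtraction residuals.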

DMS Data Products

(every 60 s)

Data Management Sites and Functions

SPIE 2010 June 30, 2010 San Diego, CA

DMS Performance Requirements

Data Challenges progressively prototype more of the production DM System

#1 (Jul – Oct 2006)

• Validate infrastructure and middleware scalability to 5% of LSST required rates

• Use simulated data and algorithms for resource loading

#2 (Oct – Nov 2007)

• Validate nightly pipeline algorithms

• Create initial Application Framework and Middleware API and validate by creating functioning pipelines with them

• Validate infrastructure and middleware scalability to 10% of LSST required rates with CFHTLS data

#3a (Sep 2008 – Aug 2009)

• Validate complete Alert Production

• Extend Application Framework and Middleware API to support Alert Production

• Validate infrastructure and middleware scalability to 10% of LSST required rates with CFHTLS and simulated LSST image data

#3b (May 2009 – Dec 2010)

• Validate Data Release Production

• Assess end-to-end data quality

• Validate infrastructure and middleware reliability

• Validate infrastructure and middleware scalability to 15% of LSST required rates with full focal plane simulated LSST data at volume

#4 (Jul 2010 – Dec 2011)

• Validate database query performance

• Validate open interfaces and data access

• Validate SDQA and analytical tools

• Validate infrastructure and middleware scalability to 20% of LSST required rates

NSF Review

Data Challenge Infrastructure approaches operational DMS in complexity

LSST Data Management System (DMS)

DMS: Application Layer

• Applications Layer is organized as an Application Framework (AFW), and packages built on top of AFW

• AFW is object oriented, and manipulates objects from the application domain

– Imaging detectors

– Images and associated metadata

– Measurements made on images

– Catalogs of quantities derived from measurements

DMS: Middleware Layer

• Middleware Layer is organized as a set of packages

– Policy files for parameter acquisition

– Logging

– Exception handling

– Persistence

– Pipeline execution

– Data butler
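How these services fit together can be sketched with the Python standard library. This is an illustration only, not the LSST middleware API: the policy is a plain dict standing in for a policy file, and the stage names and run_pipeline are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Hypothetical policy: a plain dict standing in for parameters that the
# real middleware would read from policy files.
policy = {
    "subtract_bias": {"bias_level": 100.0},
    "scale": {"gain": 2.0},
}


def subtract_bias(data, bias_level):
    """Remove a constant bias level from every pixel value."""
    return [x - bias_level for x in data]


def scale(data, gain):
    """Convert counts to electrons with a constant gain."""
    return [x * gain for x in data]


STAGES = [("subtract_bias", subtract_bias), ("scale", scale)]


def run_pipeline(data, stages, policy):
    """Run each stage in order, taking its parameters from the policy."""
    for name, stage in stages:
        params = policy.get(name, {})
        log.info("running stage %s with %s", name, params)
        try:
            data = stage(data, **params)
        except Exception:
            # Log with traceback, then let the caller decide how to recover.
            log.exception("stage %s failed", name)
            raise
    return data


result = run_pipeline([101.0, 102.5], STAGES, policy)
print(result)
```

The stage functions never read configuration or write logs themselves; parameter acquisition, logging, and exception handling stay in the layer that executes the pipeline.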

Software Implementation

Object oriented design and implementation

– Use cases, domain model, detailed design in UML

Technology choices

– C++

– Python

– Swig bridge from C++ classes to python

• Access all of AFW from python prompt

– Multiple platforms and compilers (Linux, OS X; gcc, Intel icpc)

– All components open source

Using AFW

Much of our use of AFW, within pipelines or in standalone codes, is from python

All AFW classes and methods are made available to python via Swig

    import lsst.afw.image.imageLib as afwImage
    import lsst.afw.math.mathLib as afwMath

    # filename is the path to an image file on disk
    im = afwImage.ImageF(filename)
    # Model the background with a natural spline, then subtract it in place
    bctrl = afwMath.BackgroundControl(afwMath.Interpolate.NATURAL_SPLINE)
    backobj = afwMath.makeBackground(im, bctrl)
    im -= backobj.getImageF()
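For readers without the LSST stack installed, the same idea (estimate a smooth background, then subtract it) can be sketched in numpy alone. This is not AFW's implementation: it swaps the natural-spline interpolation for a much cruder piecewise-constant model, and estimate_background and its cell parameter are hypothetical names.

```python
import numpy as np


def estimate_background(image, cell=16):
    """Piecewise-constant background: the median of each cell x cell block."""
    ny, nx = image.shape
    if ny % cell or nx % cell:
        raise ValueError("image dimensions must be multiples of the cell size")
    # View the image as a grid of blocks and take one median per block.
    blocks = image.reshape(ny // cell, cell, nx // cell, cell)
    medians = np.median(blocks, axis=(1, 3))
    # Expand each block median back to full resolution.
    return np.repeat(np.repeat(medians, cell, axis=0), cell, axis=1)


# Synthetic image: flat sky level of 200 counts, noise, and one bright source.
rng = np.random.default_rng(1)
image = 200.0 + rng.normal(0.0, 3.0, size=(64, 64))
image[30:33, 30:33] += 1000.0

background = estimate_background(image)
subtracted = image - background
print(np.median(subtracted), subtracted[31, 31])
```

The median makes the estimate insensitive to the bright source, so the source survives subtraction while the sky level drops to about zero.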

AFW Beyond LSST

• We have designed AFW to work with nearly arbitrary mosaic detectors

– Supports four level hierarchy – focalplane; raft; detector; segment

– Geometry is flexible

– Readout parameters also flexible

• Prescan / Postscan columns

• Readout amplifier location

Specified in XML-like configuration file

• If AFW is used with LSST middleware layer, parallel processing of mosaics is easily managed

– Can be readily used with other middleware solutions

• AFW being actively configured and tested for

– CFHT Megacam

– CFA Megacam

– Subaru HSC
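The four-level hierarchy and its readout parameters can be pictured as nested containers. The sketch below is illustrative only: the class and field names are hypothetical and are not AFW's classes, and the toy camera dimensions are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Segment:
    """One readout segment (amplifier) of a detector."""
    name: str
    width: int               # imaging columns
    height: int              # imaging rows
    prescan_cols: int = 0
    postscan_cols: int = 0
    amp_corner: str = "LL"   # readout amplifier location, e.g. lower-left

    @property
    def raw_width(self) -> int:
        """Columns in the raw readout, including prescan and postscan."""
        return self.prescan_cols + self.width + self.postscan_cols


@dataclass
class Detector:
    name: str
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Raft:
    name: str
    detectors: List[Detector] = field(default_factory=list)


@dataclass
class FocalPlane:
    name: str
    rafts: List[Raft] = field(default_factory=list)

    def iter_segments(self) -> Iterator[Tuple[str, str, Segment]]:
        """Walk the hierarchy down to individual segments."""
        for raft in self.rafts:
            for det in raft.detectors:
                for seg in det.segments:
                    yield raft.name, det.name, seg


def make_detector(name: str) -> Detector:
    """Build a toy two-segment detector with made-up dimensions."""
    segs = [
        Segment(f"{name}-A{i}", width=512, height=2048,
                prescan_cols=4, postscan_cols=16)
        for i in range(2)
    ]
    return Detector(name, segs)


camera = FocalPlane("toy", [Raft("R00", [make_detector("S00"), make_detector("S01")])])
for raft_name, det_name, seg in camera.iter_segments():
    print(raft_name, det_name, seg.name, seg.raw_width)
```

A different mosaic is described by building a different tree; code that walks the hierarchy does not change.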

AFW Beyond LSST

• Different mosaic configurations are not the only challenge. Data organizations differ as well.

– Where are the calibration files?

– What metadata keyword identifies the filter?

– etc

• We have defined a “data butler” to mediate between a survey's data organization and processing code built on AFW, so that survey-dependent code and configuration parameters are not visible at the application level

– Still in development

– Working well as part of our current Data Challenge, where it offers a uniform interface to our two data sources

• CFHTLS

• LSST simulated images
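The mediation idea can be sketched as follows. This is not the LSST data butler API: the Mapper and Butler classes, the path templates, the directory roots, and the "FILTNM" keyword are all hypothetical, chosen only to show two surveys with different layouts served through one interface.

```python
import posixpath


class Mapper:
    """Hypothetical description of one survey's data organization."""

    def __init__(self, root, templates, filter_keyword):
        self.root = root
        self.templates = templates          # dataset type -> path template
        self.filter_keyword = filter_keyword


class Butler:
    """Answers 'where is this dataset?' without exposing survey details."""

    def __init__(self, mapper):
        self.mapper = mapper

    def path(self, dataset_type, **data_id):
        template = self.mapper.templates[dataset_type]
        return posixpath.join(self.mapper.root, template.format(**data_id))

    def filter_name(self, header):
        return header[self.mapper.filter_keyword]


# Two surveys with different layouts and metadata keywords (all made up).
cfhtls = Butler(Mapper(
    "/data/cfhtls",
    {"raw": "raw/{visit}/{ccd}.fits", "bias": "calib/bias/{ccd}.fits"},
    "FILTER",
))
lsstsim = Butler(Mapper(
    "/data/lsstsim",
    {"raw": "images/v{visit}/c{ccd}.fits", "bias": "bias/c{ccd}.fits"},
    "FILTNM",
))


def describe(butler, header, visit, ccd):
    """Application-level code: identical for every survey."""
    return (
        butler.path("raw", visit=visit, ccd=ccd),
        butler.path("bias", ccd=ccd),
        butler.filter_name(header),
    )


print(describe(cfhtls, {"FILTER": "r"}, visit=1234, ccd=7))
print(describe(lsstsim, {"FILTNM": "r"}, visit=1234, ccd=7))
```

The describe function never sees a directory layout or a header keyword name; supporting another survey means writing another mapper.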

Documentation

We use doxygen to generate code documentation for both C++ and python

– Part of automated build process

– Available online: http://dev.lsstcorp.org/doxygen/

We use a publicly readable Trac wiki to tie together

– svn access to code tree

– Documentation above the doxygen level

– Ticketing system

– http://dev.lsstcorp.org/trac/wiki

Availability

• The code is currently available, but our ability to support users outside of the LSST project is very limited

• Our intent is to transition to a full open source model in which developers outside the LSST actively contribute to the code base

BACKUP SLIDES

DM Centers and Functions

• Base Center

– Near Real-time Alert Generation

– Long-term Storage (copy 1)

• Archive Center

– Nightly Reprocessing

– Data Release Processing

– Long-term Storage (copy 2)

• Co-located Data Access Centers

– Data Access and User Services

– Provided via the Community Services Subsystem

– Share infrastructure with the Base/Archive Center at the same site

• System Operations Center

– Monitors/manages activities across centers

• Education & Public Outreach Center

– Specialized data access and services for outreach applications (not part of DM System)

Development Process

Software Development Environment

Trac integrates svn with ticketing system and development wiki

Automated checking of coding standards compliance using Parasoft

EUPS manages multiple versions of the stack at execution time

Scons used for building

Extensive testing performed as part of build process

Prebuilt, tagged versions of the stack distributed from central location

Data Challenges

Each DC is intentionally scoped to challenge the DM team on several fronts simultaneously:

• Computing, storage, and network infrastructure

• Astronomical data handling and algorithms

• Pipeline software framework (Middleware and Applications)

• Software development tools and procedures

• Project management and system engineering processes

External Components

We use a number of open source packages as components of our stack

– boost (C++ libraries)

– GIL (C++ templated imaging library; part of boost)

– numpy (python numerical math module)

– pyfits (python manipulation of FITS files)

– others – reference to full list is in the paper

Inclusion of external packages requires formal review

– Has (so far) prevented unbounded proliferation of external dependencies – important for maintainability
