Thursday, 28 Oct 2010
Outline of the Presentation
• LSST telescope and survey
• Functions and architecture of the LSST data management system
• Data Challenges
• Implementation of the LSST software stack
• Customization and use of the LSST software stack for other projects
• Availability of the software, and future plans
NGSS Sky Survey Data Management – Edinburgh Oct, 2010 2
LSST - Large Synoptic Survey Telescope
Keck Telescope compared with LSST
• Keck: primary mirror diameter 10 m; field of view 0.2 degrees
• LSST: field of view 3.5 degrees
LSST - Essential Statistics
• Aperture diameter: 8.4m
• Effective aperture: 6.7m
• FOV: 3.5 deg
• Filters: u, g, r, i, z, y
• 3.2 gigapixels
• 2 sec, 5 electron noise readout
• Observing mode: pairs of 15 sec exposures, separated by 5 sec slew
• Single exposure depth: ~24.5
• Repetitively scan 20000 sq deg
• Site: Cerro Pachon, Chile
• Data flows at 0.5 GB/sec – all night
• 18 TB / night
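The data-rate and nightly-volume figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and, for the per-exposure size, 2 bytes per raw pixel; neither assumption is stated on the slide, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the quoted data-rate figures.
# Decimal units are assumed throughout (1 TB = 1000 GB).

RATE_GB_PER_S = 0.5        # quoted sustained data rate
NIGHTLY_VOLUME_TB = 18.0   # quoted nightly data volume
PIXELS = 3.2e9             # quoted camera size (3.2 gigapixels)
BYTES_PER_PIXEL = 2        # ASSUMPTION: 16-bit raw pixels (not stated in the slides)


def implied_hours(volume_tb: float, rate_gb_per_s: float) -> float:
    """Hours of continuous data flow needed to reach volume_tb at rate_gb_per_s."""
    seconds = volume_tb * 1000.0 / rate_gb_per_s
    return seconds / 3600.0


def raw_exposure_gb(pixels: float, bytes_per_pixel: int) -> float:
    """Size in GB of one raw exposure, ignoring headers and overscan."""
    return pixels * bytes_per_pixel / 1e9


hours = implied_hours(NIGHTLY_VOLUME_TB, RATE_GB_PER_S)
exposure_gb = raw_exposure_gb(PIXELS, BYTES_PER_PIXEL)
print(f"{hours:.1f} h of data flow per night")
print(f"{exposure_gb:.1f} GB per raw exposure")
```

Under these assumptions, 18 TB per night at 0.5 GB/sec corresponds to roughly ten hours of continuous data flow, which fits the "all night" description.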
[Figure: 20 minute exposure on the 8 m Subaru telescope; scale bar 1 arcminute; point spread width 0.52 arcsec (FWHM)]
Simulated Results of 10 yr Survey
5.3M Exposures
One Survey – Many Science Programs
• The LSST Observatory will produce a data stream which the Data Management System turns into data products.
– 0.5 GB/sec all night, every night for 10 years
– 104 PB of images at survey end
– 2.5 PB science database at survey end
• Many science programs are supported by the same data products
– Weak lensing
– Supernovae & transient astrophysics
– Milky Way structure
– Solar System inventory
– Many more in individual science collaborations
Data Management System Functions
• Process image stream from camera to generate real-time transient alerts
– Difference image based
• Periodically process entire set of survey data to produce a Data Release
– Self-consistent set of data products, all processed with the same algorithms
– Full survey depth; meets the Science Requirements Document (SRD) requirements
• Periodically produce calibration data products needed by other pipelines
• Make data available to scientists, with enough processing cycles and support to make it useful
DMS Data Products
(every 60 s)
Data Management Sites and Functions
SPIE 2010 June 30, 2010 San Diego, CA
DMS Performance Requirements
Data Challenges progressively prototype more of the production DM System
• #1 (Jul - Oct, 2006)
– Validate infrastructure and middleware scalability to 5% of LSST required rates
– Use simulated data and algorithms for resource loading
• #2 (Oct - Nov, 2007)
– Validate nightly pipeline algorithms
– Create initial Application Framework and Middleware API and validate by creating functioning pipelines with them
– Validate infrastructure and middleware scalability to 10% of LSST required rates with CFHTLS data
• #3a (Sep, 2008 – Aug, 2009)
– Validate complete Alert Production
– Extend Application Framework and Middleware API to support Alert Production
– Validate infrastructure and middleware scalability to 10% of LSST required rates with CFHTLS and simulated LSST image data
• #3b (May, 2009 – Dec, 2010)
– Validate Data Release Production
– Assess end-to-end data quality
– Validate infrastructure and middleware reliability
– Validate infrastructure and middleware scalability to 15% of LSST required rates with full focal plane simulated LSST data at volume
• #4 (Jul, 2010 – Dec, 2011)
– Validate database query performance
– Validate open interfaces and data access
– Validate SDQA and analytical tools
– Validate infrastructure and middleware scalability to 20% of LSST required rates
NSF Review
Data Challenge Infrastructure approaches operational DMS in complexity
LSST Data Management System (DMS)
DMS: Application Layer
• Applications Layer is organized as an Application Framework (AFW), and packages built on top of AFW
• AFW is object oriented, and manipulates objects from the application domain
– Imaging detectors
– Images and associated metadata
– Measurements made on images
– Catalogs of quantities derived from measurements
DMS: Middleware Layer
• Middleware Layer is organized as a set of packages
– Policy files for parameter acquisition
– Logging
– Exception handling
– Persistence
– Pipeline execution
– Data butler
Software Implementation
Object oriented design and implementation
– Use cases, domain model, detailed design in UML
Technology choices
– C++
– Python
– SWIG bridge from C++ classes to python
• Access all of AFW from python prompt
– Multiple platforms and compilers (Linux, OS X; gcc, Intel icpc)
– All components open source
Using AFW
Much of our use of AFW, within pipelines or in standalone codes, is from python
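All AFW classes and methods are made available to python via SWIG

    import lsst.afw.image.imageLib as afwImage
    import lsst.afw.math.mathLib as afwMath

    # filename: path to the input image file, supplied by the caller
    im = afwImage.ImageF(filename)
    # Configure background estimation with natural-spline interpolation
    bctrl = afwMath.BackgroundControl(afwMath.Interpolate.NATURAL_SPLINE)
    backobj = afwMath.makeBackground(im, bctrl)
    # im now refers to the estimated background image
    im = backobj.getImageF()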
AFW Beyond LSST
• We have designed AFW to work with nearly arbitrary mosaic detectors
– Supports a four-level hierarchy: focal plane, raft, detector, segment
– Geometry is flexible
– Readout parameters also flexible
• Prescan / Postscan columns
• Readout amplifier location
– Specified in XML-like configuration file
• If AFW is used with LSST middleware layer, parallel processing of mosaics is easily managed
– Can be readily used with other middleware solutions
• AFW being actively configured and tested for
– CFHT Megacam
– CFA Megacam
– Subaru HSC
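The four-level mosaic hierarchy and per-segment readout parameters described on this slide can be pictured with a small data-structure sketch. This is not the AFW API: the class names, field names, and the toy camera dimensions below are all hypothetical, chosen only to illustrate how a focal plane, its rafts, detectors, and segments might nest.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Segment:
    """One readout segment of a detector (hypothetical, not an AFW class)."""
    name: str
    data_cols: int           # imaging columns
    rows: int
    prescan_cols: int = 0    # columns read out before the imaging area
    postscan_cols: int = 0   # columns read out after the imaging area
    amp_corner: str = "LL"   # readout amplifier location: LL, LR, UL or UR

    @property
    def raw_cols(self) -> int:
        """Total columns in the raw readout, including prescan and postscan."""
        return self.prescan_cols + self.data_cols + self.postscan_cols


@dataclass
class Detector:
    name: str
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Raft:
    name: str
    detectors: List[Detector] = field(default_factory=list)


@dataclass
class FocalPlane:
    name: str
    rafts: List[Raft] = field(default_factory=list)

    def walk(self) -> Iterator[Tuple[Raft, Detector, Segment]]:
        """Visit every segment together with its parent raft and detector."""
        for raft in self.rafts:
            for det in raft.detectors:
                for seg in det.segments:
                    yield raft, det, seg


def make_toy_camera() -> FocalPlane:
    """Build a toy camera: 1 raft, 2 detectors, 2 segments each (made-up sizes)."""
    def detector(name: str) -> Detector:
        segments = [
            Segment(f"{name}-A", data_cols=512, rows=2048,
                    prescan_cols=4, postscan_cols=16, amp_corner="LL"),
            Segment(f"{name}-B", data_cols=512, rows=2048,
                    prescan_cols=4, postscan_cols=16, amp_corner="LR"),
        ]
        return Detector(name, segments)

    raft = Raft("R00", [detector("D0"), detector("D1")])
    return FocalPlane("toy", [raft])


camera = make_toy_camera()
for raft, det, seg in camera.walk():
    print(raft.name, det.name, seg.name, seg.raw_cols, seg.amp_corner)
```

A camera with no raft level could be modelled here as a single raft holding all detectors, which is one way a fixed hierarchy can still accommodate different mosaics.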
AFW Beyond LSST
• Different mosaic configurations are not the only challenge. Data organizations differ as well.
– Where are the calibration files?
– What metadata keyword identifies the filter?
– etc
• We have defined a “data butler” to mediate between a survey's data organization and processing code built on AFW, so that survey-dependent code and configuration parameters are not visible at the application level
– Still in development
– Working well as part of our current Data Challenge, where it offers a uniform interface to our two data sources
• CFHTLS
• LSST simulated images
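The mediation idea behind the data butler can be sketched as follows. This is an illustration of the pattern only, not the LSST butler interface: the `Mapper` and `Butler` classes, the directory layouts, and the filter keywords are all invented for the example. Application code asks for a dataset by type and identifiers, and a per-survey mapper supplies the survey-specific details.

```python
from typing import Mapping


class Mapper:
    """Holds one survey's data organization (hypothetical class)."""

    def __init__(self, root: str, templates: Mapping[str, str], filter_keyword: str):
        self.root = root
        self.templates = dict(templates)      # dataset type -> path template
        self.filter_keyword = filter_keyword  # metadata keyword naming the filter

    def path(self, dataset_type: str, data_id: Mapping[str, object]) -> str:
        """Return the file path for a dataset of the given type and identifiers."""
        if dataset_type not in self.templates:
            raise KeyError(f"unknown dataset type: {dataset_type!r}")
        return f"{self.root}/{self.templates[dataset_type].format(**data_id)}"

    def filter_of(self, metadata: Mapping[str, str]) -> str:
        """Look up the filter name using this survey's metadata keyword."""
        return metadata[self.filter_keyword]


class Butler:
    """Survey-independent front end used by application code (hypothetical)."""

    def __init__(self, mapper: Mapper):
        self._mapper = mapper

    def locate(self, dataset_type: str, **data_id: object) -> str:
        return self._mapper.path(dataset_type, data_id)

    def filter_of(self, metadata: Mapping[str, str]) -> str:
        return self._mapper.filter_of(metadata)


# Two invented layouts standing in for two data sources.
cfhtls = Mapper(
    "/data/cfhtls",
    {"raw": "raw/{visit}/ccd{ccd:02d}.fits", "bias": "calib/bias/ccd{ccd:02d}.fits"},
    filter_keyword="FILTER",
)
lsst_sim = Mapper(
    "/data/lsstsim",
    {"raw": "{visit}/R{raft}/S{sensor}/raw.fits", "bias": "bias/R{raft}/S{sensor}.fits"},
    filter_keyword="FILTNM",
)

# The application-level calls have the same shape for both surveys.
print(Butler(cfhtls).locate("raw", visit=1234, ccd=5))
print(Butler(lsst_sim).locate("raw", visit=42, raft="22", sensor="11"))
```

Only the mapper changes between surveys; code written against `Butler` never sees where calibration files live or which metadata keyword names the filter.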
Documentation
We use doxygen to generate code documentation for both C++ and python
– Part of automated build process
– Available online: http://dev.lsstcorp.org/doxygen/
We use a publicly readable Trac wiki to tie together
– svn access to code tree
– Documentation above the doxygen level
– Ticketing system
– http://dev.lsstcorp.org/trac/wiki
Availability
• The code is currently available, but our ability to support users outside of the LSST project is very limited
• Our intent is to transition to a full open source model in which developers outside the LSST actively contribute to the code base
BACKUP SLIDES
DM Centers and Functions
• Base Center
– Near Real-time Alert Generation
– Long-term storage (copy 1)
• Archive Center
– Nightly Reprocessing
– Data Release Processing
– Long-term Storage (copy 2)
• Co-located Data Access Centers
– Data Access and User Services
– Provided via the Community Services Subsystem
– Shares Infrastructure with the Base/Archive Center also at Site
• System Operations Center
– Monitors/manages activities across centers
• Education & Public Outreach Center
– Specialized data access and services for outreach applications (not part of DM System)
Development Process
Software Development Environment
Trac integrates svn with ticketing system and development wiki
Automated checking of coding standards compliance using Parasoft
EUPS manages multiple versions of the stack at execution time
SCons used for building
Extensive testing performed as part of build process
Prebuilt, tagged versions of the stack distributed from central location
Data Challenges
Each DC is intentionally scoped to challenge the DM team on several fronts simultaneously:
• Computing, storage, and network infrastructure
• Astronomical data handling and algorithms
• Pipeline software framework (Middleware and Applications)
• Software development tools and procedures
• Project management and system engineering processes
External Components
We use a number of open source packages as components of our stack
– boost (C++ libraries)
– GIL (C++ templated imaging library; part of boost)
– numpy (python numerical math module)
– pyfits (python manipulation of FITS files)
– others – reference to full list is in the paper
Inclusion of external packages requires formal review
– Has (so far) prevented unbounded proliferation of external dependencies – important for maintainability