Observational Astrophysics Data Management Issues: Wide-field Time-Domain Surveys

Kem Cook
Lawrence Livermore National Laboratory and
National Optical Astronomy Observatory
For the LSST Collaboration
KHC SLAC March 16, 2004
Wide-field Surveys
• CCD technology advances have made 64 Mpixel arrays
common, with a few 300+ Mpixel arrays
• Modern precision optics produce well-corrected fields of 1+ square
degree
• Supernovae, microlensing, Kuiper Belt Objects, Potentially
Hazardous Asteroids: all require wide-field surveys, but in the
time domain
• Huge focal planes, dedicated surveys yield huge data sets
  • Microlensing (MACHO, OGLE, EROS) > 20 Tbyte
  • 2MASS 14 Tbyte (little time resolution)
  • Sloan Digital Sky Survey 20 Tbyte (little time resolution)
• Some surveys need area and sacrifice time resolution; maybe they don’t
need to!
Why focus on Wide-field, Time-domain surveys?
Endorsed by three separate National Academy reports:
Astronomy & Astrophysics in the New Millennium
New Frontiers in the Solar System
Connecting Quarks with the Cosmos.
The Large Synoptic Survey Telescope –
Massively Parallel Astronomy
Survey the entire sky every 2-3 nights, to
simultaneously detect:
• Potentially hazardous near-Earth asteroids
• Tracers of the formation of the solar system
• Fireworks in the heavens – GRBs, quasars…
• Periodic and transient phenomena
• Dark matter via weak gravitational lensing
• Thousands of supernovae per year, in
multiple passbands
......
LSST Requirements
Covering 15,000 square
degrees in 2 days with two visits
requires reaching V = 24 mag in <20 sec
over a ~5-10 sq deg field:
100x current capabilities
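The time budget behind this requirement can be sketched with a little arithmetic. The area, number of visits and number of nights come from the slide; the 8 usable hours per night and the neglect of field overlap and weather are assumptions made only for illustration.

```python
import math

def seconds_per_pointing(area_sq_deg, fov_sq_deg, visits, nights, hours_per_night):
    """Average wall-clock time available per pointing (exposure + readout + slew).

    Ignores field overlap, weather losses and scheduling constraints.
    """
    pointings = visits * math.ceil(area_sq_deg / fov_sq_deg)
    available_s = nights * hours_per_night * 3600.0
    return available_s / pointings

# 15,000 sq deg, two visits and 2 nights are from the slide;
# 8 usable hours per night is an assumed value.
for fov in (5.0, 7.0, 10.0):
    t = seconds_per_pointing(15_000, fov, visits=2, nights=2, hours_per_night=8.0)
    print(f"{fov:4.1f} sq deg field: {t:4.1f} s per pointing")
```

Under these assumptions the budget per pointing falls between roughly 10 and 20 seconds across the 5-10 sq deg range, which is in line with the <20 sec requirement.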
LSST static survey of
10,000 sq deg
• stellar population studies
• a billion galaxies
• 100,000 mass clusters
• large scale mass structures
• physics of dark energy
LSST will probe the time domain.
• LSST will discover ~ 200,000 supernovae that will trace
departures from the smooth Hubble flow at small redshift and
measure cosmological parameters at large redshift.
• LSST will enable photometric monitoring of 100 million stars to
detect extrasolar planets by transits and microlensing.
• LSST will measure proper motions of millions of nearby stars
and determine parallax distances out to about 1 kpc
• LSST will probe the unknown variable universe (GRBs, …)
LSST performs massively parallel science.
LSST Planetary Reach
• Increase inventory of solar system x100
• 10,000 NEAs, 90% complete >250m
• Over 10 million MBAs
• Cometary nuclei >15km @ Saturn
• Extend size-n of comets to <100m
• TNOs beyond 100AU
• Rare new objects
Science requirements for speed
• Near-Earth asteroids: ~10 sec exposures,
due to trailing loss & matching confusion.
• Optical bursters: <20 sec, due to sources.
• Weak lensing: <20 sec, for control of PSF
systematics.
• Photometry: 1-20 sec exposures for transfer
standards and dynamic range.
Relative survey power
[Figure: bar chart of figure of merit (axis 0 to 160) for LSST, SNAP, Pan-STARRS, Subaru, CFHT, SDSS and MMT, with series Time (x10), Stellar and Galactic (x2)]
How to effect the LSST
Basic idea:
• 8.4m diameter primary mirror
• 7 square degree field of view
• Survey entire sky every few nights
• Subtract images to find variability
• Add images to go deeper
• > 12 Terabytes of data per night
• Real-time analysis
• First light in 2011
LSST Focal Plane
• 55cm diameter
• 2.3GPixels on 10 micron centers
• 10 sec exposures and
2 sec readout
• 1 Mpix/sec max read rate
for low noise
• Array of CCD or hybrid CMOS
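A quick consistency check of these focal plane numbers is sketched below. The pixel count, pitch, diameter, readout time and read rate are from the slide; treating the focal plane as a filled disc and reading "1 Mpix/sec" as a per-output limit are assumptions.

```python
import math

N_PIXELS = 2.3e9            # from the slide
PIXEL_PITCH_M = 10e-6       # 10 micron centers
DIAMETER_M = 0.55           # 55 cm focal plane
READOUT_S = 2.0             # readout time
RATE_PIX_PER_S = 1e6        # assumed to be the limit per readout output

def pixel_area_m2(n_pixels, pitch_m):
    """Total silicon area covered by square pixels of the given pitch."""
    return n_pixels * pitch_m ** 2

def disc_area_m2(diameter_m):
    """Area of a circular focal plane of the given diameter."""
    return math.pi * (diameter_m / 2.0) ** 2

def outputs_needed(n_pixels, readout_s, rate_pix_per_s):
    """Parallel readout outputs required to finish within readout_s."""
    return math.ceil(n_pixels / (readout_s * rate_pix_per_s))

fill = pixel_area_m2(N_PIXELS, PIXEL_PITCH_M) / disc_area_m2(DIAMETER_M)
print(f"pixel area / disc area: {fill:.2f}")
print(f"parallel outputs needed: {outputs_needed(N_PIXELS, READOUT_S, RATE_PIX_PER_S)}")
```

With these inputs the pixels nearly fill a 55 cm disc, and a 2 sec readout at 1 Mpix/sec per output implies on the order of a thousand outputs read in parallel.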
LSST Data Rates
• 2.3 billion pixels read out in less than 2 sec, every 12 sec
• 1 pixel = 2 Bytes (raw)
• Over 3 GBytes/sec peak raw data from camera
• Real-time processing and transient detection: < 10 sec
• Dynamic range: 4 Bytes / pixel
• > 0.6 GB/sec average in pipeline
• 5000 floating point operations per pixel
• 2 TFlop/s average, 9 TFlop/s peak
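The rates above can be checked to order of magnitude with the sketch below. All inputs are from the slide; the simple model (one exposure every 12 sec, nothing else counted) is an assumption, so the results are expected to agree with the quoted figures only roughly.

```python
N_PIXELS = 2.3e9        # pixels per exposure
READOUT_S = 2.0         # upper bound on readout time
CADENCE_S = 12.0        # one exposure every 12 sec
FLOP_PER_PIXEL = 5000   # floating point operations per pixel

def rate_gb_per_s(n_pixels, bytes_per_pixel, seconds):
    """Data rate in GB/sec for one exposure moved or processed in `seconds`."""
    return n_pixels * bytes_per_pixel / seconds / 1e9

def tflops(n_pixels, flop_per_pixel, seconds):
    """Sustained TFlop/s needed to process one exposure in `seconds`."""
    return n_pixels * flop_per_pixel / seconds / 1e12

peak_raw = rate_gb_per_s(N_PIXELS, 2, READOUT_S)           # raw, 2 B/pixel
avg_pipeline = rate_gb_per_s(N_PIXELS, 4, CADENCE_S)       # 4 B/pixel in pipeline
avg_compute = tflops(N_PIXELS, FLOP_PER_PIXEL, CADENCE_S)  # averaged over cadence

print(f"peak raw:     {peak_raw:.2f} GB/sec")
print(f"avg pipeline: {avg_pipeline:.2f} GB/sec")
print(f"avg compute:  {avg_compute:.2f} TFlop/s")
```

A full 2 sec readout gives 2.3 GB/sec, so the quoted "over 3 GBytes/sec" corresponds to a readout nearer 1.5 sec. The pipeline average comes out near 0.77 GB/sec, consistent with "> 0.6 GB/sec". The compute estimate is about 1 TFlop/s, below the quoted 2 TFlop/s average, so the slide's figure presumably includes work this simple model leaves out.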
Large Synoptic Survey Telescope Information Flow Diagram
LSST Image Pipeline
[Flow diagram; box labels grouped by stage]
• Control and acquisition: Science Goals, Scheduler, Telescope Controller, Sky, Optics, Detectors, Camera Controller
• Camera QA: diagnostics (CTE, noise, ...)
• Flatten: crosstalk, linearity, etc.; Calibration Library
• Image Archive; Image QA
• Source Detection; Astrometry; WCS update; Astrometric Catalog Table
• Astrometrically Calibrated Flatfielded Image
• Photometrically & Astrometrically Calibrated Template; Register and Convolve
• Image Subtraction; Difference Object Detection
• Source and noise insertion for detection efficiency determination; Fake Object Table
• Known? (decision box, branch labeled N); Known Variable Table
• Alert Table; Alert Protocol
• Photometry Pipeline; Photometric Standards; Calculate Phot. Calib. Coeff’s; Phot. Calibration Table
• Classifier; Detections Table; Aggregation Utility; Match and Orbits Table
• Image coaddition; Co-added Image Archive; QA
• Analysis Packages 1-N
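The subtraction, detection and "Known?" steps of the diagram can be sketched in a few lines. This is only an illustration: the images are tiny nested lists, registration and PSF matching are assumed to have been done already, and the function names, the fixed noise level and the 5-sigma threshold are hypothetical choices, not part of the LSST design.

```python
def subtract(science, template):
    """Pixel-wise difference of two registered, PSF-matched images."""
    return [[s - t for s, t in zip(srow, trow)]
            for srow, trow in zip(science, template)]

def detect(diff, noise_sigma, threshold=5.0):
    """Return (row, col) of pixels deviating by more than threshold * sigma."""
    limit = threshold * noise_sigma
    return [(r, c)
            for r, row in enumerate(diff)
            for c, value in enumerate(row)
            if abs(value) > limit]

def split_known(detections, known_variables):
    """Separate detections found in the known-variable table from new ones."""
    known = [d for d in detections if d in known_variables]
    new = [d for d in detections if d not in known_variables]
    return known, new

template = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
science = [[10, 11, 10], [10, 80, 10], [9, 10, 40]]

diff = subtract(science, template)
hits = detect(diff, noise_sigma=2.0)
known, alerts = split_known(hits, known_variables={(1, 1)})
print("known variables:", known)
print("alert candidates:", alerts)
```

Here the brightened source at pixel (1, 1) is already in the known-variable table, while the new source at (2, 2) is the one that would go on to the alert path.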
What comes out of the pipelines?
• 4 Pbyte images, 4 Tbyte of DB per year
• 100 billion objects with about 70-100
attributes for each epoch
• Alerts for transient sources
• Orbits for moving objects
Access needs span from a single data point to
correlations across the whole data set!
LSST Data Processing Infrastructure – circa 2012
[Network diagram; labels grouped by site]
Sites:
• Telescope site (20 TB/night), real-time processing: focal plane, 10 TB disk, 1 TFLOP, user portal
• Alert Processing Center: 100 TB disk, 10 TFLOP
• Archive Campus – LSST Science Center: Archive Computing Center with 2 PB disk, 20 PB tape, 20+ TFLOP; portals for users
• Mirror sites
Links:
• 10^-1 km, 20 Gb/sec (2.3 Gpixels @ 2 B/pixel in 2 sec)
• 1-10^2 km, 10 Gb/sec
• 10^3 km, 4 Gb/sec (GRID, Internet 2), two such links
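The link speeds in the diagram can be related to the data volumes with a short calculation. The volumes and link rates are from the slide; assuming each link runs at its full nominal rate with no protocol overhead, and that the full nightly volume crosses a given link, are simplifications made for this sketch.

```python
def burst_gbps(n_pixels, bytes_per_pixel, seconds):
    """Link rate in Gb/sec needed to move one exposure in `seconds`."""
    return n_pixels * bytes_per_pixel * 8 / seconds / 1e9

def transfer_hours(terabytes, gigabits_per_s):
    """Hours needed to move a data volume over a link at full nominal rate."""
    bits = terabytes * 1e12 * 8
    return bits / (gigabits_per_s * 1e9) / 3600.0

# One exposure off the focal plane: 2.3 Gpixels at 2 B/pixel in 2 sec.
print(f"focal plane burst: {burst_gbps(2.3e9, 2, 2.0):.1f} Gb/sec")

# Moving one night's data (20 TB) over each class of link.
for gbps in (4, 10, 20):
    print(f"20 TB over {gbps:2d} Gb/sec: {transfer_hours(20, gbps):.1f} h")
```

The focal plane burst comes to about 18.4 Gb/sec, which matches the 20 Gb/sec short-haul link. A full 20 TB night takes about 11 hours over a 4 Gb/sec long-haul link, so under these assumptions the archive transfer would take most of a day to complete.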
Software
Rate limiting aspect of many past projects
Budget-busting potential
Has traditionally been written by practicing scientists
Management challenge is to capture best of this
Optimize interaction between astronomers & software professionals
Need informed choices as project scope evolves
HEP has experience with this type of project
Typically evolves over the project
Must be able to re-reduce data with new algorithms
Results change with re-processing
The developers are best able to maintain the code
LSST Data Management
• LSST goal is to provide useful, timely
and accurate data to a diversity of
communities:
  • Scientists
  • Data archive engineers
  • Educators and their students
  • General public
• Analysis software, data management
and access tools are major challenges
• Data management system will be
developed and deployed in a distributed
environment
Data management challenges
Analysis Pipelines, Database, Access
• Change detection
• Aggregation of “detections” into “objects”
• Useful and effective data access
• Schema
• Fusion with other data sets
• Classification of variability
• Optimal co-adding
• Calibrations
• Optimal indexing
• Data pedigree
• Scope
• NVO interface
• Detection efficiencies
Example Algorithm Challenge
• LSST will provide detections of all
moving objects across the entire sky
• These need to be sorted and
“aggregated” into objects with orbits...
N_observations × N_objects is a big number!
Operationally, how do you execute an
SQL query that knows about Kepler’s
laws?
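To make the aggregation problem concrete, here is a minimal brute-force sketch that links detections from three epochs into candidate tracks. It uses straight-line motion on a small flat patch of sky as a stand-in for real orbit fitting; Kepler's laws are not modeled, and the function names, rate cut and tolerance are hypothetical.

```python
import math
from itertools import product

def link_pairs(epoch_a, epoch_b, dt_days, max_rate_deg_per_day):
    """Pair detections from two epochs whose implied motion passes a rate cut.

    Each detection is (x_deg, y_deg). Cost is len(epoch_a) * len(epoch_b).
    """
    pairs = []
    for a, b in product(epoch_a, epoch_b):
        rate = math.hypot(b[0] - a[0], b[1] - a[1]) / dt_days
        if rate <= max_rate_deg_per_day:
            pairs.append((a, b))
    return pairs

def confirm(pairs, epoch_c, dt_ab, dt_bc, tol_deg):
    """Keep pairs whose linear extrapolation lands near a third-epoch detection."""
    tracks = []
    for a, b in pairs:
        px = b[0] + (b[0] - a[0]) * dt_bc / dt_ab
        py = b[1] + (b[1] - a[1]) * dt_bc / dt_ab
        for c in epoch_c:
            if math.hypot(c[0] - px, c[1] - py) <= tol_deg:
                tracks.append((a, b, c))
    return tracks

epoch_a = [(0.0, 0.0), (1.0, 1.0)]
epoch_b = [(0.1, 0.0), (1.0, 1.2)]
epoch_c = [(0.2, 0.0), (3.0, 3.0)]

pairs = link_pairs(epoch_a, epoch_b, dt_days=1.0, max_rate_deg_per_day=0.3)
tracks = confirm(pairs, epoch_c, dt_ab=1.0, dt_bc=1.0, tol_deg=0.01)
print(len(pairs), "candidate pairs,", len(tracks), "confirmed track(s)")
```

Even this toy version compares every detection with every other one, which is the scaling problem the slide points to; a production system would need spatial indexing and a proper orbit model to cut the candidate list.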
How does one find unexpected
correlations in space and time?
• MACHO found an impossibly large number of low signal-to-noise microlensing events clustered in a small region of the sky using object-based photometry
• The events were clustered in a region of nebulosity: star-forming activity?
• Concentric rings of ‘microlensing’ centered on SN1987a! MACHO discovered the light echo.
SN 1987a light echo via Image Subtraction
Current LSST precursor projects for
developing data management tools
ESSENCE supernova survey
SuperMacho microlensing survey
Deep Lens Survey
4 meter telescopes
½ sq deg cameras
Real-time analysis
LSST prototyping
Throughput AΩ = 4 m² deg²
LSST will capture lessons from these.
What is needed in the next 5 years?
• New bridge projects to stress data management
system development
  • Current precursor projects end in 2 years
  • Need scientific participation for developers
• Need to capture HEP experience in distributed
software development
• Efficient parallel file systems
• Fault tolerant real-time systems for alerts
• Efficient database algorithms for space-time queries
• Orbit calculation center with capability to handle
millions of orbit calculations
• Change in sociology to data mining rather than
observing
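As one illustration of what a space-time query structure might look like, the sketch below buckets detections on a coarse grid in RA, Dec and time, so that a box query only touches nearby buckets. It is a toy under stated assumptions: the class name, cell sizes and flat-sky treatment (no RA wraparound, no spherical geometry) are hypothetical and not a proposed LSST design.

```python
import math
from collections import defaultdict

class SpaceTimeIndex:
    """Grid index over (ra_deg, dec_deg, time_days) with exact post-filtering."""

    def __init__(self, cell_deg=1.0, bin_days=1.0):
        self.cell_deg = cell_deg
        self.bin_days = bin_days
        self._buckets = defaultdict(list)

    def _key(self, ra, dec, t):
        return (math.floor(ra / self.cell_deg),
                math.floor(dec / self.cell_deg),
                math.floor(t / self.bin_days))

    def insert(self, ra, dec, t, payload):
        self._buckets[self._key(ra, dec, t)].append((ra, dec, t, payload))

    def query(self, ra_min, ra_max, dec_min, dec_max, t_min, t_max):
        """Return payloads inside the box; only overlapping buckets are scanned."""
        i0, j0, k0 = self._key(ra_min, dec_min, t_min)
        i1, j1, k1 = self._key(ra_max, dec_max, t_max)
        hits = []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                for k in range(k0, k1 + 1):
                    for ra, dec, t, payload in self._buckets.get((i, j, k), ()):
                        if (ra_min <= ra <= ra_max and dec_min <= dec <= dec_max
                                and t_min <= t <= t_max):
                            hits.append(payload)
        return hits

index = SpaceTimeIndex(cell_deg=1.0, bin_days=1.0)
index.insert(10.2, -5.1, 100.5, "a")
index.insert(10.8, -5.4, 100.9, "b")
index.insert(10.3, -5.2, 250.0, "c")   # same place, much later
index.insert(40.0, 20.0, 100.6, "d")   # same time, elsewhere
result = index.query(10.0, 11.0, -6.0, -5.0, 100.0, 101.0)
print(sorted(result))
```

The query returns only the two detections that are close in both position and time; the work done scales with the number of buckets the box overlaps rather than with the size of the whole catalog.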
LSST is a new approach for Astronomy
• Genuinely public access through multipurpose
data set -- no proprietary data period whatsoever
• Merges 3 enabling technologies
• Private-Public (Multi-agency) partnership
• Widely distributed development
• Parallel execution of multiple compelling science
programs, from solar system to cosmology