O6.2 The Palomar Transient Factory

The Palomar Transient Factory
or
Adventures in High Fidelity Rapid Turnaround
Data Processing at IPAC
Jason Surace
Russ Laher, Frank Masci, Wei Mi (did the IPAC work)
Branimir Sesar, Eran Ofek, David Levitan (students & post-docs)
Vandana Desai, Carl Grillmair, Steve Groom, Eugean Hacopians,
George Helou, Ed Jackson, Lisa Storrie-Lombardi, Lin Yan (IPAC
Team)
Eric Bellm (Project Scientist), Shri Kulkarni (PI)
What was/is PTF/iPTF?
• PTF is a robotic synoptic sky survey system designed to study
transient (time-domain) phenomena.
• Surveys 1000-3000 square degrees a night, predominantly at
R-band to a depth of 20.5.
• Primarily aimed at supernova science.
• But also can study variable stars, exoplanets, asteroids, etc.
• And produces an imaging sky survey like SDSS over larger
area.
• PTF ran 4 years on-sky starting in 2009, now “iPTF” for
another 3. Early foray into the next big theme in astronomy.
• Total budget ~$3M.
Surace 2014
Former CFHT 12k Camera -> PTF Camera
Eliminated nitrogen dewar; camera now mechanically
cryo-cooled. New field flattener, etc. 7.8 square degree
active area.
Surace 2011
The Venerable 48-inch Telescope
PTF camera installed in late 2008; Operations started 2009
Fully robotic operation. Automatically opens, takes calibrations,
science data, and adapts to weather closures. Human
intervention used to guide science programs.
Infrared Processing and Analysis Center
IPAC is NASA’s multi-mission science center and data archive
center for IR/submm astronomy. Specifically, we handle processing,
archiving, and/or control for numerous missions including: IRAS,
ISO, Spitzer, GALEX, Herschel, Planck, and WISE, as well as
2MASS, KI, and PTI. Also the seat of the Spitzer Science Center,
NExSci, NED, NStED, and IRSA. Approximately 150 employees in
two buildings on the CIT campus.
R-band Holdings
1292 nights, 3.1 million images
47 billion source apparitions (epochal detections)
g-band Holdings
241 nights, 500 thousand images
H-alpha Holdings
99 nights, 125 thousand images
[Data flow diagram]
Sites: P48, Caltech/Cahill, NERSC, IPAC.
NERSC: Image Subtraction and Transient Detection/RB Pipeline.
IPAC pipelines: Ingest, Realtime Image Subtraction Pipeline, Photometric Pipeline, Reference Pipeline, Lightcurve Pipeline, Moving Object Pipeline.
Products: Epochal Images and Catalogs, Transient Candidates, SSOs, Lightcurves, Reference Images, Reference Catalogs.
IPAC Infrastructure
• Data transmission from Palomar via
microwave link to SDSC.
• ~1TB of data every 4-5 days.
• 24 drones with 240 cores. Mixed Sun
and Dell blade units running RHEL.
• Roughly 0.5 PB spinning disk in
Nexsan storage units.
• Associated network equipment.
• Database and file servers.
• Archive servers.
• Tape backup.
IPAC Morrisroe
Computer Center
Cluster/Parallelization Architecture
• PTF data are observed on a fixed system of spatial tiles on
the sky. Vastly simplifies data organization and processing.
PTF fields and CCD combinations are the basic unit to
parallelize processing over multiple cluster nodes. Each node
processes a CCD at a time.
• “Virtual Pipeline Operator” on a master control node
oversees job coordination and staging.
• Multi-tiered local scratch disk, “sandbox” (working area)
and archive disk structure; inherited architecture from
previous projects driven by issues with very large file counts
and I/O heavy processes.
• Disk system shared with the archive because of budget constraints.
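
The field/CCD work-unit scheme above can be sketched roughly as follows. This is a minimal illustration, not the IPAC implementation: `process_ccd`, `run_night`, the field numbers, and the local thread pool standing in for cluster nodes are all hypothetical, and the count of 12 CCD slots is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_CCDS = 12  # assumption: number of CCD slots in the mosaic


def process_ccd(unit):
    """Hypothetical stand-in for running one pipeline on one (field, CCD) unit."""
    field, ccd = unit
    # A real job would stage files to local scratch, run the modules,
    # and copy products back; here we only report what was processed.
    return {"field": field, "ccd": ccd, "status": "done"}


def run_night(fields, max_workers=4):
    """Fan every (field, CCD) combination out to a pool of workers."""
    units = list(product(fields, range(N_CCDS)))
    # Units are independent of each other, so any worker can take any unit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_ccd, units))


results = run_night(fields=[5257, 5258])  # field numbers are illustrative
print(len(results), "work units processed")
```

Because the tiles are fixed on the sky, the (field, CCD) pair also serves as a natural key for organizing products on disk.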
Software Structure
• Individual modules written predominantly in C, but also FORTRAN,
PYTHON, MATLAB, and IDL.
• Connected with PERL wrapper infrastructure into discrete pipelines.
• Postgres database used for tracking dataflow, data quality, etc. Relational
database not used in the operations system for catalog storage; not
needed, and flat file access is more efficient.
• Heavy use of community software: SExtractor, SWarp, SCAMP,
Astrometry.net, DAOPHOT, HOTPANTS. Cheaper not to re-invent the wheel.
• Software replaced as needed by new code development.
• Highly agile development program: unknown and changing science
requirements, small team, and no separate development system due to
budget constraints!
• Continuous refinement process. There’s a trap with big data
development on a new instrument.
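
The split between a tracking database and flat-file products might look something like the sketch below. It uses an in-memory `sqlite3` database as a stand-in for Postgres, and the table name, columns, filenames, and seeing cut are all invented for illustration.

```python
import sqlite3

# Assumption: sqlite3 stands in for Postgres; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE processed_images (
           filename TEXT PRIMARY KEY,  -- path to the flat-file product on disk
           field    INTEGER,
           ccd      INTEGER,
           status   TEXT,              -- e.g. 'ingested', 'calibrated', 'failed'
           seeing   REAL               -- a data-quality metric, arcsec
       )"""
)
rows = [
    ("f5257_c07_a.fits", 5257, 7, "calibrated", 2.1),
    ("f5257_c07_b.fits", 5257, 7, "calibrated", 3.4),
    ("f5257_c08_a.fits", 5257, 8, "failed", None),
]
conn.executemany("INSERT INTO processed_images VALUES (?, ?, ?, ?, ?)", rows)

# The database answers "which files?"; the catalogs themselves stay in flat files.
good = conn.execute(
    "SELECT filename FROM processed_images "
    "WHERE status = 'calibrated' AND seeing < 3.0 ORDER BY filename"
).fetchall()
print(good)
```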
Realtime Pipeline
• Realtime – data is processed as received, turnaround in 20
minutes. Needed for same-night followup.
• Astrometrically and photometrically calibrated.
• Image subtraction against a reference image library
constructed from all the data to-date. In-house software.
• “Streak detection” for fast-moving objects; moving object
pipeline constructs solar system object tracklets.
• Transient candidate detection and extraction via psf-fitting
and aperture extraction.
• Machine-learning “scores” candidates.
• Image subtractions and candidate catalogs are pushed to
an external gateway where they are picked up by the solar
system, ToO, and extragalactic marshals.
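
The subtract-then-detect idea can be illustrated with a toy example. This is a sketch only and not the in-house algorithm: it assumes the science and reference images are already aligned and PSF-matched, uses a simple pixel threshold instead of psf-fitting, and the function names and the 5-sigma cut are assumptions.

```python
from statistics import median


def subtract(science, reference, scale=1.0):
    """Difference image: science minus flux-scaled reference, pixel by pixel."""
    return [
        [s - scale * r for s, r in zip(srow, rrow)]
        for srow, rrow in zip(science, reference)
    ]


def find_candidates(diff, nsigma=5.0):
    """Return (row, col, value) for pixels above nsigma times a robust noise level."""
    pixels = [v for row in diff for v in row]
    med = median(pixels)
    # Median absolute deviation, scaled to approximate a Gaussian sigma.
    sigma = 1.4826 * median([abs(v - med) for v in pixels])
    threshold = med + nsigma * sigma
    return [
        (y, x, v)
        for y, row in enumerate(diff)
        for x, v in enumerate(row)
        if v > threshold
    ]


# Toy data: a static reference, a science frame with small noise, one new source.
reference = [[100 + (x * y) % 3 for x in range(5)] for y in range(5)]
science = [[reference[y][x] + (x + 2 * y) % 3 - 1 for x in range(5)] for y in range(5)]
science[2][3] += 50  # inject a fake transient
candidates = find_candidates(subtract(science, reference))
print(candidates)
```

Everything static cancels in the difference image, so only the injected source survives the threshold.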
Realtime Image Subtraction and Transient Detection
Originally the community “HOTPANTS” package, now replaced with a more
sophisticated in-house image subtraction algorithm.
Photometric Pipeline
• This pipeline processes data in the traditional manner.
• Starts up at the end of the night, after all the data has been received.
• Calibration is derived from the entire night’s worth of data. Specifically, the
bias and flat-fields are derived from the data themselves.
• Photometric calibration is derived from extracted photometry from all sources,
fitting color, extinction, time and large-scale spatial variations vs. the SDSS.
Typically reach an accuracy of a few %.
• Astrometric calibration is done individually at the CCD level, against a
combined SDSS and UCAC4 catalog. Typically good to 0.15”.
• Outputs from this pipeline are calibrated single-CCD FITS images and single-CCD
catalog FITS binary tables (both aperture and psf-fit). These are archived
through IRSA. Available 1-3 days after observation.
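
As a rough illustration of the photometric calibration step, the sketch below fits only a zero point and a single color term against reference magnitudes. The fit described above also includes extinction, time, and spatial terms, which would enter as additional parameters. The function name and the synthetic numbers are assumptions.

```python
def fit_zeropoint_color(m_inst, m_ref, color):
    """Least-squares fit of m_ref - m_inst = zp + c_term * color."""
    n = len(m_inst)
    dm = [r - i for r, i in zip(m_ref, m_inst)]
    mean_c = sum(color) / n
    mean_dm = sum(dm) / n
    sxx = sum((c - mean_c) ** 2 for c in color)
    sxy = sum((c - mean_c) * (d - mean_dm) for c, d in zip(color, dm))
    c_term = sxy / sxx
    zp = mean_dm - c_term * mean_c
    return zp, c_term


# Synthetic stars generated with zp = 27.0 and a color term of 0.2.
color = [0.2, 0.5, 0.8, 1.1]
m_inst = [-10.0, -9.0, -8.5, -7.0]
m_ref = [m + 27.0 + 0.2 * c for m, c in zip(m_inst, color)]
zp, c_term = fit_zeropoint_color(m_inst, m_ref, color)
print(round(zp, 3), round(c_term, 3))
```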
Photometric Pipeline Output
Single R-band thumbnail image of Arp 220, 8 arcminutes across. Aperture
extractions catalog (SExtractor-based) overlaid. All observations and
detections of everything are saved in the archive.
Products are a reduced image, bit-encoded data quality mask, and catalogs.
All products are FITS.
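
A bit-encoded data quality mask packs several per-pixel flags into one integer. The sketch below shows how such a mask is typically decoded. The bit assignments here are invented for illustration and are not the PTF definitions, which should be taken from the archive documentation.

```python
# Assumption: these bit assignments are hypothetical, not the real PTF mask bits.
FLAGS = {
    0: "saturated",
    1: "bad_pixel",
    2: "cosmic_ray",
    3: "source_detected",
}


def decode_mask(value):
    """Return the names of all flags set in one mask pixel value."""
    return [name for bit, name in FLAGS.items() if value & (1 << bit)]


def is_clean(value, reject_bits=(0, 1, 2)):
    """True if none of the rejecting bits are set."""
    reject = sum(1 << b for b in reject_bits)
    return (value & reject) == 0


print(decode_mask(5))  # bits 0 and 2 are set
print(is_clean(8))     # only the non-rejecting bit 3 is set
```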
Reference Image Pipeline
• Once enough individual observations accumulate, the “reference
image” pipeline is triggered.
• This pipeline coadds the existing data, after selecting “best frames”,
e.g. best seeing, photometric conditions, astrometry, etc.
• Coaddition is done based on CCD id, PTF tile, and filter.
• These images are the reference of the static sky, at a level deeper
than the individual observations.
• “Reference Catalogs” are extracted from these images.
• This concept is important, because these are both the underlying
basis of the image subtractions, and also the basis of the light-curve
pipeline.
• Like PTF coverage, the depth of these is variable, but is currently
5 < n < 50.
• Resulting products are FITS images and FITS binary tables.
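
The select-then-coadd logic can be sketched as below. The seeing cut, the minimum frame count, and the use of a pixel-wise median are assumptions made for illustration; the combination method actually used is not stated here, and the frames are assumed to be already aligned.

```python
from statistics import median


def select_best(frames, max_seeing=2.5, n_min=5):
    """Keep frames with good seeing; return [] if too few to build a reference."""
    good = [f for f in frames if f["seeing"] <= max_seeing]
    return good if len(good) >= n_min else []


def coadd(frames):
    """Pixel-wise median of pre-aligned frames (equal-size 2-D lists)."""
    images = [f["pixels"] for f in frames]
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [median([img[y][x] for img in images]) for x in range(cols)]
        for y in range(rows)
    ]


# Six toy frames of the same field/CCD/filter; one has poor seeing.
seeings = [1.8, 2.0, 2.2, 2.4, 3.5, 2.1]
frames = [{"seeing": s, "pixels": [[10.0, 20.0], [30.0, 40.0]]} for s in seeings]
frames[1]["pixels"][0][0] = 999.0  # a cosmic-ray hit in one frame

best = select_best(frames)
reference = coadd(best)
print(len(best), reference)
```

A median is used here because it rejects single-frame outliers such as the injected cosmic ray; a weighted mean would go deeper at the cost of needing explicit outlier rejection.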
Reference Images
Single Image 60 sec @R
Field 5257, Chip 7, Stack of 34
Deep Sky Coadds aka “Reference Images”
* Results not typical. Near Galactic Center.
Deep Coadds
Light Curve Pipeline
• Each night, all detected sources from the photometric pipeline
are matched against the reference catalog (better than a
generic catalog-matching approach).
• All sources ever seen for a given CCD, PTF tile, and filter
combination are loaded and analyzed.
• Least variable sources used as anchors for the calibration.
• Image-by-image correction factors computed for that image as
a whole and stored as a lookup table.
• Application of these secondary correction factors improves
overall relative calibration to near-millimag levels for bright
sources (that part is important).
• Triggers less frequently (planned weekly updates).
• Highest level of our products.
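
A minimal sketch of the anchor-based relative calibration follows. It assumes every source is measured in every image and picks anchors by lowest scatter. The function name, the number of anchors, and the toy magnitudes are all assumptions; the actual pipeline is certainly more elaborate.

```python
from statistics import mean, median, pstdev


def relative_calibration(mags, n_anchors=3):
    """mags[source] = magnitudes per image -> (per-image corrections, corrected mags)."""
    n_img = len(next(iter(mags.values())))
    # Deviation of every measurement from its own source's mean magnitude.
    dev = {s: [v - mean(m) for v in m] for s, m in mags.items()}
    # First-pass image offsets: median deviation over all sources.
    rough = [median([d[j] for d in dev.values()]) for j in range(n_img)]
    # Scatter left over once the first-pass offsets are removed.
    scatter = {
        s: pstdev([d[j] - rough[j] for j in range(n_img)]) for s, d in dev.items()
    }
    # The least variable sources become the calibration anchors.
    anchors = sorted(scatter, key=scatter.get)[:n_anchors]
    # Final per-image correction (the lookup table), from the anchors only.
    corr = [mean([dev[s][j] for s in anchors]) for j in range(n_img)]
    corrected = {s: [m[j] - corr[j] for j in range(n_img)] for s, m in mags.items()}
    return corr, corrected


# Three constant stars plus one variable (V), seen through image offsets 0, +0.1, -0.05.
mags = {
    "A": [15.0, 15.1, 14.95],
    "B": [16.0, 16.1, 15.95],
    "C": [17.0, 17.1, 16.95],
    "V": [16.5, 17.1, 15.95],
}
corr, corrected = relative_calibration(mags)
print([round(c, 3) for c in corr])
```

After correction the constant stars are flat, while the variable keeps its intrinsic 0.5 mag excursions.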
Binary star light curves taken from PTF
processed images in Orion.
From Van Eyken
Example Light Curves
Something a little
different, these are
relatively faint asteroid
light curves from Chang
et al. 2014.
PTF Archive at IRSA
Data products can be searched and retrieved via sophisticated GUI tools
and also through an application program interface that allows integration
of the archive into other, 3rd party software.
IRSA is looking to hire a UI software developer; see the
Caltech website https://jobs.caltech.edu/postings/2254 or ask
Steve Groom at this meeting.
PTF “Marshals”
• PTF “Science Marshals” sit on top of the data archive.
• Marshals are like interactive science twikis.
• Marshals are predominantly written by science users
for their science collaborations, with coordinated
interaction between them and the ops/archive
system.
• The ops system produces science products (e.g.
data), the archive produces access to science
products, the marshals help turn the science
products into science results (e.g. papers).
• They can be used to classify data, listen for alerts, lay
down new observations for robotic followup,
coordinate collaborators, etc.
iPTF Extragalactic Marshal
NEA “Streaker” Marshal
GRB Target of Opportunity (ToO) Marshal
GRBs and (should they ever be detected) gravitational waves can only be
localized to tens to a few hundred square degrees.
PTF and ZTF can survey these areas in tens of minutes as targets of
opportunity to localize fading electromagnetic counterparts.
The marshal receives alerts from Fermi and Swift, automatically lays down
proposed ToO observations, and alerts a user by phone to activate the
followup.
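
A back-of-the-envelope estimate of the ToO coverage time can be made as below. It counts pointings with no overlaps or gaps, so it is a lower bound. The 100 square degree region and the 40-second PTF overhead per pointing are assumptions; the field sizes, the 60-second PTF exposure, and the 45-second ZTF spacing are the figures quoted elsewhere in this talk.

```python
import math


def too_coverage_minutes(area_sq_deg, fov_sq_deg, seconds_per_pointing):
    """Lower-bound time to tile a localization region once (no overlaps, no gaps)."""
    n_pointings = math.ceil(area_sq_deg / fov_sq_deg)
    return n_pointings, n_pointings * seconds_per_pointing / 60.0


# PTF: 7.8 sq deg active area, 60 s exposures; the 40 s overhead is an assumption.
ptf = too_coverage_minutes(100, 7.8, 60 + 40)
# ZTF: 50 sq deg field, one exposure every 45 s.
ztf = too_coverage_minutes(100, 50, 45)
print(ptf, ztf)
```

Under these assumptions PTF needs 13 pointings (about 22 minutes) and ZTF needs 2 pointings (1.5 minutes) for a single pass.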
iPTF ToO Marshal
iPhone App
Zwicky Transient Facility
More or less what PTF was, but an order of magnitude more of it.
ZTF was awarded full
funding through NSF-MSIP
(Mid-Scale Innovation
Program).
ZTF now a roughly 50:50
public:private partnership.
Total Budget ~$17M
Wafer-Scale CCDs
e2v CCD231-C6 6k x 6k
form factor with 15
micron pixels. A little
under 4 inches on a side.
Focal plane readout time <10 seconds! 16 CCDs, 4 readouts
each. And they are cheap.
30-second cadence means 1.2 GB raw data every 45 seconds.
~16x current data rate from PTF.
5 CCDs in-hand, remaining 11 now ordered.
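
The quoted raw data volume can be checked with simple arithmetic. The sketch assumes that "6k" means 6144 pixels on a side and that raw pixels are stored as 16-bit integers; neither is stated here.

```python
N_CCDS = 16
PIXELS_PER_SIDE = 6144     # assumption: "6k" taken as 6144 pixels
BYTES_PER_PIXEL = 2        # assumption: 16-bit raw pixels
SECONDS_PER_EXPOSURE = 45  # one exposure every 45 seconds

bytes_per_exposure = N_CCDS * PIXELS_PER_SIDE**2 * BYTES_PER_PIXEL
gb_per_exposure = bytes_per_exposure / 1e9
mb_per_second = bytes_per_exposure / 1e6 / SECONDS_PER_EXPOSURE

print(f"{gb_per_exposure:.2f} GB per exposure")
print(f"{mb_per_second:.0f} MB/s sustained")
```

Under these assumptions the result is about 1.21 GB per exposure and roughly 27 MB/s, in line with the 1.2 GB and 30MB/s figures quoted in this talk.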
ZTF camera FOV is 50 square degrees. Largest camera on >1m telescope by
area in the world.
Or, to make it a little clearer, here’s Orion. The white box is the ZTF
imaging area. The moon is in the upper right corner of the white box.
And to Process All This?
IPAC is the data processing and archive
center for all aspects of ZTF.
Continuous raw data flow of 30MB/s.
0.5-1 PB/yr of data products.
Drone farm of 128 computers.
Replication of proven PTF design in
subunits similar to PTF data load (camera
quadrants).
Transient Science Summer Schools
Schedule
• Early 2014 – PTF data for selected high cadence fields
(M81, Beehive, Orion, Kepler, Stripe 82, Cass-A).
• 2015 – Complete PTF Archive release.
• 2016 – Rolling Releases of iPTF Archive, including deep
reference images and light curves.
• 2017 – ZTF First Light (Jan), commissioning of camera,
building of new reference images.
• 2018 – First ZTF data release (images, catalogs, light
curves, transient candidates)
• 2019 – Release of transient alerts.
• 2020 – NSF funded period ends. Project continues with
private partners.
http://ptf.caltech.edu