Zwicky Transient Facility Data System Jason Surace (Data Systems Lead)

Surace 2014

Where is this Happening?

Infrared Processing and Analysis Center

IPAC is a multimission science center on the Caltech campus, originally founded for the IRAS mission. Primarily funded by NASA, IPAC handles data processing, archiving, outreach, and/or command and control functions for IRAS, ISO, Spitzer, GALEX, Herschel, Planck, and WISE, as well as 2MASS, KI, and PTI. IPAC also hosts NED, NStED, and IRSA.

Located in two buildings with extensive modern data rooms in each.

IPAC does more than infrared! Recent ground-based project involvement includes LCO-GT and LSST.

Keith-Spalding Spitzer Science Center

Morrisroe Astroscience Laboratory

PTF Data System Development

IPAC began pipeline and archive development for the Palomar Transient Factory in 2008. Small core team with significant contributions from partners, students, and post-docs.

PTF has been a continuous learning process, with evolution of the system to meet both a changing science program and evolving understanding of the science requirements. High agility level required.

Very large data volumes and limited resources. Significant leverage of existing IPAC expertise and in-house facility resources (e.g. IRSA, ICE, etc).

Current PTF Data Holdings

R-band: 1247 nights, 3 million images.

IPAC Morrisroe Computer Center

Physical Infrastructure

• 24 drones with 240 cores.

• ~0.5 PB of data on spinning enterprise disks.

• Tape backup deep storage system.

• 86 TB database server.

• A lot of “junk”: 10G network switches, development computers, archive servers, fiber cards, racks, etc.

System will be scaled up by a factor of 10 for ZTF.

PTF Processing Mechanics

• Software is a mixture of new, community, and IPAC heritage code. Wrappers knit it together, database used to track everything.

• Highly parallelized. Quantization is CCD (detector area) based: 0.65 square degrees at a time.

• Core team at IPAC, drawing on other IPAC expertise for specific tasks as needed.

• Graduate students and post-docs within collaboration provided significant analysis role. Ties to science community extraordinarily important.
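The CCD-based quantization described above can be sketched roughly as follows. This is a minimal illustration, not the PTF code: `process_ccd`, `process_exposure`, the exposure ID, and the CCD count of 12 are all hypothetical placeholders, and a thread pool stands in for the real cluster of drones.

```python
from concurrent.futures import ThreadPoolExecutor


def process_ccd(task):
    """Hypothetical stand-in for the wrapper that runs one CCD image."""
    exposure_id, ccd_id = task
    # A real wrapper would call the calibration and extraction codes
    # and record the outcome in the tracking database.
    return {"exposure": exposure_id, "ccd": ccd_id, "status": "done"}


def process_exposure(exposure_id, n_ccds, max_workers=4):
    """Split one exposure into independent per-CCD tasks and run them."""
    tasks = [(exposure_id, ccd) for ccd in range(n_ccds)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves task order, so results line up with CCD ids.
        return list(pool.map(process_ccd, tasks))


# Illustrative values only: the CCD count here is an assumption.
results = process_exposure("exp0001", n_ccds=12)
```

Because each CCD is processed independently, adding workers (or whole subunits of workers) scales the throughput without changing the per-CCD code.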

Data System Segments

• Realtime Data Processing – image subtraction, transient and solar system object detection.

• High Fidelity Daily Processing – nightly processing and recalibration for highest data quality images and source catalogs.

• Ensemble Processing – periodic construction of coadded images, processing of catalogs to create high precision light curves.

• Long-term Data Curation - storage of all raw data, processed data (images and extracted photometry), and an advanced data archive with data exploration tools, with public release.

Realtime Pipeline

• The realtime pipeline triggers on each image as soon as it is taken and arrives from the mountain.

• This is a modified version of the daily high fidelity pipeline, using fixed calibration.

• During the PTF era, this was led by LBL, drawing on their SNe search expertise.

• During iPTF, IPAC developed a similar capability using newly written software, with a focus on solar system objects (SSOs).

Realtime Pipeline

• Pipeline contains an image subtraction against a reference image library (described later).

• Transient source detection.

• Streak detection.

• Machine vetting via a 3rd-generation algorithm developed by JPL/LANL.

• Roughly 10-minute phase lag on current system.
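The subtraction and detection steps listed above can be illustrated with a deliberately simplified sketch. It assumes the science and reference images are already aligned and matched in PSF and flux scale (a real pipeline has to do that work first), and it uses a plain threshold on a robust noise estimate. The function names and the 5-sigma cut are assumptions, not the PTF implementation.

```python
import statistics


def subtract(science, reference):
    """Pixel-wise difference of two aligned, equally sized images."""
    return [[s - r for s, r in zip(srow, rrow)]
            for srow, rrow in zip(science, reference)]


def detect(diff, n_sigma=5.0):
    """Return (y, x) of pixels standing n_sigma above the background."""
    pixels = [v for row in diff for v in row]
    med = statistics.median(pixels)
    # Median absolute deviation, scaled to a Gaussian-equivalent sigma.
    sigma = 1.4826 * statistics.median(abs(v - med) for v in pixels)
    threshold = med + n_sigma * sigma
    return [(y, x) for y, row in enumerate(diff)
            for x, v in enumerate(row) if v > threshold]


# Synthetic 5x5 example: smooth reference, small deterministic "noise",
# and one injected transient at row 2, column 3.
reference = [[100.0 + x + y for x in range(5)] for y in range(5)]
science = [[reference[y][x] + ((3 * x + 5 * y) % 7 - 3) * 0.5
            for x in range(5)] for y in range(5)]
science[2][3] += 50.0

candidates = detect(subtract(science, reference))
```

Candidates found this way would then go on to streak detection and machine vetting, which are not sketched here.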

Nightly Data Processing

• Data flows in realtime from the 48-inch to IPAC via Cahill. System kicks off after all data arrives.

• Data is archived to tape.

• Flats, biases, and other cal files assembled from ensembles of data taken throughout night.

• Astrometric and photometric calibration on a per-frame, per-chip basis. 2-3% photometry, 0.15” astrometry.

• Source extraction of all detected sources.

• Deposition into IRSA Archive system.

• Completion by next afternoon, available in archive in 1-3 days.
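As a rough illustration of the calibration-file step above, the sketch below median-combines a night's bias and flat frames and applies them to a raw frame. It is a textbook bias and flat-field correction under simplifying assumptions (no overscan, no masking, identical frame sizes); all function names and pixel values are hypothetical.

```python
import statistics


def median_combine(frames):
    """Pixel-wise median of a list of equally sized 2-D frames."""
    return [[statistics.median(px) for px in zip(*rows)]
            for rows in zip(*frames)]


def build_master_flat(flat_frames, master_bias):
    """Debias the flats, median-combine them, and normalize to mean 1."""
    debiased = [[[v - b for v, b in zip(row, brow)]
                 for row, brow in zip(frame, master_bias)]
                for frame in flat_frames]
    combined = median_combine(debiased)
    mean = statistics.fmean(v for row in combined for v in row)
    return [[v / mean for v in row] for row in combined]


def calibrate(raw, master_bias, master_flat):
    """Apply (raw - bias) / flat to every pixel."""
    return [[(r - b) / f for r, b, f in zip(rrow, brow, frow)]
            for rrow, brow, frow in zip(raw, master_bias, master_flat)]


# Tiny 2x2 example with made-up numbers.
bias_frames = [[[100, 100], [100, 100]],
               [[101, 101], [101, 101]],
               [[99, 99], [99, 99]]]
flat_frames = [[[1300, 1100], [1100, 900]]] * 3
raw = [[700, 600], [600, 500]]

master_bias = median_combine(bias_frames)
master_flat = build_master_flat(flat_frames, master_bias)
calibrated = calibrate(raw, master_bias, master_flat)
```

In this toy case the raw frame is a uniform sky seen through the flat-field pattern, so the calibrated frame comes out uniform.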

Deep Sky Coadds aka “Reference Images”

• Required inputs for image differencing. Also forms the backbone of the relative photometry pipeline via deep reference catalogs.

• Pipeline system designed to trigger when analysis of incoming data indicates enough new data has accumulated to increment existing deep coadds by half a magnitude.

• Images internally aligned to each other, reprojected and recombined with sophisticated outlier rejection.

• Many limits and checks on input data.
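To make the half-magnitude trigger and the outlier rejection concrete, here is a minimal sketch. It assumes background-limited images of equal quality, so that coadd noise falls as 1/sqrt(N) and depth grows by 1.25 log10(N) magnitudes. The actual trigger analyzes the incoming data and is certainly more involved. The rejection shown is a simple median/MAD sigma clip for one pixel, swapped in for the pipeline's "sophisticated" scheme; all names are hypothetical.

```python
import math
import statistics


def depth_gain_mag(n_old, n_new):
    """Depth gain in magnitudes from adding n_new images to n_old,
    assuming noise scales as 1/sqrt(N)."""
    return 1.25 * math.log10((n_old + n_new) / n_old)


def should_rebuild(n_old, n_new, threshold_mag=0.5):
    """True once the accumulated images would deepen the coadd enough."""
    return depth_gain_mag(n_old, n_new) >= threshold_mag


def clipped_mean(samples, n_sigma=3.0):
    """Mean of one pixel's samples after rejecting MAD-based outliers."""
    med = statistics.median(samples)
    sigma = 1.4826 * statistics.median(abs(v - med) for v in samples)
    kept = [v for v in samples if abs(v - med) <= n_sigma * sigma]
    return statistics.fmean(kept)


# One pixel across nine aligned images; the first sample is an outlier
# such as a cosmic-ray hit.
pixel_samples = [1000.0, 9.9, 10.0, 10.1, 10.2, 9.9, 10.0, 10.1, 10.0]
pixel_value = clipped_mean(pixel_samples)
```

Under this scaling, a half-magnitude gain requires the stack to grow by a factor of 10^0.4, about 2.5, so an existing stack of 34 images would need roughly 52 new ones.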

[Figure panels: single image, 60 sec @R; stack of 34. Field 5257, Chip 7.]

* Results not typical. Near Galactic Center.

Relative Photometry aka “Lightcurve Pipeline”

• Source association through positions across epochal apparitions (in catalog space), using reference catalogs derived from deep reference images.

• Additional processing that computes image-wide delta corrections to regular pipeline photometry, using all apparitions of sources on that chip/field. Layers on top of existing catalog data.

• Achieves few-millimag performance.

• Has been running in experimental form for several years. Now incorporated into the regular online processing.

• Significant computer science complexities in how to handle datastream in finite time for daily updates.
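The idea of an image-wide delta correction layered on existing catalog photometry can be sketched as below. This is a much-simplified stand-in, assuming sources are already associated by ID with the reference catalog and that a single median offset per image is adequate. The real pipeline uses all apparitions of sources on the chip/field. Function names, source IDs, and magnitudes are hypothetical.

```python
import statistics


def image_delta(epoch_mags, reference_mags):
    """Median offset (mag) of one image relative to the reference catalog."""
    deltas = [mag - reference_mags[sid]
              for sid, mag in epoch_mags.items() if sid in reference_mags]
    return statistics.median(deltas)


def apply_delta(epoch_mags, reference_mags):
    """Return the epoch's magnitudes with the image-wide offset removed."""
    delta = image_delta(epoch_mags, reference_mags)
    return {sid: mag - delta for sid, mag in epoch_mags.items()}


# Made-up example: the image is 0.1 mag faint overall, and source "c"
# has genuinely faded by a further 0.5 mag.
reference_mags = {"a": 15.0, "b": 16.0, "c": 17.0}
epoch_mags = {"a": 15.1, "b": 16.1, "c": 17.6}

corrected = apply_delta(epoch_mags, reference_mags)
```

Using the median keeps a genuinely variable source from dragging the image-wide correction, so its variability survives in the corrected light curve.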

Public Web Pages

As part of the public data release, we have commissioned a new web portal for PTF, which includes the path to the data archive as well as project documentation.

This was designed deliberately to be readily extensible to ZTF.

http://ptf.caltech.edu

Public Data Release

Released ~190k images and catalogs, or 10% of the existing PTF-era data, from 6 separate regions on the sky.

Public Archive

Provides both an interactive, GUI-based archive for image and catalog file discovery and VO-compliant software APIs, currently in use by several science programs.

Lead-In to ZTF

• PTF and iPTF are the direct pathfinders for ZTF.

• Data system is now mature. A few remaining segments are being completed now and will be in place at least a year prior to ZTF.

• Most significant is the archive system interface for the catalogs and light curves, which will be implemented for the year 2 data release.

Major ZTF Tasks

• Adaptation to substantially greater data rates. Parallelization will still be spatial. Data system design allows replication in PTF-like subunits.

• Adaptation to any new detector peculiarities.

• Retuning of the realtime transient pipeline, specifically reworking the machine vetting process for transient candidates.

• Development of the VO alert subscription service.
