Analysis Tools for the New Instruments: NICMOS and STIS images. I.

Instrument Science Report NICMOS97-030
Daniela Calzetti, Ivo Busko, Howard Bushouse, Michele De La Peña
December 5, 1997
This ISR presents the first set of software tools developed in IRAF/STSDAS and designed
to handle the data format and organization of NICMOS and STIS image files. Whenever
applicable, the tools propagate the error and data quality information of the input files to
the output files. The tools can be divided into two major categories: (1) general-purpose
utilities (MSARITH, MSSTATISTICS, MSCOMBINE, MARKDQ, NDISPLAY, MSSPLIT, MSJOIN, PSTACK), which target the user community; and (2) calibration-oriented utilities (MSSTREAKFLAT, NLINCORR, MSREADNOISE, MSBADPIX, NDARK), which generate the reference files used in the NICMOS calibration pipeline. The individual tools and
their functionality are briefly described in this ISR.
1. Introduction: the Necessity for New Software Tools
An important change relative to the existing instruments introduced by NICMOS and
STIS is the data format. The data produced by the two new instruments come as FITS
files. Each NICMOS science observation comes with five extensions: the science array, the
error array, the data quality array, the number-of-samples array, and the integration-time
array (SCI,ERR,DQ,SAMP,TIME). STIS files carry the first three extensions only. In
addition, NICMOS has a variety of readout capabilities, which is reflected in the structure
of the data files. The ACCUM, BRIGHTOBJ, and RAMP readout modes produce a single
science image; thus the FITS file contains only one quintuplet of arrays. The MULTIACCUM readout mode produces (N+1) quintuplets, where N is the number of nondestructive readouts specified in the exposure and the additional quintuplet corresponds to
the 0th readout. A MULTIACCUM science file thus contains (N+1)x5 arrays. Conventionally, each NICMOS quintuplet (or STIS triplet) is called an image set or IMSET.
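The array count of a science file follows directly from the IMSET definition above. The short Python sketch below only enumerates the arrays; the function names and the sequential IMSET numbering are illustrative assumptions and do not describe how the files are actually ordered on disk.

```python
# Array names of one image set (IMSET), as listed in the text.
NICMOS_IMSET = ("SCI", "ERR", "DQ", "SAMP", "TIME")
STIS_IMSET = ("SCI", "ERR", "DQ")


def imset_layout(n_imsets, names=NICMOS_IMSET):
    """Return (array_name, imset_number) pairs for a file with n_imsets IMSETs.

    The numbering 1..n_imsets is a hypothetical convention for this sketch.
    """
    if n_imsets < 1:
        raise ValueError("a science file holds at least one IMSET")
    return [(name, i) for i in range(1, n_imsets + 1) for name in names]


def multiaccum_layout(n_readouts):
    """MULTIACCUM: N readouts plus the 0th readout give N + 1 IMSETs."""
    return imset_layout(n_readouts + 1)


# A MULTIACCUM exposure with N = 10 readouts holds (10 + 1) x 5 arrays.
print(len(multiaccum_layout(10)))
```

An ACCUM, BRIGHTOBJ, or RAMP file corresponds to `imset_layout(1)`, and a single STIS triplet to `imset_layout(1, STIS_IMSET)`.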
Dedicated software is needed for handling IMSETs (and multi-IMSETs in the case of
MULTIACCUM science files) and for providing error and data quality propagation, thus
fully exploiting the information contained in the data. This ISR presents the first set of
software tools which have been developed within IRAF/STSDAS for the purpose of filling
this gap.
2. The New Software Tools.
The software tools for operating on the NICMOS and STIS FITS files are available within
the IRAF/STSDAS environment for compatibility with pre-existing analysis software.
They can be found in the STSDAS packages
toolbox.imgtools.mstools and hst_calib.nicmos (the latter for NICMOS-specific tools).
The new tools are available as of September 12, 1997, as part of the new TABLES/STSDAS release. The tools have either been written in ANSI-C or are CL scripts interfacing
with pre-existing IRAF/STSDAS tasks. Whenever possible, the new tools have been
designed to accept a variety of data formats (OIF, GEIS, and the FITS files of STIS and NICMOS
images), and will eventually replace obsolete STSDAS tasks (e.g., MSSTATISTICS in
place of GSTATISTICS). The new tasks can be grouped into two major categories:
General-purpose utilities. They include tools for mathematical and statistical operations on science images, and for the analysis and display of reduced and raw data. In
most cases, the new utilities extend existing routines to include error and data quality
propagation. These are the utilities of largest interest to the user community. Under
this category are found some of the tasks described in this ISR: MSARITH, MSCOMBINE, MSSTATISTICS, MSJOIN and MSSPLIT, NDISPLAY and MARKDQ, and
PSTACK. The first five are found in the package toolbox.imgtools.mstools; the remaining
ones are in the package hst_calib.nicmos.
Calibration-oriented utilities. These generate reference files (e.g., readnoise arrays,
dark files, flatfields, non-linearity correction arrays, badpixel arrays) to feed the calibration database and support the calibration pipelines. The tasks are specifically
designed for the calibration of NICMOS, and will not be of general utility. The tools
are all located in the calibration package hst_calib.nicmos.
All the tools have been tested using NICMOS science images. A brief introductory
description of each task is given below, highlighting its potential use; for specific details on the
tools' capabilities, the user should refer to the help file of each individual task.
3. General Purpose Utilities
MSARITH complements the IRAF task IMARITH, in that it specifically handles the NICMOS
and STIS image formats and includes error and data quality propagation. MSARITH supports
the four basic arithmetic operations (+, -, *, /) and can operate on individual IMSETs or on multi-IMSET files. The input operands can be either files or numerical constants; the latter can
appear with an associated error, which will be propagated into the error array(s) of the output file. The NICMOS SCI, ERR, DQ, TIME, and SAMP arrays are combined following
the scheme in Table 1 below:
Table 1: MSARITH operations
In Table 1 we have assumed that the first operand (op1) is a file, and the second operand
(op2) can be either a constant or a file; the ERR arrays of the input files (σ1 and σ2) are
added in quadrature; if the constant is given with an error (σ2), the latter is added in
quadrature to the input ERR array. Finally, in Table 1 the pixels in the SCI images are in
counts. MSARITH can also operate on count rates and supports both NICMOS and STIS
data formats.
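The quadrature rule for the ERR arrays can be illustrated with a minimal Python sketch for addition and subtraction only. It is not the MSARITH code: the function name and the flat-list representation of the arrays are assumptions made for the example, and the handling of the DQ, TIME, and SAMP arrays is left out.

```python
import math


def add_with_errors(sci1, err1, op2, err2=0.0, sign=+1):
    """Add (sign=+1) or subtract (sign=-1) op2 from a SCI array, pixel by pixel.

    sci1, err1 : flat lists with the SCI and ERR arrays of the first operand.
    op2, err2  : either flat lists of the same length (a second file) or
                 plain numbers (a constant and its optional error).
    """
    n = len(sci1)
    vals2 = op2 if isinstance(op2, list) else [op2] * n
    errs2 = err2 if isinstance(err2, list) else [err2] * n
    sci = [a + sign * b for a, b in zip(sci1, vals2)]
    # The errors add in quadrature for both addition and subtraction.
    err = [math.hypot(s1, s2) for s1, s2 in zip(err1, errs2)]
    return sci, err


# Two "files" of two pixels each, then a constant 2.0 +/- 4.0 subtracted.
print(add_with_errors([10.0, 20.0], [3.0, 6.0], [1.0, 2.0], [4.0, 8.0]))
print(add_with_errors([10.0], [3.0], 2.0, 4.0, sign=-1))
```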
MSSTATISTICS is an extension of GSTATISTICS in the STSDAS package, which is in turn an
extension of IMSTATISTICS. The main novelty relative to GSTATISTICS is the inclusion
of the error and data quality information in computing statistical quantities. In addition to
the standard statistical quantities (min, max, sum, mean, standard deviation, median,
mode, skewness, kurtosis), two additional quantities have been added to take advantage of
the error information: the weighted mean and the weighted variance of the pixel distribution. If x_i is the value at the i-th pixel, with associated error σ_i, the weighted mean and
variance used in the task are:
$$\langle x \rangle_w = \frac{\sum_i x_i / (\sigma_i \times \sigma_i)}{\sum_i 1 / (\sigma_i \times \sigma_i)}$$
$$\langle \sigma \rangle_w^2 = \frac{1}{\sum_i 1 / (\sigma_i \times \sigma_i)}$$
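As an illustration, the two weighted quantities can be computed with the short Python sketch below, which uses the usual inverse-variance weights 1/σ_i². The function name is hypothetical, and the rejection rule (skip any pixel with a non-zero DQ value or a non-positive error) is an assumption made for the example rather than the task's documented behavior.

```python
def weighted_stats(values, errors, dq=None):
    """Inverse-variance weighted mean and variance of a set of pixels.

    mean     = sum(x_i / sigma_i^2) / sum(1 / sigma_i^2)
    variance = 1 / sum(1 / sigma_i^2)
    Pixels with a non-zero DQ value or a non-positive error are skipped
    (an assumption of this sketch).
    """
    if dq is None:
        dq = [0] * len(values)
    sum_w = 0.0
    sum_wx = 0.0
    for x, sigma, flag in zip(values, errors, dq):
        if flag != 0 or sigma <= 0:
            continue
        w = 1.0 / (sigma * sigma)
        sum_w += w
        sum_wx += w * x
    if sum_w == 0.0:
        raise ValueError("no valid pixels")
    return sum_wx / sum_w, 1.0 / sum_w


# The third pixel carries a DQ flag and does not enter the statistics.
print(weighted_stats([2.0, 4.0, 100.0], [1.0, 2.0, 1.0], dq=[0, 0, 4]))
```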
The data quality information carried by the NICMOS and STIS files is used to reject pixels in the statistical computation. Additional ‘masks’ can be input by the user to reject
objects/regions from the science arrays. MSSTATISTICS supports OIF, GEIS, the FITS
format of NICMOS and STIS data, and has independent pset parameters for each of these formats.
MSCOMBINE is a CL script that allows one to run the STSDAS task GCOMBINE on NICMOS
data files (image combination of STIS data is performed by the task OCRREJECT in the
hst_calib.stis package of STSDAS). The basic idea is to expand each NICMOS multi-extension image into its basic components (SCI, ERR, DQ, SAMP, TIME) to make them
‘digestible’ for GCOMBINE. The SCI extensions become the inputs proper to the underlying GCOMBINE task, and the ERR extensions become the error maps. The DQ extensions
are first combined with a user-specified Boolean mask (which allows selective pixel masking), and then fed into the data quality maps. If scaling by exposure time is requested, the
exposure times of each IMSET are read from the header keyword PIXVALUE in the
TIME extensions.
Once GCOMBINE finishes, the output is re-assembled back into a NICMOS datafile: the
output images and error maps from GCOMBINE will form the SCI and ERR extensions
of the output IMSET. The DQ extension will be a combination of the masking operations
and the rejection algorithms executed by GCOMBINE. The TIME extension will be the
sum of the TIME values from the input files minus the rejected values, divided on a pixel-by-pixel basis by the number of valid pixels in the output image. The final TIME array
will be consistent with the output SCI image (average or median of the science data). The
SAMP extension is built from all the input SAMP values, minus the discarded ones via
MARKDQ reads the DQ array from a NICMOS image and marks the DQ flags on the displayed image. Each flag value can be set independently to a different color or be turned
off. NDISPLAY combines the capabilities of the IRAF task DISPLAY and the task
MARKDQ: it displays a NICMOS image and overlays the DQ flags according to the user-specified color code. Both tasks are useful for locating specific DQ values, e.g., the cosmic
rays rejected by calnica in a MULTIACCUM image.
MSSPLIT extracts user-specified IMSETs from a NICMOS MULTIACCUM or STIS data
file and copies them into separate files. Each output file will contain a single IMSET and
will be given the primary header of the original file. This task may be useful in those cases
where the user wants to reduce the size of a NICMOS MULTIACCUM or STIS file or
wants to perform analysis on a specific IMSET only.
MSJOIN performs the opposite operation of MSSPLIT: it assembles separate IMSETs
into a single data file.
PSTACK plots all the samples of the specified pixels from a NICMOS MULTIACCUM
image as a function of time. This task is useful to track the time behavior of an image on a
pixel-by-pixel basis. For instance, the temporal position of cosmic ray hits, or the onset of saturation, can be identified in the course of an exposure for a defined set of pixels.
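PSTACK itself is a plotting task, but the kind of inspection it supports can be mimicked numerically. The Python sketch below takes the stack of samples for one pixel and reports the readout at which the count rate departs most from its median, which is where a cosmic ray hit would show up as a jump. The function name, the choice of the median as the reference rate, and the sample values are all assumptions made for illustration.

```python
import statistics


def largest_rate_deviation(times, counts):
    """Locate the readout interval whose count rate deviates most from the median.

    times, counts : samples of one pixel in a MULTIACCUM-like sequence,
                    ordered by increasing time.
    Returns (readout_index, deviation), where readout_index is the index of
    the readout that closes the deviating interval.
    """
    rates = [
        (counts[i + 1] - counts[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    reference = statistics.median(rates)
    deviations = [abs(r - reference) for r in rates]
    worst = max(range(len(deviations)), key=deviations.__getitem__)
    return worst + 1, deviations[worst]


# Made-up samples: a steady 10 counts per unit time with one jump of 100 counts.
times = [0, 1, 2, 3, 4, 5]
counts = [0, 10, 20, 130, 140, 150]
print(largest_rate_deviation(times, counts))
```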
4. Calibration-Oriented Tools
MSSTREAKFLAT is an extended version of the STREAKFLAT tool used with WFPC2 images to
extract flat-field images from Earth flats. Exposures of the bright Earth, routinely
employed to obtain flat-fields for WFPC2, will be used for the same purpose for NICMOS. Earth observations simulate the illumination pattern of astronomical observations
on the detectors better than the internal lamps. However, because of the relative motion
between the telescope and the Earth, the images obtained pointing at the Earth show a pattern of ‘streaks’ due to clouds and land/sea passages. The WF/PC Instrument Definition
Team developed an algorithm to remove the streaks from the images without altering the
medium and large scale structure of the flat-fields. The algorithm is an iterative procedure
which takes advantage of the fact that the flat-field features due to the detector’s response
will be the same from one frame to the next, while the streak patterns will have random
angles. The final flat-field is built through successive approximations which lead to the
determination and removal of the streak pattern in each of the input images.
The MSSTREAKFLAT task, which can be used for both WFPC2 and NICMOS data formats, improves on the STREAKFLAT task in that it allows for a more
flexible handling of the Data Quality flags (which can be user-specified through a pset)
and places no limit on the number of input files.
For NICMOS data, the input error arrays are ignored. The output error array is
built from the streak-removed input images (‘flat-field estimates’) as a pixelwise standard
deviation relative to the output flat-field. Thus the error at each pixel is the square root of
the sum of all the residuals (individual flat-field estimates minus the output flat-field)
squared divided by the number of input files. Output sample and time arrays are the sum of
the sample and time values from the input arrays.
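The error definition given above translates into a few lines of Python. This is a sketch of the stated formula only, with a hypothetical function name and flat lists standing in for the images; it does not include the streak-removal iteration itself.

```python
import math


def flat_error(estimates, flat):
    """Pixelwise error of the output flat-field.

    estimates : list of streak-removed input images (flat lists of pixels).
    flat      : the output flat-field (flat list of the same length).
    For each pixel: sqrt(sum((estimate - flat)^2) / number_of_input_files).
    """
    n_files = len(estimates)
    return [
        math.sqrt(sum((est[i] - flat[i]) ** 2 for est in estimates) / n_files)
        for i in range(len(flat))
    ]


# Two made-up flat-field estimates of two pixels each.
print(flat_error([[1.0, 2.0], [3.0, 2.0]], [2.0, 2.0]))
```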
NLINCORR generates the non-linearity reference file NLINFILE for the NICMOS calibration pipeline software calnica.
The observed response of the NICMOS detectors can conveniently be represented by 3
regimes: 1) at low signal levels the response is linear and no correction is needed; the low
signal level for NICMOS is pixel- and Camera-dependent, and is about 14,500 DN and
below, with a standard deviation of about 400 DN; 2) at intermediate levels the detector
response deviates in a linear fashion from the incident flux and is easily correctable via the formula
$$F_c = (a_1 + a_2 \times F) \times F$$
where a1 and a2 are the correction coefficients, F is the uncorrected flux (in DN) and Fc is
the corrected flux; 3) at high signal levels - as saturation sets in - the response becomes
highly non-linear and is not correctable to a scientifically useful degree; the saturation
level is about 30,500 DN, with a standard deviation of about 2,000 DN. The non-linearity
correction is thus derived in the flux range f1~14,500 DN -- f2~30,500 DN, where f1 and f2
are called the ‘‘nodes’’. The tool NLINCORR has the task of deriving the exact value of
the nodes and determining the coefficients a1 and a2 in the formula above, independently
for each pixel.
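The three regimes and the correction formula can be summarized in a small Python sketch for a single pixel. It shows how the coefficients and nodes would be applied, not how NLINCORR derives them; the function name, the status labels, the treatment of the node values themselves, and the numerical coefficients are all assumptions chosen for illustration.

```python
def correct_nonlinearity(flux, a1, a2, f1, f2):
    """Apply Fc = (a1 + a2 * F) * F to one pixel value (in DN).

    f1, f2 : lower and upper nodes of the correctable range for this pixel.
    Returns (flux_out, status); status labels are hypothetical.
    """
    if flux <= f1:
        # Low signal: the response is linear, no correction needed.
        return flux, "linear"
    if flux < f2:
        # Intermediate signal: correctable regime.
        return (a1 + a2 * flux) * flux, "corrected"
    # High signal: saturation, not correctable.
    return flux, "saturated"


# Made-up coefficients, chosen so that a1 + a2 * f1 = 1 at the lower node.
a1, a2 = 0.9855, 1.0e-6
f1, f2 = 14500.0, 30500.0
for flux in (10000.0, 20000.0, 31000.0):
    print(flux, correct_nonlinearity(flux, a1, a2, f1, f2))
```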
The input files are exposures of the same source spanning a range of integration times
(from low to high intensity levels, up to saturation). The input images must be bias and
dark subtracted. One output FITS file is produced, containing an array for each coefficient
or node determined and associated error and data quality arrays, including the covariance
between the two coefficients a1 and a2. The basic algorithm is a chi-square fitting between
the model and the data; initial guesses of f1 and f2 must be given for the routine to work.
MSREADNOISE generates readout noise reference files for the NICMOS calibration pipeline software calnica. In addition to NICMOS data, the task can process the STIS data format.
The basic algorithm works by determining, on a pixel-by-pixel basis, the standard deviation σ around the mean of a set of images, each image being the difference between two
short-exposure dark images. As the difference is made between darks with the same exposure time, the expected value of the mean is zero. The determination of σ includes an
optional sigma-clip cleaning step, as well as an optional histogram clipping procedure, to
eliminate cosmic rays and other outliers. If the dark exposure times are short enough that
Poisson noise of the dark current is much less than the read noise, the standard deviation
of the differences is:
$$\sigma = \sqrt{2 \times (RN)^2}$$
where RN is the readnoise. The use of differences, rather than the dark exposures directly,
allows the processing of pairs of dark images with different exposure times from one pair
to the next and helps to control the effect of bad pixels and cosmic rays.
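Inverting the relation above gives RN = σ/√2 for each pixel. The Python sketch below applies this to a stack of difference images. It omits the optional sigma-clipping and histogram-clipping steps, and both the function name and the use of the population standard deviation are assumptions of the example.

```python
import math
import statistics


def readnoise_per_pixel(diff_images):
    """Estimate the readnoise of each pixel from a set of difference images.

    diff_images : list of flat lists; each image is the difference of two
                  darks with the same exposure time.
    The scatter of the differences is sigma = sqrt(2) * RN, so RN = sigma / sqrt(2).
    No outlier clipping is done in this sketch.
    """
    n_pix = len(diff_images[0])
    readnoise = []
    for i in range(n_pix):
        samples = [image[i] for image in diff_images]
        sigma = statistics.pstdev(samples)  # scatter around the sample mean
        readnoise.append(sigma / math.sqrt(2.0))
    return readnoise


# Four made-up difference images of two pixels each.
diffs = [[2.0, 0.0], [-2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
print(readnoise_per_pixel(diffs))
```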
The input images are fed to the task in the form of two matching lists (by exposure time)
of NICMOS or STIS files. In the case of NICMOS, both ACCUM and MULTIACCUM
files can be input in the lists. The NICMOS output is a single IMSET file with the SCI
image populated with the values of the readnoise generated by the algorithm.
MSBADPIX generates badpixel reference files for both NICMOS and STIS data formats. In
the case of NICMOS, the reference files are used as inputs in the calibration pipeline.
The program is built to identify as bad pixels those pixels which are consistently above or
below the mean in a large sample of input files. To this end, the task computes for each
pixel a local mean and standard deviation (σ) after removing outliers (bad pixels, cosmic
rays, etc.) within a user-settable square window around the pixel. If in a particular IMSET,
the pixel under consideration deviates more than n-σ from the local mean, the pixel will be
flagged as bad in that particular IMSET. If more than x% (where x is a value set by the
user) of the IMSETs have that pixel flagged as bad, then the pixel will be carried on to the
output badpixel reference file. In addition to a sigma-clipping algorithm to remove outliers
from the local mean and σ, a histogram clean-up algorithm precedes any analysis to
remove bad columns. The clean-up algorithm consists of sorting the pixel values inside the
windows and clipping off a given fraction of them at both high and low extremes.
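The two-stage logic (flag per IMSET, then keep only pixels flagged in a large enough fraction of IMSETs) can be sketched in Python as follows. This is a simplified stand-in, not the task's algorithm: the sigma-clipping and histogram clean-up steps are replaced by simply excluding the central pixel from its own window, and all function and parameter names are hypothetical. Images must be larger than one pixel.

```python
import statistics


def flag_imset(image, half_width, n_sigma):
    """Flag pixels deviating more than n_sigma from their local mean.

    image : 2-D list of floats. The window is the square of side
    2 * half_width + 1 around the pixel, excluding the pixel itself
    (a simplification used here in place of outlier clipping).
    """
    ny, nx = len(image), len(image[0])
    flags = [[False] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            window = [
                image[j][i]
                for j in range(max(0, y - half_width), min(ny, y + half_width + 1))
                for i in range(max(0, x - half_width), min(nx, x + half_width + 1))
                if (j, i) != (y, x)
            ]
            mean = statistics.fmean(window)
            sigma = statistics.pstdev(window)
            flags[y][x] = abs(image[y][x] - mean) > n_sigma * sigma
    return flags


def bad_pixel_mask(images, half_width=1, n_sigma=5.0, min_fraction=0.5):
    """Keep only pixels flagged in more than min_fraction of the IMSETs."""
    per_imset = [flag_imset(img, half_width, n_sigma) for img in images]
    ny, nx = len(images[0]), len(images[0][0])
    return [
        [sum(f[y][x] for f in per_imset) / len(images) > min_fraction
         for x in range(nx)]
        for y in range(ny)
    ]


# Made-up 3x3 images: a clean one and one with a deviant central pixel.
clean = [[1.0, 1.1, 0.9], [1.1, 1.0, 0.9], [1.0, 0.9, 1.1]]
hot = [[1.0, 1.1, 0.9], [1.1, 50.0, 0.9], [1.0, 0.9, 1.1]]
print(bad_pixel_mask([hot, hot, hot]))
print(bad_pixel_mask([hot, clean, clean]))
```

In the first call the central pixel is deviant in every image and reaches the output mask; in the second it is deviant in only one of three images and is treated as a transient.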
The input files should be a large number of homogeneous exposures (e.g., flat-field images), which must have any low-frequency trend removed so that the input science
images can be described by a constant plus noise only.
NDARK is a simple CL script that assembles the reference files used by the NICMOS calibration pipeline calnica to perform the dark current subtraction. Calnica uses one
reference file per detector (DARKFILE), which is made up of a sequence of IMSETs, each
one containing a dark image of a given exposure time.
Given a sequence of NICMOS dark images as a list of input files, NDARK builds the dark
reference file by extracting the first IMSET from each input file and packing the individual
IMSETs together in a single output file, while ordering them by increasing exposure time.
Additional header keywords needed by calnica are also added to the output file.
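The assembly step amounts to a selection followed by a sort. The Python sketch below models each input file as a dictionary with a list of IMSETs; this data layout, the key names, and the function name are assumptions made for the example, and the header keywords added for calnica are not modeled.

```python
def build_darkfile(input_files):
    """Assemble a dark reference file from a list of dark exposures.

    input_files : list of dicts, each with an 'imsets' list; every IMSET is
                  a dict carrying at least an 'exptime' value (hypothetical layout).
    Returns the first IMSET of each file, ordered by increasing exposure time.
    """
    first_imsets = [f["imsets"][0] for f in input_files]
    return sorted(first_imsets, key=lambda imset: imset["exptime"])


# Made-up input list with exposure times in arbitrary order.
darks = [
    {"name": "dark_c", "imsets": [{"exptime": 256.0}, {"exptime": 128.0}]},
    {"name": "dark_a", "imsets": [{"exptime": 0.2}]},
    {"name": "dark_b", "imsets": [{"exptime": 16.0}]},
]
print([imset["exptime"] for imset in build_darkfile(darks)])
```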