Instrument Science Report NICMOS97-030

Analysis Tools for the New Instruments: NICMOS and STIS images. I.

Daniela Calzetti, Ivo Busko, Howard Bushouse, Michele De La Peña
December 5, 1997

ABSTRACT

This ISR presents the first set of software tools developed in IRAF/STSDAS and designed to handle the data format and organization of NICMOS and STIS image files. Whenever applicable, the tools propagate the error and data quality information of the input files to the output files. The tools can be divided into two major categories: (1) general-purpose utilities (MSARITH, MSSTATISTICS, MSCOMBINE, NDISPLAY, MARKDQ, MSSPLIT, MSJOIN, PSTACK), which target the user community; and (2) calibration-oriented utilities (MSSTREAKFLAT, NLINCORR, MSREADNOISE, MSBADPIX, NDARK), which generate the reference files used in the NICMOS calibration pipeline. The individual tools and their functionality are briefly described in this ISR.

1. Introduction: the Necessity for New Software Tools

An important change introduced by NICMOS and STIS relative to the existing instruments is the data format. The data produced by the two new instruments come as FITS files. Each NICMOS science observation comes with five extensions: the science array, the error array, the data quality array, the number-of-samples array, and the integration-time array (SCI, ERR, DQ, SAMP, TIME). STIS files carry the first three extensions only. In addition, NICMOS has a variety of readout capabilities, which is reflected in the structure of the data files. The ACCUM, BRIGHTOBJ, and RAMP readout modes produce a single science image; thus the FITS file contains only one quintuplet of arrays. The MULTIACCUM readout mode produces (N+1) quintuplets, where N is the number of nondestructive readouts specified in the exposure and the additional quintuplet corresponds to the 0th readout. A MULTIACCUM science file thus contains (N+1)x5 arrays. Conventionally, each NICMOS quintuplet (or STIS triplet) is called an image set or IMSET.
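As a minimal sketch of the file layout described above, the following Python snippet enumerates the arrays expected in a NICMOS or STIS file. The helper name `imset_layout` and the numbering of IMSETs from 1 are assumptions made for illustration only; the snippet does not read any real file.

```python
# Extension names per image set (IMSET), as listed in the text.
NICMOS_EXTENSIONS = ("SCI", "ERR", "DQ", "SAMP", "TIME")
STIS_EXTENSIONS = ("SCI", "ERR", "DQ")


def imset_layout(n_imsets, extensions=NICMOS_EXTENSIONS):
    """Return the (extension name, IMSET number) pairs for n_imsets IMSETs.

    Hypothetical helper: it only enumerates the expected arrays and
    assumes IMSETs are numbered starting from 1.
    """
    return [(name, imset)
            for imset in range(1, n_imsets + 1)
            for name in extensions]


# A MULTIACCUM exposure with N = 10 readouts holds N + 1 = 11 IMSETs
# (the extra one is the 0th readout), i.e. 11 x 5 = 55 arrays.
n_readouts = 10
layout = imset_layout(n_readouts + 1)
print(len(layout))  # 55
```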
Dedicated software is needed for handling IMSETs (and multi-IMSETs in the case of MULTIACCUM science files) and for providing error and data quality propagation, thus fully exploiting the information contained in the data. This ISR presents the first set of software tools which have been developed within IRAF/STSDAS for the purpose of filling this gap.

2. The New Software Tools

The software tools for operating on the NICMOS and STIS FITS files are available within the IRAF/STSDAS environment for compatibility with pre-existing analysis software. They can be found in the STSDAS packages toolbox.imgtools.mstools and hst_calib.nicmos (for NICMOS-specific tools). The new tools are available as of September 12, 1997, as part of the new TABLES/STSDAS release. The tools have either been written in ANSI-C or are CL scripts interfacing with pre-existing IRAF/STSDAS tasks. Whenever possible, the new tools have been designed to accept a variety of data formats (OIF, GEIS, and the FITS files of STIS and NICMOS images), and eventually will replace obsolete STSDAS tasks (e.g., MSSTATISTICS in place of GSTATISTICS). The new tasks can be grouped into two major categories:

• General-purpose utilities. They include tools for mathematical and statistical operations on science images, and for the analysis and display of reduced and raw data. In most cases, the new utilities extend existing routines to include error and data quality propagation. These are the utilities of largest interest to the user community. Under this category are found some of the tasks described in this ISR: MSARITH, MSCOMBINE, MSSTATISTICS, MSJOIN and MSSPLIT, NDISPLAY and MARKDQ, and PSTACK. The first 5 are found in the package toolbox.imgtools.mstools; the remaining ones are in the package hst_calib.nicmos.

• Calibration-oriented utilities.
These generate reference files (e.g., readnoise arrays, dark files, flatfields, non-linearity correction arrays, badpixel arrays) to feed the calibration database and support the calibration pipelines. The tasks are specifically designed for the calibration of NICMOS, and will not be of general utility. The tools are MSSTREAKFLAT, MSBADPIX, NDARK, NLINCORR, and MSREADNOISE, and are all located in the calibration package hst_calib.nicmos.

All the tools have been tested using NICMOS science images. A brief introductory description of each task, highlighting its potential use, is given below; for specific details on the tools' capabilities, the user should refer to the help file of each individual task.

3. General Purpose Utilities

MSARITH

This tool complements the IRAF task IMARITH, in that it specifically handles the NICMOS and STIS image formats and includes error and data quality propagation. MSARITH supports the 4 basic arithmetic operations (+, -, *, /) and can operate on individual IMSETs or multi-IMSETs. The input operands can be either files or numerical constants; the latter can appear with an associated error, which will be propagated into the error array(s) of the output file. The NICMOS SCI, ERR, DQ, TIME, and SAMP arrays are combined following the scheme in Table 1 below:

Table 1: MSARITH operations

Oper.  Operand2  SCI      ERR                                          DQ   TIME    SAMP
ADD    file      op1+op2  sqrt(σ1^2 + σ2^2)                            OR   T1+T2   S1+S2
SUB    file      op1-op2  sqrt(σ1^2 + σ2^2)                            OR   T1      S1
MULT   file      op1*op2  (op1*op2)*sqrt[(σ1/op1)^2 + (σ2/op2)^2]      OR   T1      S1
DIV    file      op1/op2  (op1/op2)*sqrt[(σ1/op1)^2 + (σ2/op2)^2]      OR   T1      S1
ADD    constant  op1+op2  sqrt(σ1^2 + σ2^2)                            ...  ...     ...
SUB    constant  op1-op2  sqrt(σ1^2 + σ2^2)                            ...  ...     ...
MULT   constant  op1*op2  (op1*op2)*sqrt[(σ1/op1)^2 + (σ2/op2)^2]      ...  T1*op2  ...
DIV    constant  op1/op2  (op1/op2)*sqrt[(σ1/op1)^2 + (σ2/op2)^2]      ...  T1*op2  ...
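The SCI, ERR, and DQ rules of Table 1 can be sketched for a single pixel as follows. This is an illustration only, not the MSARITH implementation: the function names `propagate` and `combine_dq` are hypothetical, the operands of a multiplication or division are assumed to be non-zero, and the absolute value taken on the output error is a choice made here to keep it non-negative.

```python
import math


def propagate(op, v1, s1, v2, s2=0.0):
    """Return (value, error) for one pixel following the Table 1 rules.

    v1, s1: first operand and its error; v2, s2: second operand (a file
    pixel or a constant) and its error. Hypothetical helper.
    """
    if op == "add":
        return v1 + v2, math.hypot(s1, s2)
    if op == "sub":
        return v1 - v2, math.hypot(s1, s2)
    if op in ("mult", "div"):
        value = v1 * v2 if op == "mult" else v1 / v2
        # Relative errors add in quadrature; assumes v1 != 0 and v2 != 0.
        relative = math.hypot(s1 / v1, s2 / v2)
        return value, abs(value) * relative
    raise ValueError("unknown operation: " + op)


def combine_dq(dq1, dq2):
    """DQ flags of two file operands are combined with a bitwise OR."""
    return dq1 | dq2


print(propagate("add", 10.0, 3.0, 20.0, 4.0))  # (30.0, 5.0)
print(combine_dq(4, 16))                       # 20
```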
In Table 1 we have assumed that the first operand (op1) is a file, and the second operand (op2) can be either a constant or a file; the ERR arrays of the input files (σ1 and σ2) are added in quadrature; if the constant is given with an error (σ2), the latter is added in quadrature to the input ERR array. Finally, in Table 1 the pixels in the SCI images are in counts. MSARITH can also operate on count rates and supports both NICMOS and STIS data formats.

MSSTATISTICS

This tool is an extension of GSTATISTICS in the STSDAS package, which is in turn an extension of IMSTATISTICS. The main novelty relative to GSTATISTICS is the inclusion of the error and data quality information in computing statistical quantities. In addition to the standard statistical quantities (min, max, sum, mean, standard deviation, median, mode, skewness, kurtosis), two quantities have been added to take advantage of the error information: the weighted mean and the weighted variance of the pixel distribution. If x_i is the value at the i-th pixel, with associated error σ_i, the weighted mean and variance used in the task are:

$$\langle x \rangle_w = \frac{\sum_i x_i / (\sigma_i \times \sigma_i)}{\sum_i 1 / (\sigma_i \times \sigma_i)}$$

and:

$$\langle \sigma \rangle_w^2 = \frac{1}{\sum_i 1 / (\sigma_i \times \sigma_i)}$$

The data quality information carried by the NICMOS and STIS files is used to reject pixels in the statistical computation. Additional ‘masks’ can be input by the user to reject objects/regions from the science arrays. MSSTATISTICS supports OIF, GEIS, and the FITS format of NICMOS and STIS data, and has independent pset parameters for each of these formats.

MSCOMBINE

This is a CL script which allows one to run the STSDAS task GCOMBINE on NICMOS data files (image combination of STIS data is performed by the task OCRREJECT in the hst_calib.stis package of STSDAS). The basic idea is to expand each NICMOS multiextension image into its basic components (SCI, ERR, DQ, SAMP, TIME) to make them ‘digestible’ for GCOMBINE.
The SCI extensions become the inputs proper to the underlying GCOMBINE task; the ERR extensions become the error maps. The DQ extensions are first combined with a user-specified Boolean mask (which allows selective pixel masking), and then fed into the data quality maps. If scaling by exposure time is requested, the exposure times of each IMSET are read from the header keyword PIXVALUE in the TIME extensions. Once GCOMBINE finishes, the output is re-assembled back into a NICMOS datafile: the output images and error maps from GCOMBINE will form the SCI and ERR extensions of the output IMSET. The DQ extension will be a combination of the masking operations and the rejection algorithms executed by GCOMBINE. The TIME extension will be the sum of the TIME values from the input files minus the rejected values, divided on a pixel-by-pixel basis by the number of valid pixels in the output image. The final TIME array will be consistent with the output SCI image (average or median of the science data). The SAMP extension is built from all the input SAMP values, minus the ones discarded via masking/rejection.

NDISPLAY and MARKDQ

MARKDQ reads the DQ array from a NICMOS image and marks the DQ flags on the displayed image. Each flag value can be set independently to a different color or be turned off. NDISPLAY combines the capabilities of the IRAF task DISPLAY and the task MARKDQ: it displays a NICMOS image and overlays the DQ flags according to the user-specified color-code. Both tasks are useful for locating specific DQ values, e.g. the cosmic rays rejected by calnica in a MULTIACCUM image.

MSSPLIT and MSJOIN

MSSPLIT extracts user-specified IMSETs from a NICMOS MULTIACCUM or STIS data file and copies them into separate files. Each output file will contain a single IMSET and will be given the primary header of the original file.
This task may be useful in those cases where the user wants to reduce the size of a NICMOS MULTIACCUM or STIS file or wants to perform analysis on a specific IMSET only. MSJOIN performs the opposite operation of MSSPLIT: it assembles separate IMSETs into a single data file.

PSTACK

PSTACK plots all the samples of the specified pixels from a NICMOS MULTIACCUM image as a function of time. This task is useful to track the time behavior of an image on a pixel-by-pixel basis. For instance, the time of cosmic ray hits, or the onset of saturation, can be identified in the course of an exposure for a defined set of pixels.

4. Calibration-Oriented Tools

MSSTREAKFLAT

The task is an extended version of the STREAKFLAT tool used with WFPC2 images to extract flat-field images from Earth flats. Exposures of the bright Earth, routinely employed to obtain flat-fields for WFPC2, will be used for the same purpose for NICMOS. Earth observations simulate the illumination pattern of astronomical observations on the detectors better than the internal lamps. However, because of the relative motion between the telescope and the Earth, the images obtained pointing at the Earth show a pattern of ‘streaks’ due to clouds and land/sea passages. The WF/PC Instrument Definition team developed an algorithm to remove the streaks from the images without altering the medium and large scale structure of the flat-fields. The algorithm is an iterative procedure which takes advantage of the fact that the flat-field features due to the detector’s response will be the same from one frame to the next, while the streak patterns will have random angles. The final flat-field is built through successive approximations which lead to the determination and removal of the streak pattern in each of the input images.
The MSSTREAKFLAT task, which can be used for both WFPC2 and NICMOS data formats, improves on the STREAKFLAT task in that it allows more flexible handling of the Data Quality flags (which can be user-specified through a pset) and places no limit on the maximum number of input files. For NICMOS, the input error arrays are ignored. The output error array is built from the streak-removed input images (‘flat-field estimates’) as a pixelwise standard deviation relative to the output flat-field. Thus the error at each pixel is the square root of the mean squared residual, i.e. of the sum of the squared residuals (individual flat-field estimates minus the output flat-field) divided by the number of input files. Output sample and time arrays are the sum of the sample and time values from the input arrays.

NLINCORR

This task generates the non-linearity reference file NLINFILE for the NICMOS calibration pipeline software calnica. The observed response of the NICMOS detectors can conveniently be represented by 3 regimes:

1) at low signal levels the response is linear and no correction is needed; the low signal level for NICMOS is pixel- and Camera-dependent, and is about 14,500 DN and below, with a standard deviation of about 400 DN;

2) at intermediate levels the detector response deviates in a linear fashion from the incident flux and is easily correctable via the expression:

$$F_c = (a_1 + a_2 \times F) \times F$$

where a1 and a2 are the correction coefficients, F is the uncorrected flux (in DN) and Fc is the corrected flux;

3) at high signal levels, as saturation sets in, the response becomes highly non-linear and is not correctable to a scientifically useful degree; the saturation level is about 30,500 DN, with a standard deviation of about 2,000 DN.

The non-linearity correction is thus derived in the flux range from f1 ≈ 14,500 DN to f2 ≈ 30,500 DN, where f1 and f2 are called the ‘‘nodes’’.
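The three regimes above can be sketched for a single pixel as follows. This is an illustration under stated assumptions, not the calnica or NLINCORR code: the function name `correct_nonlinearity` is hypothetical, the coefficient values in the usage lines are invented for the example, and the node values are the approximate ones quoted in the text.

```python
def correct_nonlinearity(flux, a1, a2, f1, f2):
    """Apply F_c = (a1 + a2 * F) * F to one pixel value, in DN.

    Returns (corrected_flux, saturated). At or below the lower node f1
    the response is taken as linear and the value is returned unchanged;
    above the upper node f2 the pixel is treated as saturated and is
    left uncorrected.
    """
    if flux <= f1:
        return flux, False
    if flux > f2:
        return flux, True
    return (a1 + a2 * flux) * flux, False


# Illustrative (made-up) coefficients; nodes as quoted in the text.
A1, A2 = 1.0, 1.0e-6
F1, F2 = 14500.0, 30500.0

for raw in (10000.0, 20000.0, 32000.0):
    print(raw, correct_nonlinearity(raw, A1, A2, F1, F2))
```

In this sketch a 20,000 DN pixel is scaled by (1 + 0.02), giving about 20,400 DN, while the 10,000 DN and 32,000 DN pixels are returned unchanged, the latter flagged as saturated.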
The tool NLINCORR has the task of deriving the exact value of the nodes and determining the coefficients a1 and a2 in the formula above, independently for each pixel. The input files are exposures of the same source spanning a range of integration times (from low to high intensity levels, up to saturation). The input images must be bias and dark subtracted. One output FITS file is produced, containing an array for each coefficient or node determined and associated error and data quality arrays, including the covariance between the two coefficients a1 and a2. The basic algorithm is a chi-square fitting between the model and the data; initial guesses of f1 and f2 must be given for the routine to work.

MSREADNOISE

This task generates readout noise reference files for the NICMOS calibration pipeline software calnica. In addition to NICMOS, the task is able to process the STIS data format. The basic algorithm works by determining, on a pixel-by-pixel basis, the standard deviation σ around the mean of a set of images, each image being the difference between two short-exposure dark images. As the difference is made between darks with the same exposure time, the expected value of the mean is zero. The determination of σ includes an optional sigma-clip cleaning step, as well as an optional histogram clipping procedure, to eliminate cosmic rays and other outliers. If the dark exposure times are short enough that the Poisson noise of the dark current is much less than the read noise, the standard deviation of the differences is:

$$\sigma = \sqrt{2 \times (RN)^2}$$

where RN is the readnoise. The use of differences, rather than the dark exposures directly, allows the processing of pairs of dark images with different exposure times from one pair to the next and helps to control the effect of bad pixels and cosmic rays. The input images are fed to the task in the form of two matching lists (by exposure time) of NICMOS or STIS files.
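The relation between the scatter of the dark differences and the readnoise can be sketched for one pixel as below. This is a simplified illustration, not the MSREADNOISE algorithm: the helper name `readnoise_from_pairs` is hypothetical, the sigma-clip and histogram clipping steps are omitted, and the use of the sample standard deviation (N - 1 in the denominator) is an assumption.

```python
import math
import statistics


def readnoise_from_pairs(darks_a, darks_b):
    """Estimate the readnoise of one pixel from matched pairs of darks.

    darks_a[k] and darks_b[k] are the values of the same pixel in two
    short darks with the same exposure time. Hypothetical helper: no
    outlier rejection is applied.
    """
    differences = [a - b for a, b in zip(darks_a, darks_b)]
    if len(differences) < 2:
        raise ValueError("need at least two pairs of darks")
    # Scatter of the differences around their mean (sample estimate).
    sigma = statistics.stdev(differences)
    # Each difference carries two independent reads: sigma^2 = 2 * RN^2.
    return sigma / math.sqrt(2.0)


# Made-up pixel values from four pairs of darks, in DN.
rn = readnoise_from_pairs([10.0, 12.0, 9.0, 11.0], [11.0, 10.0, 10.0, 10.0])
print(round(rn, 4))
```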
In the case of NICMOS, both ACCUM and MULTIACCUM files can be input in the lists. The NICMOS output is a single IMSET file with the SCI image populated with the values of the readnoise generated by the algorithm.

MSBADPIX

The task generates badpixel reference files for both NICMOS and STIS data formats. In the case of NICMOS, the reference files are used as inputs in the calibration pipeline (calnica). The program is built to identify as bad pixels those pixels which are consistently above or below the mean in a large sample of input files. To this end, the task computes for each pixel a local mean and standard deviation (σ) after removing outliers (bad pixels, cosmic rays, etc.) within a user-settable square window around the pixel. If, in a particular IMSET, the pixel under consideration deviates by more than n-σ from the local mean, the pixel will be flagged as bad in that particular IMSET. If more than x% (where x is a value set by the user) of the IMSETs have that pixel flagged as bad, then the pixel will be carried on to the output badpixel reference file. In addition to a sigma-clipping algorithm to remove outliers from the local mean and σ, a histogram clean-up algorithm precedes any analysis to remove bad columns. The clean-up algorithm consists of sorting the pixel values inside the windows and clipping off a given fraction of them at both high and low extremes. The input files should be a large number of homogeneous exposures (e.g., flatfield images), which must have any low-frequency trend removed so that the input science images can be described by a constant plus noise only.

NDARK

NDARK is a simple CL script that assembles the reference files used by the NICMOS calibration pipeline calnica to perform the dark current subtraction. Calnica uses one reference file per detector (DARKFILE), which is made up of a sequence of IMSETs, each one containing a dark image of a given exposure time.
Given a sequence of NICMOS dark images as a list of input files, NDARK builds the dark reference file by extracting the first IMSET from each input file and packing the individual IMSETs together in a single output file, while ordering them by increasing exposure time. Additional header keywords needed by calnica are also added to the output file.
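The selection and ordering step performed by NDARK can be sketched as follows. This is not the CL script itself: the function name `build_dark_sequence` and the dictionary keys "exptime" and "imsets" are hypothetical stand-ins for the input files, and the header keywords added for calnica are not modeled.

```python
def build_dark_sequence(darks):
    """Keep the first IMSET of each input dark, ordered by exposure time.

    `darks` is a list of dicts with the hypothetical keys "exptime"
    (exposure time) and "imsets" (the IMSETs of that input file).
    Returns a list of (exptime, first_imset) pairs.
    """
    first_imsets = [(dark["exptime"], dark["imsets"][0]) for dark in darks]
    # Sort on exposure time only, so the IMSET objects are never compared.
    return sorted(first_imsets, key=lambda pair: pair[0])


# Made-up inputs: strings stand in for the IMSET data.
inputs = [
    {"exptime": 64.0, "imsets": ["dark64_imset1", "dark64_imset2"]},
    {"exptime": 1.0, "imsets": ["dark1_imset1"]},
    {"exptime": 16.0, "imsets": ["dark16_imset1"]},
]
for exptime, imset in build_dark_sequence(inputs):
    print(exptime, imset)
```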