Instrument Science Report ACS-97-02
Data Compression for ACS
M. Stiavelli and R.L. White
November 1997 - DRAFT Version 1.0
ABSTRACT
The algorithm for on-the-fly, on-board compression of ACS data is briefly reviewed and its benefits are discussed. On the basis of this discussion we recommend a compression strategy and outline a plan to establish the optimal compression factor. Once the planned tests are completed we will recommend an implementation strategy for compression during SMOV and the following cycles.
1. Introduction
A single ACS WFC image is 6.6 times larger than a full WFPC2 frame, due to the larger number of pixels (4096 square rather than 1600 square). Note that WFPC2 only uses 12 bits per pixel rather than the full 16 bits of a two-byte word, so that in terms of actual information the ratio rises to 8.7. The 32 MByte buffer memory of ACS (34 MBytes including the "epsilon" memory) can accommodate just a single uncompressed WFC frame (or, alternatively, 16 uncompressed HRC or SBC images). The sheer size of these frames makes it interesting to explore the costs and benefits of on-board compression, as suggested by the ACS IDT. Note that there is currently no plan to compress the much smaller HRC and SBC images.
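As a quick cross-check of these numbers (the pixel counts and bit depths are those quoted above; the Python arithmetic itself is only an illustration):

# Illustrative arithmetic only, using the pixel counts and bit depths quoted above.
wfc_pixels = 4096 * 4096        # one ACS WFC frame
wfpc2_pixels = 1600 * 1600      # one WFPC2 frame

size_ratio = wfc_pixels / wfpc2_pixels                 # ~6.6, both stored as 16-bit words
info_ratio = (wfc_pixels * 16) / (wfpc2_pixels * 12)   # ~8.7, WFPC2 uses only 12 bits/pixel

wfc_mbytes = wfc_pixels * 2 / 2**20                    # 32 MBytes: one frame fills the buffer
print(round(size_ratio, 1), round(info_ratio, 1), wfc_mbytes)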
Although one might debate the relative advantages and disadvantages of lossy compression algorithms, the path followed by the ACS IDT was that of lossless compression, i.e., a compression scheme that is entirely reversible so as not to lose any data. Such lossless compression is, for example, regularly carried out in the HST Archive in a way which is completely transparent to the users. From information theory we know that it is impossible to compress a generic image without loss of information. This is because the number of bits measures, by definition, the information contained in the image. All lossless compression algorithms are therefore simply ways of reducing the size of some particular images at the expense of increasing the size of others, so that a suitably averaged size is unchanged. A good compression algorithm is very good at compressing the most commonly encountered data sets but is unable to compress rarely encountered data sets. Thus, the effectiveness of a particular compression algorithm will depend on the properties of the particular class of data sets to which it is applied; e.g., UNIX compress or gzip are very effective in compressing ASCII files but are much less effective on binary files. In Section 2, we will briefly summarize the basic ideas behind the compression algorithm envisaged for ACS.
In order to be useful on HST, a compression algorithm must have a minimum guaranteed compression factor, so that software and memory allocations can be made under the
assumption that each frame will never exceed some given size. For the reason discussed
above, this cannot be guaranteed for a generic data set. However, astronomical images are
not generic data sets but are characterized by some common properties, e.g., nearby pixels
are correlated. In Section 3, we will discuss how the proposed compression algorithm can
provide useful minimum compression factors. Our plan to obtain a robust estimate of the minimum safe compression factor is given in Section 4. Our recommendations are summarized in Section 5. We expect to revise these recommendations after TV.
2. The Rice and the White pair algorithms
Since, as we have seen, any lossless compression algorithm must expand some data sets in order to compress others, it is reasonable to adopt a method which is best suited to the typical astronomical image. A common characteristic of astronomical (and many non-astronomical) images is the presence of correlations between neighbouring pixels. Such correlations are due to the intrinsic nature of the sources and to the fact that point spread functions tend to be wider than one pixel, particularly for ACS, which is better sampled than, e.g., WFPC2. Another common property of astronomical images is their relatively low filling factor, i.e., many pixels contain only sky and detector noise. Clearly the latter does not hold when imaging bright nearby galaxies, but it applies to distant galaxies and to stellar clusters in our own Galaxy.
The Rice algorithm, used in some space applications outside the Hubble project, takes advantage of these properties of astronomical images by subdividing an image into small blocks (typically 16 pixels each) and providing for each block or each row a starting value (in general the value of the first pixel in the block) and the differences between the value in each pixel and the starting value. These differences have their least significant bits dominated by noise, and thus highly variable, while the remaining bits vary very little. The Rice algorithm does not compress the highly variable bits and compresses only the most significant, slowly varying ones. The reason the algorithm works is that the differences in value between two neighbouring pixels are mostly clustered around zero. The probability that a difference takes some value ∆y decreases exponentially, with a typical scale h that varies from image to image. It is this scale h that determines the separation between the uncompressed and the compressed bits. Note that cosmic ray hits are hard to compress in such a scheme, since they are characterized by sharp changes in pixel value between neighbouring pixels.
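As an illustrative sketch of this block-differencing idea (the 16-pixel block size follows the text; everything else, including the use of NumPy, is our own simplification and not the flight implementation):

import numpy as np

def block_differences(row, block=16):
    # Illustrative only: split a row of 16-bit pixels into 16-pixel blocks
    # and take differences from the first pixel of each block.  In typical
    # astronomical data these differences cluster around zero, so their
    # high-order bits carry almost no information and compress well.
    out = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block].astype(np.int32)
        out.append((int(chunk[0]), chunk[1:] - chunk[0]))
    return out

# Toy data: flat sky plus noise, with one "cosmic ray" producing a large,
# hard-to-compress difference in its block.
rng = np.random.default_rng(0)
row = (1000 + rng.normal(0, 5, 64)).astype(np.uint16)
row[20] += 30000
for start_value, diffs in block_differences(row):
    print(start_value, diffs)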
Unfortunately, this algorithm, even though very fast, was not fast enough for the 386
CPU installed in ACS. As a consequence, one of us (R. White) developed a new, faster
algorithm loosely based on the same ideas but now compressing pixel pairs. The inclusion
of flags every 8 pixel pairs allows one to keep track of whether each particular pair has
been compressed. Thus, pixel pairs that are hard to compress can be left uncompressed
and flagged as such. This feature makes the new algorithm somewhat more robust against
the occasional cosmic ray hit. On the other hand the new algorithm cannot compress as
much in the most ideal circumstances.
The theoretical limit for the White pair algorithm approaches a factor of 4. In practice, the implementation considered for ACS, with a block size of 16 pixels, has a theoretical maximum compression rate of 32/9 ≈ 3.56. Benchmarks have shown that this algorithm allows compression at a rate of 1.2 × 10^5 pixels/s.
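The following toy sketch conveys the flavour of a flagged pixel-pair scheme. Everything in it is our own simplified assumption (in particular the 4-bit packing and the accounting of one flag byte plus eight packed bytes per 16-pixel block, which happens to reproduce the 32/9 figure); it is not the ACS flight code.

import numpy as np

def pack_block(block, ref):
    # Toy sketch only, under our own assumptions.  `block` holds 16 pixels
    # (8 pairs); `ref` is a starting value carried over from the previous
    # block or stored once per row.  A pair whose differences from `ref`
    # both fit in 4 bits is packed into one byte; any other pair is left
    # uncompressed (4 raw bytes) and marked in the flag byte.  Best case:
    # 1 flag byte + 8 packed bytes = 9 bytes for a 32-byte block.
    diffs = block.astype(np.int32) - int(ref)
    flags, body = 0, bytearray()
    for i in range(8):
        d1, d2 = int(diffs[2 * i]), int(diffs[2 * i + 1])
        if -8 <= d1 < 8 and -8 <= d2 < 8:          # pair is easy to compress
            body.append(((d1 & 0xF) << 4) | (d2 & 0xF))
        else:                                      # e.g. a cosmic-ray hit
            flags |= 1 << i
            body += int(block[2 * i]).to_bytes(2, "big")
            body += int(block[2 * i + 1]).to_bytes(2, "big")
    return bytes([flags]) + bytes(body)

# Example: a quiet 16-pixel block packs into 9 bytes instead of 32.
quiet = np.full(16, 1200, dtype=np.uint16)
print(len(pack_block(quiet, ref=1201)))            # -> 9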
3. Guaranteed Compression Factors
The White pair algorithm achieves compression factors that depend largely on the
noise level of frames. For internal frames (darks and biases) the noise properties are rather
well known and we should expect high compression factors, certainly exceeding 2 and
probably exceeding 3. In general, we should expect images taken with A-to-D gains of 2
or higher to be easier to compress than those taken at gain=1. In fact, higher gain settings are equivalent to sampling the image noise more coarsely, performing for all practical purposes a lossy compression in hardware. Similarly, we should expect long exposures and exposures with a high mean signal level to be more difficult to compress. The former have more pixels affected by cosmic rays; the latter have a higher (Poisson) noise level and thus larger variations from pixel to pixel.
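A rough way to see why the noise level in DN drives the achievable factor is to estimate the information content of noise-dominated pixels; the Gaussian entropy formula below is standard, but the noise values are illustrative choices of ours, not measured ACS numbers:

import math

def approx_bits_per_pixel(sigma_dn):
    # Approximate information content (bits/pixel) when pixel values are
    # dominated by Gaussian noise of standard deviation sigma_dn (in DN):
    # 0.5 * log2(2*pi*e*sigma^2).  Illustrative estimate only.
    return 0.5 * math.log2(2 * math.pi * math.e * sigma_dn ** 2)

# A higher A-to-D gain (more electrons per DN) lowers the noise expressed
# in DN; halving that noise removes roughly one bit per pixel and so
# improves the achievable lossless compression factor.
for sigma in (2.0, 4.0, 8.0):
    bits = approx_bits_per_pixel(sigma)
    print(f"sigma = {sigma:3.0f} DN -> ~{bits:.1f} bits/pixel, "
          f"factor ~{16 / bits:.1f} relative to 16-bit words")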
It is likely that typical minimum guaranteed compression factors of about 2 or higher
will be achieved. In order to have robust estimates of the kind of compression factors that
can be achieved, it will be necessary to carry out an extensive set of experiments. So far
one of us (R. White) has done a number of tests on real WFPC2 images, in particular:
• a dense star cluster
• a big, field-filling, elliptical galaxy
• a deep HDF exposure
• a long, low S/N, UV exposure
• several short exposures
These tests always showed compression factors exceeding 2. More tests will be carried
out at Ball. However, already at this stage we can see that an optimal strategy would involve the definition of a table specifying a guaranteed compression level as a function of gain setting, filter, and exposure time. Should such an optimal strategy be considered unfeasible, the usefulness of on-the-fly data compression will depend on whether detailed simulations show that our expectation of minimum guaranteed compression factors exceeding 2 can indeed be met.
One concern about the actual implementation has to do with the number of amplifiers used simultaneously for the WFC readout. The planned scheme is to compress the output of one amplifier on the fly and to compress the other three in the buffer. Compressing one fourth of the data on the fly can free up sufficient space to start the compression of the remaining outputs. In the case of two-amplifier operation the CPU load for on-the-fly compression would probably be too high (about 97 per cent), and one would therefore probably be unable to compress the data due to lack of buffer space. Subarrays are always compressed on the fly since they are read with a single amplifier.
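A back-of-the-envelope sketch of why this works (the frame and buffer sizes follow from Section 1; the factor-of-2 compression is purely an assumed example):

# Illustrative buffer arithmetic only; a compression factor of 2 is assumed.
frame_mbytes = 4096 * 4096 * 2 / 2**20   # 32 MBytes: a full WFC frame fills the buffer
quadrant_mbytes = frame_mbytes / 4       # ~8 MBytes per amplifier output
assumed_factor = 2.0

# Compressing one quadrant on the fly stores ~4 MBytes instead of 8, leaving
# headroom in the 32 MByte buffer to compress the remaining quadrants in place.
stored_on_the_fly = quadrant_mbytes / assumed_factor
freed = quadrant_mbytes - stored_on_the_fly
print(frame_mbytes, quadrant_mbytes, stored_on_the_fly, freed)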
It is worth noting that in the current implementation of the compression algorithm the minimum guaranteed compression is also the maximum, since buffer space is allocated according to the minimum guaranteed compression rate and any unused space (in case of better compression) is padded. If the guaranteed compression is not achieved on a single 2048-pixel data segment, some data are lost and compression continues on the following segments. Clearly such losses, if rare, can be acceptable for calibration images like biases or earth flats, but they would be unacceptable for science data, which must therefore use a conservative compression factor. According to current plans the compression factor is an Engineering-only Phase II parameter.
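A hedged sketch of this fixed-allocation behaviour (the 2048-pixel segment size is quoted above; the byte accounting and the factor of 2 are our own illustrative assumptions):

SEGMENT_PIXELS = 2048        # data segment size quoted above
GUARANTEED_FACTOR = 2.0      # example value; set according to the chosen policy

def store_segment(compressed, buffer):
    # Illustrative only: every segment gets the same fixed allocation of
    # 2048 * 2 / GUARANTEED_FACTOR bytes.  A segment that compresses better
    # than the guarantee is padded with zeros; one that compresses worse is
    # truncated, i.e. some data are lost, and compression simply continues
    # with the next segment.
    allocation = int(SEGMENT_PIXELS * 2 / GUARANTEED_FACTOR)
    if len(compressed) <= allocation:
        buffer += compressed + bytes(allocation - len(compressed))   # pad
    else:
        buffer += compressed[:allocation]                            # data loss
    return buffer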
4. A Strategy for Deriving the Compression Factor
The previous discussion makes it clear that deriving a firm minimum compression factor is very important. For this reason we have sketched a test plan aimed at obtaining and
verifying such a quantity. In order to carry out the tests on images that are as realistic as possible, we suggest the following:
• the tests have to be carried out on artificial images simulating real ACS images as well as possible.
• the artificial images should be constructed from WFC dark frames and should include, in addition to the astronomical objects and background, source and background Poisson noise, read-out noise, and dark current (see the sketch after this list).
• given the higher cosmic-ray rate observed on orbit, additional cosmic rays will have to be added to the artificial images.
• the procedures to produce the artificial images and to test them will have to be prepared as suitable scripts, so as to make it easy to repeat them if and when improved dark frames (e.g. from thermal vac and, later, SMOV) become available.
• test images will be produced to simulate the whole range of HST targets (galaxies, globular clusters, planetary nebulae, deep exposures, clusters of galaxies, biases, darks, internal and earth flat fields, etc.).
• tests will probably need to be repeated on real data during SMOV, e.g., by observing with ACS a Galactic globular cluster and a nearby elliptical.
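A minimal sketch of how such a synthetic frame and its compression factor could be produced (our own construction under the assumptions listed above, not an agreed test procedure; any standard lossless compressor, e.g. zlib, can serve as a stand-in for the pair algorithm while exercising the scripts):

import zlib
import numpy as np

rng = np.random.default_rng(1)

def synthetic_wfc_frame(dark, objects, sky, read_noise, exptime, n_cosmics):
    # Illustrative construction only: astronomical signal and sky with
    # Poisson noise, read-out noise and extra cosmic rays added on top of
    # a real WFC dark frame (here `dark` already includes dark current).
    signal = rng.poisson((objects + sky) * exptime, size=dark.shape).astype(np.float64)
    frame = dark + signal + rng.normal(0.0, read_noise, dark.shape)
    ys = rng.integers(0, dark.shape[0], n_cosmics)
    xs = rng.integers(0, dark.shape[1], n_cosmics)
    frame[ys, xs] += rng.uniform(1e3, 5e4, n_cosmics)   # crude cosmic-ray hits
    return np.clip(frame, 0, 65535).astype(np.uint16)

def compression_factor(frame, compress=zlib.compress):
    # Ratio of raw size (2 bytes/pixel) to compressed size for a given
    # compressor; the real tests would use the pair algorithm instead.
    return frame.nbytes / len(compress(frame.tobytes()))

# Toy usage on a small blank field (a real test would start from WFC darks):
dark = np.full((512, 512), 10.0)
frame = synthetic_wfc_frame(dark, objects=0.0, sky=0.05, read_noise=4.0,
                            exptime=1000.0, n_cosmics=200)
print(compression_factor(frame))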
If the minimum compression factor estimated from the synthetic image tests is found
to be significantly in excess of 2, one could baseline the use of compression already for Cycle 9, after it has been verified during SMOV. Should the compression factor be found, unexpectedly, to be close to or lower than 2, we should revise the HST DRM for Cycle 9
(which assumes a compression factor of 2) and perhaps consider delaying the beginning of
routine compression to a later cycle.
5. Recommendations
Assuming that the synthetic image tests confirm that compression factors in excess of
2 can be obtained, our tentative recommendations in order of priority can be summarized
as follows:
1. adopt an exposure-dependent minimum compression rate. A table would need to be implemented in software to identify the compression rate (a toy sketch of such a table follows this list). Three compression settings (low, medium, high) would probably be sufficient. Depending on the structure of the available and planned software, this solution may end up being too expensive.
2. adopt a special, gain-dependent, minimum compression factor for calibration
observations (darks, biases, flats) and a standard, gain-dependent, minimum compression rate for all other observations.
3. adopt a gain-dependent minimum compression factor for all images.
4. adopt a constant minimum compression rate.
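Purely as an illustration of recommendation 1, a toy version of such a table might look as follows; the thresholds, settings, and factors are hypothetical placeholders of ours, not proposed flight values:

# Hypothetical sketch of the lookup table in recommendation 1; all values
# are illustrative placeholders, not proposed flight parameters.
COMPRESSION_SETTINGS = {"low": 1.3, "medium": 2.0, "high": 3.0}

def pick_setting(gain, filter_name, exptime_s):
    # Toy policy (filter dependence omitted): high-gain or short exposures
    # are assumed easy to compress; long, gain=1 exposures get the most
    # conservative guarantee.
    if gain >= 2 and exptime_s < 300:
        return "high"
    if gain >= 2 or exptime_s < 1000:
        return "medium"
    return "low"

factor = COMPRESSION_SETTINGS[pick_setting(gain=1, filter_name="F606W", exptime_s=1200)]
print(factor)   # -> 1.3 for a long gain=1 exposure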
We expect to be able to formulate final recommendations after TV (when the final flight CCD parameters are determined).
We believe that compression is essential for the proper parallel operation of ACS (see, e.g., ISR ACS-97-01) and that it should therefore be abandoned only if serious problems with the algorithm or its implementation are uncovered.
6. Acknowledgements
Many thanks to Chris Blades for useful discussions.
7. References
Advanced Camera for Surveys (ACS) Science Operations Requirements Document - Part
B (Op-01), version 10/27/97, Sections 2.1.4, 3.1, and 3.5.
HST Reference Mission for Cycle 9 and Ground System Requirements, ACS ISR-97-01.
Flight Software CDR