
[JAI_IMAGEIO_CORE-125] Reduce excessive memory use when decoding large tiffs

Created: 23/Apr/07  Updated: 11/Mar/08

Status: Open
Project: jai-imageio-core
Component/s: implementation
Affects Version/s: current
Fix Version/s: milestone 1
Type: Improvement
Priority: Minor
Reporter: gerardkoppelaar
Assignee: jai-imageio-core-issues
Votes: 0
Resolution: Unresolved
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Operating System: All, Platform: All
Attachments: datablocks-read-into-bytearrays.txt, tifflzwfix.zip
Issuezilla Id: 125

Description

When reading large tiffs, we found that the imageio library's memory usage sometimes increased to about double the expected amount, and it therefore threw out-of-memory exceptions more often than it should.

After some investigation, we found this happened with TIFFs that do not use the suggested number of scanlines per block, or tiles of decent size, but instead put all image data in one large block. While the TIFF spec only SUGGESTS a data block size of about 64K bytes for easy handling, this is by no means required, and the 4-byte integer size field means a much larger data block is perfectly valid.

As the tile decoder (incorrectly) assumes the data block is small, it fully reads the data block into a newly created byte[].

For TIFF files that contain only one large data block, this reads the FULL image file data into this byte buffer.

As the buffer is only released AFTER the decoding process, during decoding there must be enough memory for BOTH the complete data byte[] AND the BufferedImage destination raster, sometimes almost doubling the amount of memory necessary to read such a large TIFF. As memory becomes an especially important factor when reading very large images, this is most inconvenient.

Unfortunately, we often get such very large TIFF files (10k x 10k, 15k x 10k, 20k x 20k pixels, 24-bit color or 8-bit grey). These images are produced by our customers' scanners (which, by configuration or design, do not split the data into tiles or small blocks of scanlines).

When, for example, A0-sized documents are scanned using such hardware, or when a very high DPI setting is used on a smaller document, the imageio lib easily runs out of memory when trying to read these files.

An example error stack trace for an LZW-encoded large tiff is given here:

java.lang.OutOfMemoryError: Java heap space
    at com.sun.media.imageioimpl.plugins.tiff.TIFFLZWDecompressor.decodeRaw(TIFFLZWDecompressor.java:118)
    at com.sun.media.imageio.plugins.tiff.TIFFDecompressor.decode(TIFFDecompressor.java:2527)
    at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.decodeTile(TIFFImageReader.java:1137)
    at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.read(TIFFImageReader.java:1417)

Note that decodeTile is called even when there are no tiles (i.e. the image is stored as one large tile).

And the cause of the memory shortage can be found on lines 118 and 119 of (in this case) TIFFLZWDecompressor:

[118] byte[] sdata = new byte[byteCount];

[119] stream.readFully(sdata);

Solution

--------

As the decoding of the LZW-encoded image data is actually a linear byte-reading process, there is absolutely no need to read the data block completely into memory. Linearly reading bytes from the ImageInputStream as needed is sufficient, although in that case performance becomes very dependent on the underlying ImageInputStream. A buffered approach, possibly with a read-size limit (as the total data size is usually known already and can easily be checked) and a small read buffer of, say, at most 1024 bytes, would also reduce the excessive memory usage while keeping performance reasonably high in case of a slow implementation of the underlying ImageInputStream.
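To make this concrete, here is a minimal sketch of such a bounded, buffered reader (the class name BoundedStreamReader and its exact shape are ours, not part of the library; it uses only the standard ImageInputStream API):

import java.io.IOException;
import javax.imageio.stream.ImageInputStream;

// Minimal sketch, not library code: serves at most 'blockSize' bytes from the
// stream through a small fixed-size buffer, so the decoder never holds the
// whole compressed data block in memory at once.
public class BoundedStreamReader {

    private final ImageInputStream stream;
    private final byte[] buffer;
    private long remaining; // bytes of the data block not yet read from the stream
    private int pos;        // next byte to hand out from the buffer
    private int count;      // number of valid bytes currently in the buffer

    public BoundedStreamReader(ImageInputStream stream, long blockSize, int bufferSize) {
        this.stream = stream;
        this.buffer = new byte[bufferSize]; // e.g. 1024 bytes, as suggested above
        this.remaining = blockSize;         // the total data size, known in advance
    }

    // Returns the next byte of the block, or -1 when the block is exhausted.
    public int read() throws IOException {
        if (pos >= count) {
            if (remaining <= 0) {
                return -1;
            }
            int toRead = (int) Math.min(buffer.length, remaining);
            count = stream.read(buffer, 0, toRead);
            if (count <= 0) {
                return -1; // underlying stream ended early
            }
            remaining -= count;
            pos = 0;
        }
        return buffer[pos++] & 0xFF;
    }
}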

General remarks

---------------

While we detected this behaviour for certain LZW-encoded TIFFs, there may be other situations with potentially high memory usage when reading very large images, due to the use of (almost) unlimited temporary decoding buffers.

Scanning all current image decoders for this problem and exploiting the linear-decoding nature of most encoding/decoding algorithms, and/or creating a generic, re-usable buffering solution, would improve imageio's handling of large images considerably.

Also, the current image test set (if there is any) should be extended with some very large images for every type, and a rough memory usage check should be added to the module tests to detect possible "very non-optimal" implementations.
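A very rough version of such a check could look like this (our own sketch; the MemoryCheck name and the 2x threshold are arbitrary, and heap measurements of this kind are only indicative):

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class MemoryCheck {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        BufferedImage img = ImageIO.read(new File(args[0])); // a large test TIFF

        long after = rt.totalMemory() - rt.freeMemory();
        long rasterBytes = (long) img.getWidth() * img.getHeight()
                * img.getColorModel().getPixelSize() / 8;

        System.out.println("used=" + (after - before) + " bytes, raster=" + rasterBytes + " bytes");
        if (after - before > 2 * rasterBytes) { // arbitrary "very non-optimal" threshold
            throw new AssertionError("decoder used more than 2x the raster size");
        }
    }
}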

Test files

----------

Our current test files are scans of our customers' documents and therefore cannot be sent to anyone.

If some test images are needed, however, we should be able to create some ourselves (by writing a large TIFF and providing adjusted metadata to force the TIFF writer to put all data into one large block).

Workaround

----------

We have found no easy software workaround using the current imageio library.

Our options are to add more memory, or simply to skip processing very large images by checking their width and height first. A proof of concept using a 1024-byte buffer between the data-reading and decoding process for LZW-encoded TIFF images seemed to work fine, but the imageio library is too large for us to easily see the full impact of such a change through all layers, or the other places where such a change would be useful.
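As an illustration of the width/height pre-check, the standard ImageIO API is enough; this sketch is ours (the SizeCheck name and MAX_PIXELS threshold are made up), but getWidth/getHeight only parse the image header, so no pixel data is allocated:

import java.io.File;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class SizeCheck {

    // Hypothetical limit; tune to the heap available to the application.
    private static final long MAX_PIXELS = 50L * 1000 * 1000;

    // Returns true if the first image in the file is small enough to decode.
    public static boolean isDecodable(File file) throws Exception {
        ImageInputStream iis = ImageIO.createImageInputStream(file);
        try {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return false; // no reader claims the format
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis, true, true);
                // getWidth/getHeight read only the header; no pixel data is decoded
                long pixels = (long) reader.getWidth(0) * (long) reader.getHeight(0);
                return pixels <= MAX_PIXELS;
            } finally {
                reader.dispose();
            }
        } finally {
            iis.close();
        }
    }
}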

Priority

--------

Although this appears to be an optimization request and therefore of lower priority, the doubled memory usage occurs exactly when memory is needed most: when our software is processing extremely large images.

A fix for this problem would be highly appreciated.

Comments

Comment by robertengels

[ 23/Aug/07 ]

If a TIFF is G4-compressed (or G3-2D), then you need more than one scan line, as the compression works in two dimensions. Still, it could allocate a compression buffer smaller than the tile.

It seems that only minor changes need to be made to TiffFaxDecompressor. Instead of reading the entire tile in decodeRaw, just keep a reference to the stream, and read as needed into a "line buffer".

Comment by gerardkoppelaar

[ 11/Oct/07 ]

The workaround of adding more memory unfortunately does not work (at least not on the Windows 32-bit systems our application runs on).

We upgraded the RAM from 2GB to 4GB, but Java seems incapable of using more than about 1.6GB. According to some internet articles, this has to do with Windows fragmenting the address space by loading DLLs at fixed positions, while the Java runtime requires a contiguous block of heap space.

Comment by jxc

[ 12/Oct/07 ]

Can you try to disable codecLib to see if the TIFF reader still uses more memory than expected?

java -Dcom.sun.media.imageio.disableCodecLib=true YourApp

For the TIFF reader, codecLib is used only for G3/G4 decoding.

In other words, tiling is not managed by the native codecLib.

Comment by gerardkoppelaar

[ 31/Dec/07 ]

This issue is probably filed under the wrong subcomponent; it's about the Java code, NOT the native codecLib, using gigantic byte[] blocks instead of a streaming decode loop that would not cost much memory.

I suggest changing the subcomponent to "implementation", and re-assigning to the proper developer.

Comment by jxc

[ 02/Jan/08 ]

Changed subcomponent to "implementation" as suggested.

In the case of LZW encoded TIFF, codecLib is not used.

Comment by gerardkoppelaar

[ 28/Feb/08 ]

Created an attachment (id=127)

Example fix

Comment by gerardkoppelaar

[ 28/Feb/08 ]

I attached a proposed (partial) fix for the problem (example, feel free to make changes).

This one works for LZW-compressed TIFF files, but note that TIFFFax-, TIFFJpeg-, TIFFDeflate-, TIFFPackbits-, etc. most likely suffer from the same problem; we just happened to run into it with LZW-encoded tiffs.

I also noticed (while browsing the source code) that for chained decompressors (for color conversions etc.) and also in JPEGDecompressor, often a byte array is created and then wrapped into an ImageInputStream, which the next decoder again reads fully into a byte buffer; there should be a way to indicate a preferred input form (i.e. byte array or stream) to avoid these unnecessary and costly (both performance-wise and in memory use) wrappings!

The attached zip file contains two files:

1. The (new) class LimitedBufferedByteReader, a simple example of a buffer around an ImageInputStream, with a limit to prevent it reading too much, unlike the old unbuffered implementation; note that most decompressors know the compressed data block size before starting the decompression!

I put this class in the package com.sun.media.imageioimpl.common as it seemed a good location.

2. The (patched) class TIFFLZWDecompressor, the slightly adapted original, which now reads bytes from a LimitedBufferedByteReader instead of from a byte[].
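The shape of the change is roughly the following (illustrative only; this is not the exact attached patch, the constructor and read() signatures of LimitedBufferedByteReader shown here are assumptions, and the loop body stands in for the real LZW code-assembly logic):

// before (decodeRaw, lines 118-119): the whole compressed block in memory
byte[] sdata = new byte[byteCount];
stream.readFully(sdata);

// after (sketch): pull bytes on demand through a small bounded buffer
LimitedBufferedByteReader reader =
        new LimitedBufferedByteReader(stream, byteCount, 1024);
int b;
while ((b = reader.read()) != -1) {
    // ... assemble LZW codes from b and emit decoded bytes, as before ...
}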

If you need an example LZW TIFF, please contact me by mail; our examples are customers' data and we will not publish them in public.

Comment by gerardkoppelaar

[ 11/Mar/08 ]

Here is a simple application that generates numerous test images to demonstrate the problem. Don't forget to change the output path in the code; if running on a low-memory system, you can reduce the image sizes if necessary, but larger images show better how much memory is eaten up.

package tiffwritertest;

import java.awt.Color;
import java.awt.Dimension;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.Paint;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.spi.IIORegistry;
import javax.imageio.spi.ImageWriterSpi;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Node;

/* This app writes some generated images as TIFF files.
   Make sure you give Java enough heap space (java -Xmx1600m ...),
   otherwise the program will run out of heap space before the error occurs.
   You can also reduce the size of the images, if necessary. */
public class Main {

    public static void main(String[] args) {
        try {
            // note: com.sun.media.imageioimpl.plugins.jpeg.CLibJPEGImageReader crashes the VM on large images
            // note: com.sun.media.imageio.plugins.jpeg.JPEGImageWriter crashes the VM on large images
            // note: the readers/writers above are called for TIFFs containing JPEG,
            //       because the first available reader/writer is picked from the list
            // re-order jpeg image writers to prefer the imageio jpegwriter above the imageioimpl jpegwriter
            Iterator<ImageWriter> jpegWriters = ImageIO.getImageWritersByMIMEType("image/jpeg");
            ImageWriter imageIOImplPluginsWriter = null;
            ImageWriter imageIOWriter = null;
            while (jpegWriters.hasNext()) {
                ImageWriter jpegWriter = jpegWriters.next();
                String name = jpegWriter.getClass().getName();
                if (name.equals("com.sun.media.imageioimpl.plugins.jpeg.CLibJPEGImageWriter")) {
                    imageIOImplPluginsWriter = jpegWriter;
                } else if (name.equals("com.sun.imageio.plugins.jpeg.JPEGImageWriter")) {
                    imageIOWriter = jpegWriter;
                }
            }
            IIORegistry theRegistry = IIORegistry.getDefaultInstance();
            if (imageIOImplPluginsWriter != null && imageIOWriter != null) {
                theRegistry.setOrdering(ImageWriterSpi.class,
                        imageIOWriter.getOriginatingProvider(),
                        imageIOImplPluginsWriter.getOriginatingProvider());
            }

            // output dir for generated images
            String outputPath = "c:/temp/testimages";

            // encodings
            String[] compressionTypes = new String[] {
                "CCITT RLE", "CCITT T.4", "CCITT T.6", "LZW", "JPEG", "PackBits", "ZLib", "Deflate"};
            boolean[] oneBitOnly = new boolean[] {
                true, true, true, false, false, false, true, true};

            // image sizes
            Dimension[] sizes = new Dimension[] {new Dimension(8000, 6000)};

            // offsets (to write sizes less than, equal to, and larger than the specified sizes)
            int[] xOffsets = new int[] {0}; // {-1, 0, 1};
            int[] yOffsets = new int[] {0}; // {-1, 0, 1};

            File outputDir = new File(outputPath);
            if (!outputDir.exists()) {
                outputDir.mkdirs();
            }
            for (Dimension size : sizes) {
                int width = size.width;
                int height = size.height;
                int i = 0;
                for (String encoding : compressionTypes) {
                    boolean oneBit = oneBitOnly[i++];
                    for (int xOffset : xOffsets) {
                        int testWidth = width + xOffset;
                        for (int yOffset : yOffsets) {
                            int testHeight = height + yOffset;
                            /*
                            // enlarge one-bit images to create comparable memory blocks
                            if (oneBit) { testWidth *= 3; testHeight *= 3; }
                            */
                            for (boolean largeBlock : new boolean[] {false, true}) {
                                for (boolean tiled : new boolean[] {false, true}) {
                                    System.out.println("Writing " + encoding
                                            + "-encoded TIFF, size=" + testWidth + "x" + testHeight
                                            + ", " + (largeBlock ? "largeblock" : "defaultblock")
                                            + ", " + (tiled ? "tiled" : "non-tiled"));
                                    try {
                                        String fileName = outputPath + File.separator
                                                + "testimg_" + encoding + "_" + testWidth + "x" + testHeight
                                                + "_" + (largeBlock ? "largeblock" : "defaultblock")
                                                + "_" + (tiled ? "tiled" : "non-tiled") + ".tif";
                                        File file = new File(fileName);
                                        // skip if already there, easy for re-run after OOME or VM crash...
                                        if (!file.exists()) {
                                            // create an image
                                            BufferedImage bufferedImage;
                                            if (oneBit) {
                                                bufferedImage = new BufferedImage(testWidth, testHeight,
                                                        BufferedImage.TYPE_BYTE_BINARY); // one bit
                                            } else {
                                                bufferedImage = new BufferedImage(testWidth, testHeight,
                                                        BufferedImage.TYPE_INT_RGB); // 24-bit RGB
                                            }
                                            // draw something nice in it
                                            Graphics2D graphics = (Graphics2D) bufferedImage.getGraphics();
                                            Paint paint = new GradientPaint(0, 0, Color.RED, 5, 10, Color.YELLOW, true);
                                            graphics.setPaint(paint);
                                            graphics.fillRect(0, 0, testWidth, testHeight);
                                            for (int y = 0; y < testHeight; y += 3) {
                                                for (int x = 0; x < testWidth; x += 3) {
                                                    graphics.setColor(new Color((int) (Math.random() * 16777216)));
                                                    graphics.drawLine(x, y, x + 2, y + 2);
                                                }
                                            }
                                            // write the image
                                            FileOutputStream fos = new FileOutputStream(fileName);
                                            try {
                                                int myRowsPerStrip = largeBlock ? testHeight : 8;
                                                int myWidth = tiled ? testWidth / 2 : 0;
                                                int myHeight = tiled ? testHeight : 0;
                                                writeImage(fos, bufferedImage, encoding, myWidth, myHeight, myRowsPerStrip);
                                            } finally {
                                                fos.close();
                                            }
                                            System.out.println("ok");
                                        }
                                    } catch (Exception e) {
                                        e.printStackTrace();
                                    }
                                }
                            }
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static synchronized void writeImage(OutputStream outputStream, BufferedImage bufferedImage,
            String compressionType, int tileWidth, int tileHeight, int rowsPerStrip) throws Exception {
        try {
            Iterator<ImageWriter> imageWriterIt = ImageIO.getImageWritersByMIMEType("image/tiff");
            if (imageWriterIt == null || !imageWriterIt.hasNext()) {
                throw new Exception("Could not find any imageWriter");
            }
            boolean success = false;
            while (imageWriterIt.hasNext() && !success) {
                ImageWriter imageWriter = null;
                try {
                    imageWriter = imageWriterIt.next();
                    System.out.println("Trying imageWriter " + imageWriter.getClass().getName());
                    ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
                    imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
                    imageWriteParam.setCompressionType(compressionType);
                    IIOMetadata newMetadata = null;
                    if (tileWidth > 0 && tileHeight > 0) {
                        if (imageWriteParam.canWriteTiles()) {
                            imageWriteParam.setTilingMode(ImageWriteParam.MODE_EXPLICIT);
                            imageWriteParam.setTiling(tileWidth, tileHeight, 0, 0);
                        } else {
                            throw new Exception("Writer cannot set tiled mode");
                        }
                    }
                    if (rowsPerStrip > 0) {
                        ImageTypeSpecifier its = new ImageTypeSpecifier(bufferedImage);
                        newMetadata = imageWriter.getDefaultImageMetadata(its, imageWriteParam);
                        if (newMetadata == null) {
                            throw new Exception("Writer cannot get metadata: " + imageWriter.getClass().getName());
                        }
                        // set TIFF tag 278 (RowsPerStrip) in the native metadata tree
                        String nativeName = newMetadata.getNativeMetadataFormatName();
                        Node imageNode = newMetadata.getAsTree(nativeName);
                        addTIFFIFDShortValueNode(imageNode, 278, rowsPerStrip);
                        newMetadata.setFromTree(nativeName, imageNode);
                    }
                    ImageIO.setUseCache(false);
                    ImageOutputStream ios = ImageIO.createImageOutputStream(outputStream);
                    try {
                        imageWriter.setOutput(ios);
                        IIOImage iioimage;
                        if (newMetadata != null) {
                            iioimage = new IIOImage(bufferedImage, null, newMetadata);
                        } else {
                            iioimage = new IIOImage(bufferedImage, null, null);
                        }
                        imageWriter.write(null, iioimage, imageWriteParam);
                    } finally {
                        ios.close();
                    }
                    success = true;
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    if (imageWriter != null) {
                        imageWriter.dispose();
                    }
                }
            }
            if (!success) {
                throw new Exception("Could not write image");
            }
        } catch (Exception e) {
            e.printStackTrace();
            throw e;
        }
    }

    public static void addTIFFIFDShortValueNode(Node node, int tagNumber, int value) throws Exception {
        Node tagSetNode = node.getFirstChild();
        IIOMetadataNode fieldNode = new IIOMetadataNode("TIFFField");
        tagSetNode.appendChild(fieldNode);
        fieldNode.setAttribute("number", "" + tagNumber);
        IIOMetadataNode shortsNode = new IIOMetadataNode("TIFFShorts");
        fieldNode.appendChild(shortsNode);
        IIOMetadataNode shortNode = new IIOMetadataNode("TIFFShort");
        shortNode.setAttribute("value", "" + value);
        shortsNode.appendChild(shortNode);
    }
}

Comment by gerardkoppelaar

[ 11/Mar/08 ]

Created an attachment (id=130)

Sample log showing size of byte[] buffers created when decoding different images
