August 2010 iText Presentation

PDF Generation with iText
Presented by Greg Holling
What is iText?

Java/C# library

Open source

Generates PDF on-the-fly

Servlet- and JSP-friendly


PDF can be generated by a servlet
Supports lots of PDF functionality

Bookmarks, watermarks

PDF forms

Digital signatures
The Good

Mature Library

Deep & broad PDF support

Open Source

Easy to create “preview” PDF from servlet/JSP

Active user base & mailing list

Training & consulting available

Lots of online examples

Book “iText in Action” (preview 2nd edition)
The Bad

Javadoc (and code comments) are often sparse

Need the book to use iText effectively

eBook costs $35, and has problems

Current edition is somewhat dated

New edition is incomplete (MEAP)

Explanations are buried in examples

Some information is difficult to find
The Ugly


Multiple ways to accomplish something

Sort of like Unix, but...

Sometimes one works, and the other doesn't
Responses on mailing list often begin with “did you read the book?”

Or the corollary: “That method isn't intended to be used that way...”
Example: getYLine()

JavaDoc says “gets the Y Line”

Book: no description, only a source example
Background


Goal: Brochures for community college students

Students create brochures

Admin customizes brochure look & feel

Pricing: Monthly subscription for college
Web-based

Deployed on Windows Server 2003

Original plan: shrinkwrap



Complex distribution and pricing
Inexperienced sysadmins
Future: mobile deployment (students)
Software Stack

JDK 1.6 + Servlet/JSP

Tomcat 6.0.28

Apache Commons File Upload 1.2.1

iText 2.0.8

JDOM

opencsv 2.1

JUnit 4.8.2

[tagsoup, htmlcleaner, flying saucer, mongoDB]
DEMO

Student interface


Generated PDF
Administrative interface

PDF preview
Two Different Worlds
Aspect | Display (esp. web) | Print
Color model | RGB | CMYK, Pantone
Photo representation | jpg/tiff/png | jpg/eps
Explicit layout control | Nice to have | Critical
Fonts | Arial/Helvetica/Whatever | Helvetica Bold, NOT Bold Helvetica
Photos | Highly compressed (<100k) | Uncompressed (1 MB+)
Sizes | Points | DPI; point ~ 1/72”
Leading | Huh??? | Depends on context
Whitespace | Dynamic layout | Design element
Graphics (e.g. dotted line) | Whatever is there | Very precise segment length, endcap shape, ...
y=0 | Top of page | Bottom of page
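
The y=0 difference is easy to trip over when a layout is designed top-down but rendered into a PDF, where y grows upward from the bottom of the page. A minimal sketch of the conversion follows. It assumes a US Letter page (792 points tall); the class and method names are illustrative and not part of iText.

```java
/** Illustrative helper (not part of iText): converts top-origin y values to PDF bottom-origin y values. */
final class PageCoordinates {
    /** Height of a US Letter page in points (11 in x 72 pt/in); an assumption for this sketch. */
    static final float LETTER_HEIGHT_PT = 792f;

    private PageCoordinates() {}

    /**
     * Converts a distance measured down from the top edge of the page
     * into a PDF y coordinate measured up from the bottom edge.
     */
    static float toPdfY(float distanceFromTop, float pageHeight) {
        return pageHeight - distanceFromTop;
    }
}
```

For example, a headline placed one inch (72 points) below the top of a Letter page ends up at y = 720 in PDF coordinates.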
iText “Hello World”

Cookie cutter steps:

Create a new Document object
initializes margins, other generic properties

Create a PdfWriter
associates a document with a file/stream
Stream can be a ServletOutputStream

Open the document
prepares for writing

Add content

Close the document
iText Key Classes

Document – margins, orientation, etc.

PdfReader – reads an existing PDF

PdfWriter – low-level output


Can be written to a ByteArrayOutputStream (BAOS) or ServletOutputStream
PdfContentByte – “layer”, for low-level output

Can be overlaid

PdfStamper – add content to existing PDF

PdfCopy – combine pages from PDFs
More Key Classes

Element – logical element

Chunk – StringBuffer containing font info

Phrase – ArrayList of Chunk, includes Leading

Paragraph – Phrases + newline + alignment

List, ListItem – Bulleted list

Anchor – Hypertext link

ColumnText – a column of text & images

PdfPTable / PdfPCell – a table
Fonts

Two primary font classes:

BaseFont
Font name, embedded?, font file name
BaseFont is used by PdfWriter

Font
Font size, other modifiers
Font is used by most text-related classes

Font constructor takes a BaseFont or Font.FontFamily object
BaseFont for embedded fonts
FontFamily for predefined fonts
Predefined Fonts

All PDF readers are required to handle these

Readers may substitute a similar font

Helvetica => Arial, e.g.

Use embedded fonts to avoid substitution

No space penalty for using these in PDF

Fonts:

Courier, Helvetica, Symbol, Times, ZapfDingbats

Bold & Italic variants for all except Symbol and ZapfDingbats
Leading

Pronounced like “sledding”

Origin: lead separator inserted above a line

PDF (iText): spacing above a line of text

Aliases: line spacing

Can be specified in points or % of font size
Note: 1 inch = 72 points (approx.)

Note: Spacing before a paragraph is different than leading
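
To make "points or % of font size" concrete, here is a small arithmetic sketch. The helper names are hypothetical and not part of iText, and the line-count estimate is deliberately simplistic: it ignores spacing before and after paragraphs.

```java
/** Illustrative helper (not part of iText): simple leading arithmetic in points. */
final class Leading {
    private Leading() {}

    /** Leading expressed as a percentage of the font size, e.g. 120% of 10 pt is 12 pt. */
    static float percentOfFontSize(float fontSizePt, float percent) {
        return fontSizePt * percent / 100f;
    }

    /** Rough estimate of how many lines of text fit in a column of the given height. */
    static int linesThatFit(float columnHeightPt, float leadingPt) {
        if (leadingPt <= 0f) {
            throw new IllegalArgumentException("leading must be positive");
        }
        return (int) Math.floor(columnHeightPt / leadingPt);
    }
}
```

With 10-point text at 120% leading, each line takes 12 points, so a two-inch column (144 points) holds about 12 lines.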
Embedded Fonts

Obtain font information from a file

Adobe Type 1 (.afm, .pfm, .pfb), TrueType (.ttf), OpenType (.otf)

OpenType gives the best cross-platform behavior

Font file is specified in the call to BaseFont.createFont()

Increases PDF size


Only the glyphs used in PDF are embedded

Size increase may still be significant, esp. CJK
Watch for licensing restrictions
Hypertext Links

Can be included in PDF

To create:

Create a Chunk with the appropriate font color

Chunk.setAction (new PdfAction(...));

Embed the Chunk in a Paragraph or other
iText element
Graphics

PdfContentByte can create rudimentary
graphics

Line segments, solid or dashed


Color, line end/cap style, dash style
Filled or unfilled polygons

Fill color/tint can be specified

All units are relative to the edge of the page

stroke() renders the graphics


Nothing is rendered until stroke() is called
NOTE: LineSeparator can be used for a
horizontal line in the Document
iText and Java2D


PdfTemplate.createGraphicsShapes() returns a
Graphics2D object

Can be passed to a paint() method

The template object can be passed to
PdfContentByte.addTemplate()

Allows arbitrary Java2D graphics in PDF
AffineTransform can be passed to some iText
methods:

addImage()

setTextMatrix()

Image/text scaling, rotation, transformation
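
The paint() pattern mentioned above is ordinary Java2D code, so it can be developed and tested without a PDF in sight. The sketch below uses a hypothetical class name and draws onto whatever Graphics2D it is handed. With iText, that object would come from the template as described on this slide; in a unit test it can come from a BufferedImage.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Illustrative painter (hypothetical name): draws onto any Graphics2D it is given. */
final class SquarePainter {
    private SquarePainter() {}

    /** Draws a 20x20 red square, shifted by the given offsets using an AffineTransform. */
    static void paint(Graphics2D g, double offsetX, double offsetY) {
        AffineTransform saved = g.getTransform();
        try {
            // Concatenate a translation onto the current transform
            g.transform(AffineTransform.getTranslateInstance(offsetX, offsetY));
            g.setColor(Color.RED);
            g.fill(new Rectangle2D.Double(0, 0, 20, 20));
        } finally {
            // Restore the caller's transform so later drawing is unaffected
            g.setTransform(saved);
        }
    }
}
```

Keeping the drawing code independent of where the Graphics2D comes from means the same method can render to the screen, to an image, or into a PDF.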
Images

iText class: com.itextpdf.text.Image

Image formats:

JPEG[2000], GIF, PNG, BMP, WMF, TIFF,
JBIG2

Color models: RGB, CMYK



NOTE: ImageIO throws an exception when reading CMYK images
Operations: scaling, transparency, masking
NOTE: Scaling doesn't reduce image quality or size

Just affects rendering

Big image files => big PDFs
ColumnText

Logical column, positioned explicitly on the
page

Rectangular or complex shape

Content is added top-to-bottom

go() renders content

Nothing happens until go()

Can be used to make sure content will fit

go(true) simulates output

go(false) or go() renders content
PDF Preview in Servlet


PdfWriter.getInstance() takes an OutputStream argument

Can be any OutputStream

Including ServletOutputStream
This allows servlet to generate a preview PDF

PdfWriter => ServletOutputStream


Small PDFs only
temp file => ServletOutputStream

More flexibility, can be used for larger files
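
The temp-file route can be sketched with the JDK alone. In the sketch below, PdfProducer is a hypothetical callback that stands in for the iText code writing the PDF, and the response parameter would be the ServletOutputStream in a real servlet. The names and the "preview-" file prefix are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative helper (hypothetical names): generate to a temp file, then stream it to the client. */
final class TempFileStreaming {
    private TempFileStreaming() {}

    /** Stands in for the code that writes the PDF, e.g. the iText calls. */
    interface PdfProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Writes the PDF to a temporary file, copies the file to the response stream,
     * and deletes the file afterwards. Returns the number of bytes sent.
     */
    static long streamViaTempFile(PdfProducer producer, OutputStream response) throws IOException {
        Path temp = Files.createTempFile("preview-", ".pdf");
        try {
            try (OutputStream fileOut = Files.newOutputStream(temp)) {
                producer.writeTo(fileOut);
            }
            // The whole PDF exists on disk before the first byte goes to the client
            return Files.copy(temp, response);
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

Because the PDF is complete before anything is sent, a failure during generation can still be reported as an error page, and the file size is known up front if a Content-Length header is wanted.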
PdfStamper

Adds content to an existing PDF

Can read and write stream or byte array

Allows chaining of PDF generation ops

Content can be written on top or underneath

Useful for:

Table of contents

“Page x of y” in header/footer

Watermarks or “Confidential” notation
General iText Cautions

72 points = 1” (approx)

Units are float, not double

Font + bold modifier ≠ bold font

Spacing before paragraph ≠ leading

Watch font licensing restrictions


Images are automatically centered & resized if they reside in a PdfPCell
∑ Image size => PDF size (approx)


Scaling images doesn't affect PDF size
Beware browser caching, especially IE
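
The first two cautions (72 points per inch, float units) are easy to capture in a tiny conversion helper. This is a sketch with hypothetical names rather than an iText API. It uses float throughout to match the units iText expects.

```java
/** Illustrative helper (not part of iText): converts common units to points, using float. */
final class Units {
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    private Units() {}

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    static float pointsToInches(float points) {
        return points / POINTS_PER_INCH;
    }

    static float millimetersToPoints(float millimeters) {
        return millimeters / MM_PER_INCH * POINTS_PER_INCH;
    }
}
```

A US Letter page (8.5 x 11 inches) comes out as 612 x 792 points.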
PDF Size

Big issue for this project

Two primary things affect PDF size:

Images



scaling doesn't affect size/resolution
Embedded fonts
First example PDF was 10 MB+

Rejected by email server

5+ second download

Changing image size/resolution => 300k PDF

Moral: Use small, low-res images
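
Since scaling inside the PDF does not shrink the stored image, the pixels have to be reduced before the image is handed to iText. One way to do that with the JDK alone is sketched below. The class name is hypothetical, and the sketch assumes an RGB photo already loaded as a BufferedImage; any transparency is dropped.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Illustrative helper (hypothetical name): shrinks an image before it is embedded in a PDF. */
final class ImageDownscaler {
    private ImageDownscaler() {}

    /**
     * Returns a copy whose longer side is at most maxSidePx pixels, preserving
     * the aspect ratio. Images that are already small enough are returned unchanged.
     */
    static BufferedImage downscale(BufferedImage source, int maxSidePx) {
        int longerSide = Math.max(source.getWidth(), source.getHeight());
        if (longerSide <= maxSidePx) {
            return source;
        }
        double scale = (double) maxSidePx / longerSide;
        int width = Math.max(1, (int) Math.round(source.getWidth() * scale));
        int height = Math.max(1, (int) Math.round(source.getHeight() * scale));

        // TYPE_INT_RGB has no alpha channel, which suits JPEG-style photos
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return target;
    }
}
```

The right pixel limit depends on the printed size and the resolution the brochure needs, so maxSidePx is left to the caller.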
IE Browser Caching


GET requests only

Symptoms: page not cleared

Workaround: Use POST or HTTP headers



Also consider session.invalidate()
Note: doesn't help with tabs
JSP workaround:
<%
// Tell the browser (and older HTTP/1.0 caches) not to reuse a cached copy of this page
response.setHeader("Cache-Control", "no-cache");
response.setHeader("Pragma", "no-cache");
response.setDateHeader("Expires", -1);
%>
References

iText website:
http://www.itextpdf.com/

Book:
http://www.itextpdf.com/book/

Examples (from the book):
http://www.itextpdf.com/examples/index.php
Questions?
Greg Holling
303/274-9001
greg@holling-co.com