Applications of Data Hiding in Digital Images

Tutorial for The ISSPA’99,
Brisbane, Australia
August 22-25, 1999
Jessica Fridrich
Center for Intelligent Systems
SUNY Binghamton, Binghamton, NY 13902-6000, U.S.A,
and
Mission Research Corporation
1720 Randolph Rd. SE, Albuquerque, NM 87105, U.S.A
Fax/Ph: (607) 777-2577
E-mail: fridrich@binghamton.edu
Http://www.ssie.binghamton.edu/fridrich
Outline
• Introduction to Data Hiding
- History
- Motivation
- Definition
- Terminology
- Properties
• Covert communication (steganography)
• Digital watermarking (robust message embedding)
• Watermarking for tamper detection and authentication
• Attacks on hiding schemes
• Open problems, challenges
Data Hiding in Digital Imagery
• A relatively young and fast-growing field
• Well over 90% of all publications appeared in the last 6 years
• Highly multidisciplinary field combining image and signal
processing with cryptography, communication theory,
coding theory, signal compression, and the theory of visual
perception
• Tremendous interest from industry and military
Data Hiding - History
• First techniques included invisible ink, secret writing using
chemicals, templates laid over text messages, microdots,
changing letter/word/line/paragraph spacing, changing fonts
• Images, video, and audio files provide sufficient redundancy
for effective data hiding
• Postscript files, PDF files, and HTML can also be used for
non-robust data hiding to a limited extent
• Executable files provide very little space for data hiding
• Fonts
The Need for Data Hiding
• Covert communication using images (secret message is
hidden in a carrier image)
• Ownership of digital images, authentication, copyright
• Data integrity, fraud detection, self-correcting images
• Traitor-tracing (fingerprinting video-tapes)
• Adding captions to images and additional information,
such as subtitles or audio tracks, to video
(video-in-video)
• Intelligent browsers, automatic copyright information,
viewing a movie in a given rated version
• Copy control (secondary protection for DVD)
Requirements
[Table: each application is rated from Low to High on six requirements; the individual ratings did not survive extraction.]
Requirements:
• capacity
• robustness
• invisibility
• security
• embedding complexity
• detection complexity
Applications:
• Covert communication
• Copyright protection of images (authentication)
• Fingerprinting (traitor-tracing)
• Adding captions to images, additional information, such as subtitles, to videos
• Image integrity protection (fraud detection)
• Copy control in DVD
• Intelligent browsers, automatic copyright information, viewing movies in given rated version
What makes data hiding possible
• Information-theoretic: removed by lossless compression
• Perceptual: removed by lossy compression
[Figure: the original image with added changes of 2, 5, and 31 gray levels]
Data Hiding - Definition
[Diagram: the carrier document, the secret message, and a key enter the embedding algorithm. The result is transmitted via network to the detector, which uses a key to recover the secret message.]
• Relationship carrier - message
• Who extracts the message? (source versus destination coding)
• How many recipients are there?
• Is the key public knowledge or a shared secret?
• Do we embed different messages into one carrier?
• Is embedding / detection bundled with a key in tamper-proof hardware?
• Is the speed of embedding / detection important?
Properties of hiding schemes
Robustness
The ability to extract hidden information after common image processing operations:
linear and nonlinear filters, lossy compression, contrast adjustment, recoloring,
resampling, scaling, rotation, noise adding, cropping, printing / copying / scanning, D/A
and A/D conversion, pixel permutation in small neighborhood, color quantization (as in
palette images), skipping rows / columns, adding rows / columns, frame swapping,
frame averaging (temporal averaging), etc.
Undetectability
Impossibility to prove the presence of a hidden message. This concept is inherently
tied to the statistical model of the carrier image. The ability to detect the presence does
not automatically imply the ability to read the hidden message. Undetectability should
not be mistaken for invisibility, a concept related to human perception.
Invisibility
Perceptual transparency. This concept is based on the properties of the human visual
system or the human auditory system.
Security
The embedded information cannot be removed beyond reliable detection by targeted
attacks based on a full knowledge of the embedding algorithm and the detector
(except a secret key), and the knowledge of at least one carrier with hidden message.
The “Magic” Triangle
[Diagram: a triangle with corners labeled Capacity, Undetectability, and Robustness. Naïve steganography, secure steganographic techniques, and digital watermarking are placed within it.]
There is a trade-off between capacity, invisibility, and robustness.
Additional factors:
• Complexity of embedding / extraction
• Security
Outline
• Introduction
• Covert communication (steganography)
Message hiding in RGB images
- Absolutely secure steganographic method
- LSB encoding
Message hiding in palette images
- Permuting the palette
- LSB encoding in the palette
- EZ Stego
- Improved EZ Stego
• Digital watermarking (robust message embedding)
• Watermarking for tamper detection and authentication
• Attacks on watermarks
• Open problems, challenges
Covert Communication
Purpose: To conceal the very presence of communication, to make the communication invisible.
Encryption: To make the message unintelligible.
[Illustration: Andy and Bob exchange the message "I just posted a picture of my cat on my web page!" while Warden Willie wonders "Secret communication??!!"]
Covert Communication
[Diagram: the secret message passes through the encryption unit; the embedding algorithm then combines it with the carrier image to produce the modified carrier.]
- Encryption and steganography provide double protection
- Randomized message is easier to hide
Steganography for RGB images
Absolutely secure steganographic technique
Method:
Embed a small message (8 bits) by repeatedly scanning a cover image until a certain
password-dependent message-digest function returns the required 8-tuple of bits.
Comments:
• Absolute secrecy, tantamount to the one-time pad used in
cryptography
• Guarantees correct noise distribution and undetectability.
• Time consuming, very limited capacity, not applicable to
image carriers for which we only have one copy.
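The repeated-scanning idea can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the unspecified password-dependent message-digest function, and `acquire` is a hypothetical callable that returns a fresh scan of the cover image as bytes.

```python
import hashlib
import hmac

def digest_byte(image_bytes, password):
    # Assumption: HMAC-SHA256 plays the role of the password-dependent
    # message-digest function; its first byte is the 8-tuple of bits.
    return hmac.new(password, image_bytes, hashlib.sha256).digest()[0]

def find_cover(acquire, message_byte, password, max_tries=100_000):
    # Rescan until the digest of the unmodified image equals the message.
    # On average about 2**8 = 256 scans are needed for an 8-bit message.
    for attempt in range(1, max_tries + 1):
        image = acquire()
        if digest_byte(image, password) == message_byte:
            return image, attempt
    raise RuntimeError("no suitable scan found")
```

The image itself is never modified, which is why the noise distribution stays correct. The cost is the number of scans, which doubles with every additional message bit.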
Steganography for RGB images
LSB Encoding (Least Significant Bit)
Method:
• Replace the LSB of each pixel with the secret message
• Pixels may be chosen randomly according to a secret key
• Pixels may be chosen adaptively according to neighborhood
• Message should always be encrypted
Comments:
• The simplest and most common steganographic technique
• Premise = changes to the least significant bit will be masked by
noise commonly present in digital images.
• Color images provide more room for hiding messages
• If more than one LSB is used, statistically detectable changes may
result
• A provably secure method should introduce changes consistent
with the noise model
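A minimal sketch of LSB encoding in key-selected pixels, assuming the image is given as a flat list of 8-bit values. The function names are hypothetical, and `random.Random` is used only to make the pixel selection reproducible from the key; it is not a cryptographically secure generator.

```python
import random

def embed_lsb(pixels, bits, key):
    # Choose one distinct pixel per message bit from a key-seeded generator.
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the carrier")
    positions = random.Random(key).sample(range(len(pixels)), len(bits))
    out = list(pixels)
    for pos, bit in zip(positions, bits):
        out[pos] = (out[pos] & ~1) | bit  # overwrite the least significant bit
    return out

def extract_lsb(pixels, n_bits, key):
    # Regenerate the same positions from the key and read the LSBs back.
    positions = random.Random(key).sample(range(len(pixels)), n_bits)
    return [pixels[pos] & 1 for pos in positions]
```

Each carrier pixel changes by at most one gray level. In line with the method above, the bits passed in would normally be the encrypted message.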
Steganography for palette images
LSB encoding cannot be directly applied to palette-based
images because new colors, that are not present in the palette,
would be created.
Two sources of palette images:
1. Color truncation + dithering of photographs
2. Computer generated images (fractals, cartoons, animations)
A secure steganographic method will produce modified carriers
compatible with the source
Possibilities
• Hiding in the palette
• Hiding in the image data
• Non-adaptive techniques
• Adaptive techniques
Artifacts
• Palette artifacts
• Image data artifacts
Possible approaches
Message hiding in the palette
Permuting palette entries
- Image is not modified
- Very limited capacity of log2(256!) ≈ 1684 bits ≈ 210 bytes
- Too fragile (resaving)
- Suspicious palette order is an artifact
LSB encoding in the palette
- Very limited capacity (at most 3×256 = 768 bits)
- Palette artifacts?
Common disadvantage: Capacity is severely limited and
independent of the image size
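The capacity of palette permutation can be checked numerically, and a message can be mapped to a palette order with the factorial number system. The sketch below is one possible encoding, not necessarily the one used by any particular tool. It assumes that all palette entries are distinct and that the message, read as an integer, is smaller than n!.

```python
import math

def permutation_capacity_bits(n):
    # log2(n!) computed through the log-gamma function.
    return math.lgamma(n + 1) / math.log(2)

def int_to_permutation(value, items):
    # Encode an integer 0 <= value < n! as an ordering of the distinct items.
    pool = sorted(items)
    perm = []
    for remaining in range(len(pool), 0, -1):
        idx, value = divmod(value, math.factorial(remaining - 1))
        perm.append(pool.pop(idx))
    return perm

def permutation_to_int(perm):
    # Inverse mapping: recover the integer from the observed order.
    pool = sorted(perm)
    value = 0
    for i, item in enumerate(perm):
        idx = pool.index(item)
        value += idx * math.factorial(len(perm) - 1 - i)
        pool.pop(idx)
    return value
```

After the palette is reordered, the pixel indices have to be remapped so that the displayed image stays the same. Any program that re-sorts the palette on saving destroys the message, which is the fragility noted above.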
Possible approaches
Message hiding in the image data - greedy techniques
Decrease color depth and expand
1. Collapse 256 colors → 128 colors
2. Expand 128 colors → 256 colors by including a close color
(e.g., flip the LSB of the blue channel)
3. Embed a binary message into the LSB of the blue channel
of randomly selected pixels
4. Read the message from the LSB of the blue channel
Capacity: 1 bpp
Alternatively
1. Decrease color depth to 32 colors and include all colors obtained
from LSB shuffling of all 32 colors (one color produces 2^3 = 8
colors)
2. Encode messages into the LSB of pixel colors
Capacity: 3 bpp
Possible approaches
Message hiding in the image data
Parity embedding
1. Assign parity to palette colors
2. Embed message bits as the parity of colors
Message: 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1
1. Randomly choose a pixel with color C1.
2. Find the color C1 in the sorted palette (e.g., index = 30 = 00011110).
3. Replace the LSB of the index to color C1 with the message bit
(00011110 → 00011111). The new index now points to a
neighboring color C2.
4. Replace the index of the pixel in the original image to point to the
new color C2.
[Figure: sorted palette with the neighboring colors C1 and C2]
Critical assumption: Colors close in the luminance-sorted palette
are also close in the color space.
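The sorted-palette steps above can be sketched as follows. The luminance weights (0.299, 0.587, 0.114) are an assumption, since the slides do not give the formula, and the function names are hypothetical. The palette is assumed to have an even number of entries so that flipping the LSB of a sorted index always stays in range.

```python
def luminance(color):
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b  # assumed weighting

def sorted_order(palette):
    # order[k] is the palette index of the k-th darkest color.
    return sorted(range(len(palette)), key=lambda i: luminance(palette[i]))

def embed_sorted_index(indices, palette, bits, positions):
    order = sorted_order(palette)
    rank = {pal_idx: k for k, pal_idx in enumerate(order)}
    out = list(indices)
    for pos, bit in zip(positions, bits):
        new_rank = (rank[out[pos]] & ~1) | bit  # replace LSB of sorted index
        out[pos] = order[new_rank]              # point pixel at neighbor color
    return out

def extract_sorted_index(indices, palette, positions):
    order = sorted_order(palette)
    rank = {pal_idx: k for k, pal_idx in enumerate(order)}
    return [rank[indices[pos]] & 1 for pos in positions]
```

Two colors that are adjacent in luminance order can still differ strongly in hue, which is exactly where the critical assumption fails and a visible color jump appears.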
New approach using color parities
Message hiding in the image data
(1) For each message bit, randomly select a pixel.
(2) Calculate the set of the closest palette colors (in the Euclidean norm).
The distance d between colors $(R_1, G_1, B_1)$ and $(R_2, G_2, B_2)$ is
$d^2 = (R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2$
(3) Find the closest color whose parity agrees with the message
bit. The parity of a color is defined as (R + G + B) mod 2.
(4) Change the index for the pixel to point to the new color.
Capacity: 1 bpp
To extract the secret message, pixels are selected using a key and
the secret message is simply read by extracting the parity bits of
the colors of selected pixels.
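A minimal sketch of parity embedding and extraction, assuming the image is a flat list of palette indices and the palette is a list of (R, G, B) tuples containing colors of both parities. The key-seeded `random.Random` and the function names are illustrative choices.

```python
import random

def color_parity(color):
    return sum(color) % 2  # (R + G + B) mod 2

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean distance

def embed_parity(indices, palette, bits, key):
    positions = random.Random(key).sample(range(len(indices)), len(bits))
    out = list(indices)
    for pos, bit in zip(positions, bits):
        current = palette[out[pos]]
        # Closest palette color whose parity equals the message bit; this is
        # the current color itself whenever its parity is already right.
        candidates = [i for i, q in enumerate(palette) if color_parity(q) == bit]
        out[pos] = min(candidates, key=lambda i: dist2(palette[i], current))
    return out

def extract_parity(indices, palette, n_bits, key):
    positions = random.Random(key).sample(range(len(indices)), n_bits)
    return [color_parity(palette[indices[pos]]) for pos in positions]
```

Because the replacement is chosen by distance in color space rather than by position in a sorted list, a pixel never moves farther than the nearest color of the required parity.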
Message: 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1
[Figure: for a randomly chosen pixel with color C1, find the closest colors in the palette, then replace C1 with the closest color that has the same parity as the message bit.]
Color parity of (R,G,B) = (R + G + B) mod 2.
Advantages over EZ Stego:
• The total change to the image due to message embedding is always
smaller
• We avoid occasionally large changes in color that are possible with
EZ Stego
Optimal parity assignment
Oblivious reading requirement:
The optimal parity assignment has to be reconstructable from
the modified image at the receiving end.
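[Diagram: the optimal parity assignment is used to embed the message into the carrier. The same optimal parity assignment is obtained from the modified carrier and used to extract the message.]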
• Efficient algorithm for optimal parity assignment
• Optimal parity depends only on the palette and does not
depend on the image content!
• The optimal parity assignment is also optimal for multiple-pixel embedding
The average decrease in the RMS error due to optimal palette parity
is about 25-35%.
Adaptive Steganography
Non-adaptive steganography = modifications due to message
embedding are uncorrelated with image features. Examples are LSB
encoding in randomly selected pixels, modulation of randomly
selected frequency bins in a fixed band, etc.
Adaptive steganography = modifications are correlated with the
image content (features).
- Pixels carrying message bits are selected adaptively
depending on the image
- Avoiding areas of uniform color
- Selecting pixels with large local standard deviation
Potential problem with message recovery: We have to be able to
extract the same set of message carrying pixels at the receiving
end from the modified image.
[Figure: computer-generated Julia set]
• Large areas of uniform color
• Internal structure of the image - it is a fractal Julia set
• Fonts
Artifacts caused by non-adaptive methods
• Artifacts around the Julia set
• Artifacts in the fonts
Method 1: Adaptive block embedding
Message embedding
• Divide the image into disjoint 3×3 blocks
• Randomly choose blocks and evaluate some local
statistical quantity, such as standard deviation or
number of colors and decide whether or not a
message bit can be embedded (good vs. bad block)
• If block is bad, skip it and do not insert message bit
• If block is good, insert the bit into the block parity
• If after embedding the block becomes bad, keep
the change but repeat the same message bit in the
next block
Message extraction
• Generate the same random walk through the
image blocks
• Read the parity from all good blocks
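The embedding and extraction rules of Method 1 can be sketched as follows, under several simplifying assumptions that are not in the slides: the image is a grayscale list of rows, a block is "good" when it contains at least `min_colors` distinct values, the block parity is the sum of the pixel LSBs mod 2, and a bit is inserted by flipping the LSB of the block's top-left pixel.

```python
import random

BLOCK = 3

def block_pixels(img, r, c):
    return [img[r + i][c + j] for i in range(BLOCK) for j in range(BLOCK)]

def is_good(img, r, c, min_colors=4):
    # Assumed local statistic: number of distinct values in the block.
    return len(set(block_pixels(img, r, c))) >= min_colors

def parity(img, r, c):
    return sum(p & 1 for p in block_pixels(img, r, c)) % 2

def walk(img, key):
    # Key-dependent random walk over the disjoint 3x3 blocks.
    coords = [(r, c) for r in range(0, len(img) - BLOCK + 1, BLOCK)
                     for c in range(0, len(img[0]) - BLOCK + 1, BLOCK)]
    random.Random(key).shuffle(coords)
    return coords

def embed(img, bits, key):
    out = [row[:] for row in img]
    k = 0
    for r, c in walk(out, key):
        if k == len(bits):
            break
        if not is_good(out, r, c):
            continue                 # bad block: skip, insert nothing
        if parity(out, r, c) != bits[k]:
            out[r][c] ^= 1           # flip one LSB to set the block parity
        if is_good(out, r, c):
            k += 1                   # still good: the bit is readable
        # otherwise keep the change and repeat the bit in the next block
    if k < len(bits):
        raise ValueError("not enough good blocks for the message")
    return out

def extract(img, n_bits, key):
    bits = []
    for r, c in walk(img, key):
        if len(bits) == n_bits:
            break
        if is_good(img, r, c):
            bits.append(parity(img, r, c))
    return bits
```

The repeat rule is what keeps sender and receiver synchronized: a block that turns bad after embedding is skipped by the receiver, so the sender must not count it as carrying a bit.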
Limitations
Ultimately, image understanding is important for secure adaptive
steganography. A human can easily recognize that a pixel is actually
a dot above the letter "i" and must not be changed. However, it would
be very hard to write a computer program capable of making such
intelligent decisions in all possible cases.
[Figure: example of a difficult area for secure adaptive message embedding - fonts on a complex background]
Embedding while dithering
True-color images are converted to palette images via
- color quantization
- dithering
Idea: To embed message bits while doing the dithering
[Diagram: a 256-color image has its color depth increased by interpolation to give a true-color image (or one starts directly with the true-color image). The palette is then computed by quantization, and the message is embedded during dithering.]
Embedding while dithering
1. Select a random collection of pixels that will carry message bits.
2. For non-message pixels use classical dither to the closest palette color
3. For message pixels dither to the closest color with the right parity.
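The rounding error is added to the next pixel.
[Diagram: pixel p11 of the original 24-bit image is quantized by Q to the palette color q11 of the dithered quantized image. The rounding error E11 = p11 - q11 is added to the next pixel, so Q is applied to p12 + E11.]
Palette P = {q1, …, q256}
Q for non-message pixels: q is the closest palette color
Q for message pixels: q is the closest palette color
with the right parity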
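A minimal sketch of embedding while dithering, assuming a one-dimensional scan in which the whole rounding error is passed to the next pixel, pixels and palette entries given as (R, G, B) tuples, and a palette that contains colors of both parities. The error is taken as the value being quantized minus the chosen palette color, the usual error-diffusion convention, so that it compensates on the following pixel. Function names are hypothetical.

```python
import random

def color_parity(color):
    return sum(color) % 2  # (R + G + B) mod 2

def closest(palette, target, want_parity=None):
    # Closest palette color, optionally restricted to a given parity.
    candidates = [q for q in palette
                  if want_parity is None or color_parity(q) == want_parity]
    return min(candidates,
               key=lambda q: sum((a - b) ** 2 for a, b in zip(q, target)))

def carrier_positions(n_pixels, n_bits, key):
    return sorted(random.Random(key).sample(range(n_pixels), n_bits))

def dither_embed(pixels, palette, bits, key):
    bit_at = dict(zip(carrier_positions(len(pixels), len(bits), key), bits))
    out, err = [], (0.0, 0.0, 0.0)
    for i, p in enumerate(pixels):
        target = tuple(a + e for a, e in zip(p, err))  # pixel plus carried error
        q = closest(palette, target, bit_at.get(i))    # parity only if message pixel
        out.append(q)
        err = tuple(t - c for t, c in zip(target, q))  # rounding error for next pixel
    return out

def extract(quantized, n_bits, key):
    positions = carrier_positions(len(quantized), n_bits, key)
    return [color_parity(quantized[i]) for i in positions]
```

Because the parity constraint is applied inside the quantizer, the extra error it introduces is diffused to the following pixels in the same way as ordinary quantization error.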
Performance example
[Figure: test image in JPEG format, shown as the original, after non-adaptive embedding, and after embedding while dithering]
Outline
• Introduction, history, motivation, definition,
terminology, properties
• Covert communication (steganography)
• Digital watermarking (robust message embedding)
- Copyright protection of digital images (authentication)
- Fingerprinting (traitor-tracing)
- Adding captions to images, additional information to videos
- Methods for Robust Data Hiding (Watermarking)
- Image integrity protection (fraud detection)
- Copy control in DVD
• Watermarking for tamper detection and authentication
• Attacks on watermarks
• Open problems, challenges