Moinul Zaber's presentation on Steganography

Steganography and Approaches of
Data Hiding in Digital Images
Presenter: Moinul I. Zaber,
Dept. of CS, Kent State University
Based on
• F5 – a Steganographic algorithm, by Andreas Westfeld,
Technische Universität Dresden, Institute for System
Architecture, Dresden, Germany
• Applications of data hiding in digital images, by
Jessica Fridrich, Center for Intelligent Systems, SUNY
Binghamton, Binghamton, NY 13902-6000, U.S.A.
• Steganalysis of JPEG Images: Breaking the F5
Algorithm, by Jessica Fridrich, Miroslav Goljan, Dorin
Hogea, SUNY Binghamton, Binghamton, NY 13902-6000,
U.S.A.
Prologue
- Steganography is the art of invisible communication. Its
purpose is to hide the very presence of communication
by embedding messages into innocuous-looking cover
objects.
- Data hiding in digital images is attracting attention from
cryptographers and security engineers
- Existing methods do not withstand attacks
- The goal is to devise a method that withstands attacks
and can control the image quality
Outline of the Presentation
• Introduction and history of data hiding
• Digital image representation in computers
• Existing data hiding methods (in JPEG)
- JSteg (LSB modification)
- F5 (matrix embedding)
• Discussion
Data Hiding in Digital Imagery
• A relatively young and fast-growing field
• Well over 90% of all publications appeared in the last 10 years
• Highly multidisciplinary field combining image and signal
processing with cryptography, communication theory, coding
theory, signal compression, and the theory of visual perception.
• Tremendous interest from industry and military
History of Data Hiding
• First techniques included invisible ink, secret writing using
chemicals, templates laid over text messages, microdots,
changing letter/word/line/paragraph spacing, changing fonts
• Images, video, and audio files provide sufficient redundancy
for effective data hiding
• Postscript files, PDF files, and HTML can also be used for
non-robust data hiding to a limited extent
• Executable files provide very little space for data hiding
• Fonts can also be used
The Need for Data Hiding
• Covert communication using images (secret message is
hidden in a carrier image)
• Ownership of digital images, authentication, copyright
• Data integrity, fraud detection, self-correcting images
• Traitor-tracing (fingerprinting video-tapes)
• Adding captions or other information to images, and
subtitles or audio tracks to video (video-in-video)
• Intelligent browsers, automatic copyright information,
viewing a movie in a given rated version
• Copy control (secondary protection for DVD)
Requirements
[Table: each application below is rated from Low to High on each requirement; the individual ratings are not reproduced here.]
Applications (rows):
• Covert communication
• Copyright protection of images (authentication)
• Fingerprinting (traitor-tracing)
• Adding captions to images, additional information,
such as subtitles, to videos
• Image integrity protection (fraud detection)
• Copy control in DVD
• Intelligent browsers, automatic copyright
information, viewing movies in given rated version
Requirements (columns): capacity, robustness, invisibility, security,
embedding complexity, detection complexity
Redundancy and Irrelevancy are needed for data hiding
[Figure: the original image combined (+, =) with panels of 2, 5, and 31 gray levels]
Definition of Data Hiding
[Diagram: carrier document + secret message + key → embedding algorithm → transmission via network → detector (with key) → secret message]
• Carrier–message relationship
• Who extracts the message? (source versus destination coding)
• How many recipients are there?
• Is the key public knowledge or a shared secret?
• Do we embed different messages into one carrier?
• Are embedding and detection bundled with a key in tamper-proof hardware?
• Is the speed of embedding / detection important?
The “Magic” Triangle
[Diagram: triangle with corners Capacity, Undetectability, and Robustness; naïve steganography is placed at Capacity, secure steganographic techniques at Undetectability, and digital watermarking at Robustness]
There is a trade-off between capacity, invisibility, and robustness.
Additional factors:
• Complexity of embedding / extraction
• Security
Properties of hiding schemes
Robustness
The ability to extract hidden information after common image processing operations:
linear and nonlinear filters, lossy compression, contrast adjustment, recoloring,
resampling, scaling, rotation, noise adding, cropping, printing / copying / scanning, D/A
and A/D conversion, pixel permutation in small neighborhood, color quantization (as in
palette images), skipping rows / columns, adding rows / columns, frame swapping,
frame averaging (temporal averaging), etc.
Undetectability
The impossibility of proving the presence of a hidden message. This concept is inherently
tied to the statistical model of the carrier image. The ability to detect the presence does
not automatically imply the ability to read the hidden message. Undetectability should
not be mistaken for invisibility, a concept related to human perception.
Invisibility
Perceptual transparency. This concept is based on the properties of the human visual
system or the human auditory system.
Security
The embedded information cannot be removed beyond reliable detection by targeted
attacks based on a full knowledge of the embedding algorithm and the detector
(except a secret key), and the knowledge of at least one carrier with hidden message.
Detecting secret messages
• The ability to detect secret messages in images is related
to the message length. The less information we
embed into the cover-image, the smaller the probability of
introducing detectable artifacts by the embedding process.
• Each steganographic method has an upper bound on the
maximal safe message length (or the bit-rate expressed in bits
per pixel or sample) that tells us how many bits can be safely
embedded in a given image without introducing any
statistically detectable artifacts. Determining this maximal
safe bit-rate (or steganographic capacity) is a nontrivial task
even for the simplest methods.
Choice of Cover-Image
• The choice of cover-images is important because it
significantly influences the design of the stego-system and its
security.
• Images with a low number of colors, computer art, and images
with a unique semantic content, such as fonts, should be
avoided.
• Grayscale images are considered the best cover-images.
Uncompressed scans of photographs, or images obtained with
a digital camera and containing a high number of colors, can be
considered the safest for steganography.
Choice of Image format
• The choice of the image format also has a large impact on
the design of a secure steganographic system. Raw, uncompressed
formats, such as BMP, provide the biggest space for secure
steganography, but their obvious redundancy makes them
very suspicious.
• Fridrich et al. have recently shown that cover-images stored in the
JPEG format are a very poor choice for steganographic methods
that work in the spatial domain.
• Consequently, one should avoid using decompressed JPEG images
as covers for spatial steganographic methods, such as LSB
embedding or its variants.
JPEG is the chosen one!
• The JPEG format attracted the attention of researchers as the
main steganographic format for the following reasons:
• It is the most common format for storing images. JPEG
images are very abundant and they are almost solely used for
storing natural images.
• Modern steganographic methods can also provide reasonable
capacity without necessarily sacrificing security. Pfitzmann
and Westfeld proposed the F5 algorithm as an example of a
secure but high capacity JPEG steganography. The authors
presented the F5 algorithm as a challenge to the scientific
community at the Fourth Information Hiding Workshop in
Pittsburgh in 2001.
Digital Image
Bitmap Image
• Black pixels are
represented by 1
• White pixels are
represented by 0
Image representation
Color Image
RGB Image:
Numbers representing images!
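To make the idea of numbers representing images concrete, here is a small Python sketch. The 3x3 bitmap and the single RGB pixel are made-up illustrative values, not data from the slides.

```python
# A 3x3 bitmap: 1 = black pixel, 0 = white pixel (illustrative values).
bitmap = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

# One RGB pixel: three 8-bit intensities (red, green, blue), each in 0..255.
pixel = (200, 120, 35)

# Every channel is just a number, so it can be written out in binary.
channels_as_bits = [format(value, "08b") for value in pixel]

# Counting the black pixels is a plain sum over the 1s.
black_pixels = sum(sum(row) for row in bitmap)

print(channels_as_bits)
print(black_pixels)
```

Because each pixel is stored as ordinary integers, hiding data comes down to changing some of those integers slightly.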
Data Hiding
So how do we
actually hide data
in a digital image?
• We can hide it in the pixel
values
• We can hide it in the
coefficients
LSB and how to get it?
LSB:
3 modulus 2 = 1
4 modulus 2 = 0
LSB:
Odd Values = 1
Even Values= 0
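The odd/even rule above can be sketched in a few lines of Python. This is a minimal illustration of plain LSB replacement on a list of pixel values; the function names `lsb`, `set_lsb`, `hide`, and `reveal` are hypothetical and chosen only for this example.

```python
def lsb(value):
    """Least significant bit: 1 for odd values, 0 for even values."""
    return value % 2


def set_lsb(value, bit):
    """Return value with its least significant bit replaced by bit."""
    return value - (value % 2) + bit


def hide(pixels, bits):
    """Overwrite the LSB of the first len(bits) pixel values with the message bits."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = set_lsb(stego[i], bit)
    return stego


def reveal(pixels, n_bits):
    """Read the message back from the LSBs of the first n_bits pixel values."""
    return [lsb(p) for p in pixels[:n_bits]]


cover = [3, 4, 200, 255]
stego = hide(cover, [0, 1, 1])
print(stego, reveal(stego, 3))
```

Each pixel value changes by at most one, which is why the change is invisible to the eye even though it can be detectable statistically.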
JSteg Method
D. Upham, "Jsteg Steganographic Method,"
Jsteg
Jsteg Algorithm
Is the LSB method secure?
• LSB modification
methods are not
secure.
• The secret message is
easily retrievable.
The steganographic tool Jsteg embeds messages in lossy compressed
JPEG files. It has a high capacity (e.g., 12% of the steganogram's
size) and is immune to visual attacks. However, a statistical
attack discovers changes made by Jsteg.
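A rough sketch of Jsteg-style embedding, assuming the commonly described rule that the LSBs of quantized DCT coefficients are overwritten sequentially while coefficients equal to 0 or 1 are skipped. The slides do not spell this rule out, and the code works on a plain list of integers rather than a real JPEG file; `jsteg_embed` and `jsteg_extract` are hypothetical names.

```python
def jsteg_embed(coeffs, bits):
    """Overwrite the LSB of each usable coefficient with one message bit."""
    out = list(coeffs)
    # Assumption: coefficients equal to 0 or 1 are never used.
    usable = [i for i, c in enumerate(out) if c not in (0, 1)]
    if len(bits) > len(usable):
        raise ValueError("message too long for this cover")
    for i, bit in zip(usable, bits):
        # Clear the lowest bit, then set it to the message bit.
        # Python integers behave like two's complement here, so this
        # also works for negative coefficients.
        out[i] = (out[i] & ~1) | bit
    return out


def jsteg_extract(coeffs, n_bits):
    """Read the LSBs of the usable coefficients in order."""
    return [c & 1 for c in coeffs if c not in (0, 1)][:n_bits]


cover = [5, 0, 1, -3, 2, -1, 8]
stego = jsteg_embed(cover, [0, 1, 1, 0])
print(stego, jsteg_extract(stego, 4))
```

Overwriting LSBs only ever swaps a value with its partner in a pair such as (2, 3) or (-2, -1), so a usable coefficient never turns into 0 or 1 and the extractor skips exactly the same positions. That same pairing is what a statistical attack exploits.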
F5 Method
A. Westfeld, "F5: A steganographic algorithm: High capacity despite better
steganalysis," in Lecture Notes in Computer Science, vol. 2137, pp. 289-302,
2001.
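F5 Method (cont.)
Matrix, cover values, and message bits:
$$H = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}, \quad m = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
Bits currently carried by x:
$$m' = H x \bmod 2 = \begin{pmatrix} 6 \\ 7 \end{pmatrix} \bmod 2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \neq m$$
After changing a single element of x:
$$x'' = \begin{pmatrix} 3 \\ 3 \\ 4 \end{pmatrix}, \quad H x'' \bmod 2 = \begin{pmatrix} 7 \\ 7 \end{pmatrix} \bmod 2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = m$$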
Steps of F5 Algorithm
1. Get the RGB representation of the input image.
2. Calculate the quantization table corresponding to quality factor Q and compress
the image while storing the quantized DCT coefficients.
3. Compute the estimated capacity with no matrix embedding,
$C = h_{DCT} - h_{DCT}/64 - h(0) - h(1) + 0.49\,h(1)$, where $h_{DCT}$ is the number of all DCT
coefficients, $h(0)$ is the number of AC DCT coefficients equal to zero, $h(1)$ is the
number of AC DCT coefficients with absolute value 1, $h_{DCT}/64$ is the number of
DC coefficients, and $-h(1) + 0.49\,h(1) = -0.51\,h(1)$ is the estimated loss due to
shrinkage (see Step 5). The parameter C and the message length together determine
the best matrix embedding.
4. The user-specified password is used to generate a seed for a PRNG that
determines the random walk for embedding the message bits. The PRNG is also
used to generate a pseudo-random bit-stream that is XOR-ed with the message to
make it a randomized bit-stream. During the embedding, DC coefficients and
coefficients equal to zero are skipped.
5. The message is divided into segments of k bits, each of which is embedded into a group of
$2^k - 1$ coefficients along the random walk. If the hash of that group does not match
the message bits, the absolute value of one of the coefficients in the group is
decreased by one to obtain a match. If the coefficient becomes zero, the event is
called shrinkage, and the same k message bits are re-embedded in the next group
of DCT coefficients (we note that LSB(d) = d mod 2 for d > 0, and LSB(d) = 1 – d
mod 2 for d < 0).
6. If the message size fits the estimated capacity, the embedding proceeds,
otherwise an error message showing the maximal possible length is displayed.
There are rare cases when the capacity estimation is wrong due to a larger than
anticipated shrinkage. In those cases, the program embeds as much as possible
and displays a warning.
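Steps 3 and 5 can be illustrated with a simplified Python sketch. It is not the F5 program. It assumes each block is a list of 64 quantized coefficients with the DC coefficient first, it covers only the special case k = 1 (each group holds 2^1 - 1 = 1 coefficient, so the hash of a group is just that coefficient's LSB), and it leaves out the password-seeded random walk and the XOR with a pseudo-random bit-stream from step 4. All function names are hypothetical.

```python
def f5_lsb(d):
    """LSB as defined in step 5 (d must be nonzero)."""
    return d % 2 if d > 0 else 1 - (d % 2)


def estimated_capacity(blocks):
    """Step 3: C = h_DCT - h_DCT/64 - h(0) - h(1) + 0.49*h(1)."""
    h_dct = sum(len(block) for block in blocks)
    ac = [c for block in blocks for c in block[1:]]  # skip the DC coefficient
    h0 = sum(1 for c in ac if c == 0)
    h1 = sum(1 for c in ac if abs(c) == 1)
    return h_dct - h_dct / 64 - h0 - h1 + 0.49 * h1


def embed_k1(ac_coeffs, bits):
    """Step 5 for k = 1: one message bit per nonzero coefficient, with shrinkage."""
    out = list(ac_coeffs)
    i = 0
    for bit in bits:
        while True:
            if i >= len(out):
                raise ValueError("message does not fit")
            c = out[i]
            i += 1
            if c == 0:
                continue  # zeros are skipped
            if f5_lsb(c) != bit:
                c = c - 1 if c > 0 else c + 1  # decrease the absolute value by one
                out[i - 1] = c
            if c != 0:
                break  # bit embedded
            # Shrinkage: the coefficient became zero, so the same bit
            # is re-embedded in the next usable coefficient.
    return out


def extract_k1(coeffs, n_bits):
    """Read the bits back from the nonzero coefficients."""
    return [f5_lsb(c) for c in coeffs if c != 0][:n_bits]


block = [10, 1, -1, 2, -3] + [0] * 59
capacity = estimated_capacity([block, block])
stego = embed_k1([3, 1, 0, -2, 2, -1, 5], [0, 0, 1])
print(capacity, stego, extract_k1(stego, 3))
```

In the example, the second message bit first lands on a coefficient equal to 1, which shrinks to 0 and is lost to the receiver, so the bit is embedded again in the next nonzero coefficient. This is the loss the 0.51 h(1) term in the capacity estimate accounts for.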
Take Home Information
• We assume that the steganographic method is publicly
known with the exception of a secret key. The method is
secure if the stego-images do not contain any detectable
artifacts due to message embedding. In other words, the set
of stego-images should have the same statistical properties as
the set of cover-images.
• If there exists an algorithm that can guess whether or not a
given image contains a secret message with a success rate
better than random guessing, the steganographic system is
considered broken.
Conclusion
• Digital data hiding is an active research topic.
• Existing algorithms are not secure.
• The protocol is assumed to be known; only the key is secret.
• F5 aims for high security through matrix embedding.
• F5 makes fewer modifications and is therefore less vulnerable.
• F5 offers high capacity despite better steganalysis.
• F5 is probably one of the most advanced programs publicly
available. It uses methods to compensate for the changes it
introduces, so statistical analysis is difficult. Yet it is considered
broken.
• Demonstration of StegoMagic.
• Discussion