Steganography and Approaches of Data Hiding in Digital Images

Presenter: Moinul I Zaber, Dept. of CS, Kent State University

Based on:
- F5 – a Steganographic Algorithm, by Andreas Westfeld, Technische Universität Dresden, Institute for System Architecture, Dresden, Germany
- Applications of Data Hiding in Digital Images, by Jessica Fridrich, Center for Intelligent Systems, SUNY Binghamton, Binghamton, NY 13902-6000, U.S.A.
- Steganalysis of JPEG Images: Breaking the F5 Algorithm, by Jessica Fridrich, Miroslav Goljan, Dorin Hogea, SUNY Binghamton, Binghamton, NY 13902-6000, USA

Prologue
- Steganography is the art of invisible communication. Its purpose is to hide the very presence of communication by embedding messages into innocuous-looking cover objects.
- Data hiding in digital images is attracting attention from cryptographers and security engineers.
- Existing methods are not strong against attacks.
- The goal is to devise a method that is strong against attacks and can control the image quality.

Outline of the Presentation
Introduction and history of data hiding
Digital image representation by computer technology
Existing data hiding methods (in JPEG)
- JSteg (LSB modification)
- F5 (matrix embedding)
Discussions

Data Hiding in Digital Imagery
• Relatively young and fast growing
• Well over 90% of all publications published in the last 10 years
• Highly multidisciplinary field combining image and signal processing with cryptography, communication theory, coding theory, signal compression, and the theory of visual perception
• Tremendous interest from industry and military

History of Data Hiding
• First techniques included invisible ink, secret writing using chemicals, templates laid over text messages, microdots, changing letter/word/line/paragraph spacing, and changing fonts
• Images, video, and audio files provide sufficient redundancy for effective data hiding
• PostScript files, PDF files, and HTML can also be used for non-robust data hiding to a limited extent
• Executable files provide very little space for data hiding
• Fonts can also be used

The Need for Data Hiding
• Covert communication using images (secret message is hidden in a carrier image)
• Ownership of digital images, authentication, copyright
• Data integrity, fraud detection, self-correcting images
• Traitor-tracing (fingerprinting video-tapes)
• Adding captions to images and additional information, such as subtitles or audio tracks, to video (video-in-video)
• Intelligent browsers, automatic copyright information, viewing a movie in a given rated version
• Copy control (secondary protection for DVD)

Requirements
[Table: applications rated against requirements on a Low to High scale]
Applications:
• Covert communication
• Copyright protection of images (authentication)
• Fingerprinting (traitor-tracing)
• Adding captions to images, additional information, such as subtitles, to videos
• Image integrity protection (fraud detection)
• Copy control in DVD
• Intelligent browsers, automatic copyright information, viewing movies in given rated version
Requirements:
• Capacity
• Robustness
• Invisibility
• Security
• Embedding complexity
• Detection complexity

Redundancy and Irrelevancy Are Needed for Data Hiding
[Figure: versions of an image with 2, 5, and 31 gray levels, shown alongside the original]

Definition of Data Hiding
[Diagram: a key, a carrier document, and a secret message enter the embedding algorithm; the result is transmitted via network, and the secret message is recovered]
• Carrier – message relationship
• Who extracts the message? (source versus destination coding)
• How many recipients are there?
• Is the key public knowledge or a shared secret?
• Do we embed different messages into one carrier?
• Is embedding / detection bundled with a key in tamper-proof hardware?
• Is the speed of embedding / detection important?
[Diagram, receiving side: Detector, Key]

The “Magic” Triangle
[Diagram: triangle with corners Capacity (naïve steganography), Undetectability (secure steganographic techniques), and Robustness (digital watermarking)]
There is a trade-off between capacity, invisibility, and robustness.
Additional factors:
• Complexity of embedding / extraction
• Security

Properties of hiding schemes
Robustness: The ability to extract hidden information after common image processing operations: linear and nonlinear filters, lossy compression, contrast adjustment, recoloring, resampling, scaling, rotation, noise adding, cropping, printing / copying / scanning, D/A and A/D conversion, pixel permutation in a small neighborhood, color quantization (as in palette images), skipping rows / columns, adding rows / columns, frame swapping, frame averaging (temporal averaging), etc.
Undetectability: Impossibility to prove the presence of a hidden message. This concept is inherently tied to the statistical model of the carrier image. The ability to detect the presence does not automatically imply the ability to read the hidden message. Undetectability should not be mistaken for invisibility, a concept related to human perception.
Invisibility: Perceptual transparency. This concept is based on the properties of the human visual system or the human audio system.
Security: The embedded information cannot be removed beyond reliable detection by targeted attacks based on full knowledge of the embedding algorithm and the detector (except a secret key), and the knowledge of at least one carrier with a hidden message.

Detecting secret messages
• The ability to detect secret messages in images is related to the message length. The less information we embed into the cover-image, the smaller the probability of introducing detectable artifacts by the embedding process.
• Each steganographic method has an upper bound on the maximal safe message length (or the bit-rate expressed in bits per pixel or sample) that tells us how many bits can be safely embedded in a given image without introducing any statistically detectable artifacts. Determining this maximal safe bit-rate (or steganographic capacity) is a non-trivial task even for the simplest methods.

Choice of Cover-Image
• The choice of cover-images is important because it significantly influences the design of the stego-system and its security.
• Images with a low number of colors, computer art, and images with a unique semantic content, such as fonts, should be avoided.
• Grayscale images are considered the best cover-images.
• Uncompressed scans of photographs, or images obtained with a digital camera and containing a high number of colors, can be considered safest for steganography.

Choice of Image Format
• The choice of the image format also has a very big impact on the design of a secure steganographic system. Raw, uncompressed formats, such as BMP, provide the biggest space for secure steganography, but their obvious redundancy makes them very suspicious.
• Fridrich et al. have recently shown that cover-images stored in the JPEG format are a very poor choice for steganographic methods that work in the spatial domain.
• Consequently, one should avoid using decompressed JPEG images as covers for spatial steganographic methods, such as LSB embedding or its variants.

JPEG is the chosen one!
The JPEG format attracted the attention of researchers as the main steganographic format for the following reasons:
• It is the most common format for storing images.
• JPEG images are very abundant, and they are almost solely used for storing natural images.
• Modern steganographic methods can also provide reasonable capacity without necessarily sacrificing security.
Pfitzmann and Westfeld proposed the F5 algorithm as an example of secure but high-capacity JPEG steganography.
The authors presented the F5 algorithm as a challenge to the scientific community at the Fourth Information Hiding Workshop in Pittsburgh in 2001.

Digital Image
Bitmap image:
• Black is represented by 1
• White is represented by 0

Image representation
• Color image
• RGB image: numbers representing images!

Data Hiding
So how do we actually hide data in a digital image?
• We can hide it in the pixel values
• We can hide it in the coefficients

LSB and how to get it?
• LSB: 3 modulus 2 = 1, 4 modulus 2 = 0
• LSB: odd values = 1, even values = 0

JSteg Method
D. Upham, "Jsteg Steganographic Method"

Jsteg Algorithm

Is the LSB method secure?
• LSB modification methods are not secure.
• The secret message is easily retrievable.
• The steganographic tool Jsteg embeds messages in lossy compressed JPEG files. It has a high capacity (e.g., 12% of the steganogram's size) and is immune against visual attacks. However, a statistical attack discovers changes made by Jsteg.

F5 Method
A. Westfeld, "F5: A steganographic algorithm: High capacity despite better steganalysis," in Lecture Notes in Computer Science, vol. 2137, pp. 289-302 (2001)

F5 Method (cont. 2/2)
$$H = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}, \quad m = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
$$H x = \begin{pmatrix} 6 \\ 7 \end{pmatrix} \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix} = m' \pmod 2, \quad m' \neq m$$
$$x'' = \begin{pmatrix} 3 \\ 3 \\ 4 \end{pmatrix}, \quad H x'' = \begin{pmatrix} 7 \\ 7 \end{pmatrix} \equiv \begin{pmatrix} 1 \\ 1 \end{pmatrix} = m \pmod 2$$

Steps of F5 Algorithm
1. Get the RGB representation of the input image.
2. Calculate the quantization table corresponding to quality factor Q and compress the image while storing the quantized DCT coefficients.
3. Compute the estimated capacity with no matrix embedding, $C = h_{DCT} - h_{DCT}/64 - h(0) - h(1) + 0.49\,h(1)$, where $h_{DCT}$ is the number of all DCT coefficients, $h(0)$ is the number of AC DCT coefficients equal to zero, $h(1)$ is the number of AC DCT coefficients with absolute value 1, $h_{DCT}/64$ is the number of DC coefficients, and $-h(1) + 0.49\,h(1) = -0.51\,h(1)$ is the estimated loss due to shrinkage (see Step 5). The parameter C and the message length together determine the best matrix embedding.
4. The user-specified password is used to generate a seed for a PRNG that determines the random walk for embedding the message bits. The PRNG is also used to generate a pseudo-random bit-stream that is XOR-ed with the message to make it a randomized bit-stream. During the embedding, DC coefficients and coefficients equal to zero are skipped.
5. The message is divided into segments of k bits that are embedded into a group of $2^k - 1$ coefficients along the random walk. If the hash of that group does not match the message bits, the absolute value of one of the coefficients in the group is decreased by one to obtain a match. If the coefficient becomes zero, the event is called shrinkage, and the same k message bits are re-embedded in the next group of DCT coefficients (we note that LSB(d) = d mod 2 for d > 0, and LSB(d) = 1 – d mod 2 for d < 0).
6. If the message size fits the estimated capacity, the embedding proceeds; otherwise an error message showing the maximal possible length is displayed. There are rare cases when the capacity estimation is wrong due to a larger than anticipated shrinkage. In those cases, the program embeds as much as possible and displays a warning.

Take Home Information
• We assume that the steganographic method is publicly known with the exception of a secret key. The method is secure if the stego-images do not contain any detectable artifacts due to message embedding. In other words, the set of stego-images should have the same statistical properties as the set of cover-images.
• If there exists an algorithm that can guess whether or not a given image contains a secret message with a success rate better than random guessing, the steganographic system is considered broken.

Conclusion
• Digital data hiding is an active research topic.
• Existing algorithms are not secure.
• The protocol is known but the key is unknown.
• F5 has high security due to matrix embedding.
• F5 makes fewer modifications and is therefore less vulnerable.
• F5 offers high capacity despite better steganalysis.
• F5 is probably one of the most advanced programs publicly available; it uses methods to compensate for the introduced changes, so statistical analysis is difficult. Yet it is considered broken.

Demonstration of StegoMagic

Discussion
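
Worked sketch: matrix embedding
As a companion to the matrix-embedding example on the F5 Method slide, here is a minimal Python sketch using the slide's values (the 2×3 matrix H, carrier values x = (2, 3, 4), and message bits m = (1, 1)). The function names `extract` and `embed` are illustrative and are not taken from the F5 source code. The sketch changes the selected value by +1 to reproduce the slide's numbers; this is an assumption made for illustration, since F5 as described in Step 5 decreases the absolute value by one and must also handle shrinkage, which is omitted here.

```python
from typing import List, Sequence

# Parity-check matrix from the slide example (2 message bits, 3 carrier values).
H = [[1, 0, 1],
     [0, 1, 1]]


def extract(x: Sequence[int], h: Sequence[Sequence[int]] = H) -> List[int]:
    """Return the hidden bits: (H * x) mod 2, computed row by row."""
    return [sum(hij * xj for hij, xj in zip(row, x)) % 2 for row in h]


def embed(x: Sequence[int], m: Sequence[int],
          h: Sequence[Sequence[int]] = H) -> List[int]:
    """Return a copy of x, changed in at most one position, that carries m."""
    # Bits where the current syndrome disagrees with the message.
    diff = [(s + b) % 2 for s, b in zip(extract(x, h), m)]
    out = list(x)
    if any(diff):
        # The column of H equal to diff names the single value to change.
        # This lookup succeeds for the slide's H, whose columns are all the
        # nonzero 2-bit vectors; an arbitrary matrix may raise ValueError.
        columns = [list(col) for col in zip(*h)]
        j = columns.index(diff)
        out[j] += 1  # assumption: +1 as on the slide; F5 decrements |value|
    return out


if __name__ == "__main__":
    x = [2, 3, 4]
    m = [1, 1]
    print(extract(x))      # [0, 1]  -> does not match m
    stego = embed(x, m)
    print(stego)           # [3, 3, 4]
    print(extract(stego))  # [1, 1]  -> matches m
```

Two message bits are carried by three values while at most one of them is modified, which is the point of matrix embedding: fewer changes per embedded bit than plain LSB replacement.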