Secure Steganography using the Most Significant Bits

George Hamer
Computer Science Department
South Dakota State University
Brookings, SD 57006 USA
George.Hamer@sdstate.edu

William Perrizo
Computer Science Department
North Dakota State University
Fargo, ND 58102 USA
William.Perrizo@ndsu.nodak.edu

Abstract

The goal of steganography is to embed a hidden message in a cover. Typical covers are audio and image files, and typical methods use the least significant bits of the cover to hide the message. Steganalysis is the classification of files as stego-bearing or not. Many steganalysis techniques use statistical analysis to classify files; they rely on the fact that hidden messages tend to be placed in the low order bits, which upsets the distribution of ones and zeros in a detectable way. This paper explores using the most significant bits to carry the message without changing the cover file or the distribution of its bits. The authors explore the maximum length of a message that can be found in the high order bits using a masking technique. Instead of changing bits in the cover, we mark the locations of the bits that make up the hidden message and transmit the mask as a separate file from the cover. The paper uses BMP image files to find the message and then converts the images to JPEG for transmission. The receiver then converts the file back to BMP format and extracts the message. Small messages of up to approximately 2000 characters can easily be found and recovered using this method and will not be detected using standard steganalysis techniques.

Keywords

Steganography, Steganalysis, Image processing

1. Introduction

The word Steganography, commonly abbreviated stego, is derived from the Greek words steganos (covered) and graphy (writing) [Johns1]. The goal of stego is to hide a message so that it does not draw the attention of someone examining the cover. The most common carrier files are images, such as JPEGs, and audio files, such as MP3s.
We will look at some common programs for hiding messages in covers in section 2. Steganalysis techniques tend to be statistical in nature; they attempt to uncover the hidden message and render it useless [Johns2]. A brief survey of Steganalysis techniques is presented in section 3. Most stego techniques attempt to hide the message in the low order bits of the cover image or through the manipulation of coefficients of the file. In section 4 we present a new algorithm that looks for the message in the high order bits of the cover image. We use JPEG images downloaded from the eBay (www.ebay.com) website and develop a masking technique to locate a message of up to approximately 2000 characters in length. Section 5 presents experimental results of the proposed algorithm as implemented, and future work is laid out in section 6.

2. Steganography

The use of stego can be traced back to the ancient Greeks' use of wax covered tablets that hid messages under the wax. During World War II Steganography involved invisible inks and micro dots. More recently the spacing in text documents has been altered to represent the message [Brassil]. Current research is focused on using audio and picture files to hide the message. The rest of this section's discussion concentrates on using image files as the cover.

The simplest technique uses the least significant bit (LSB) of each sample. Most digital files use an eight bit sample, and the LSB of each sample is used to encode one bit of the hidden message. An 800 x 600 pixel 24 bit color bitmap image would yield 1,440,000 bits available for hiding the message, which is 180,000 characters. S-Tools is a Windows based freeware program that hides the message in the LSBs of a BMP file [Brown].

The use of bitmap images on the Internet today is rare due to their large size; compressed images in the form of JPEGs are more commonly used. The JPEG format is a lossy compression algorithm.
JPEG uses the discrete cosine transform (DCT) compression scheme to store image data. Because this compression discards some information from the file, storing the message in the LSBs of the image data will most likely result in loss of the message. To overcome this problem, the message is stored in the compression coefficients instead of in the image data. To store a one the coefficient is rounded up, and to store a zero it is rounded down. The J-Steg program is a freeware implementation of this technique [JSteg]. The limitation of the JPEG format is its reduced hiding capacity. According to work done by Jonathan Watkins, a 500 Kb JPEG image has the capacity for a 30 Kb message [Wat]. A more complete list of stego tools and techniques can be found at http://www.jjtc.com/Security/stegtools.htm.

3. Steganalysis

As mentioned earlier, Steganalysis is an attempt to discover a hidden message and render it useless. As an example, one of the easiest ways to defeat LSB stego is to convert the original BMP file to the JPEG format. A lossy compression method treats most of the least significant bits as redundant and discards them, and the message is removed with them. When the file is converted back to a bitmap the original information is missing. Early work in this area was performed by Sushil Jajodia, Andreas Pfitzmann, Niels Provos and Andreas Westfeld [West] [Johns2] [Pro].

There are three basic approaches to detecting hidden messages:

Visual: Since many methods remove part of the image and replace it with the message, a well trained set of human eyes can in many cases detect that an image has been altered.

Structural: The format of a data file will often change as hidden information is added, resulting in a detectable pattern.

Statistical: Patterns in the pixels and their least significant bits can reveal the existence of a message.

Visual methods can involve examining the image by eye or using programs to extract and examine individual bit planes of the image.
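Extracting a bit plane is a shift-and-mask operation on each sample. The following is a minimal Python sketch, not taken from any of the tools cited here. The function name bit_plane and the sample values are hypothetical, and one color band of the image is assumed to be available as a list of rows of 8-bit integers.

```python
def bit_plane(samples, bit):
    """Return one bit plane of a 2-D grid of 8-bit samples.

    bit = 0 selects the least significant bit, bit = 7 the most significant.
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    # Shift the wanted bit down to position 0 and mask off everything else.
    return [[(value >> bit) & 1 for value in row] for row in samples]


# Hypothetical 2 x 3 band of 8-bit samples (illustration only).
band = [[200, 13, 128],
        [127, 255, 0]]

msb_plane = bit_plane(band, 7)  # plane that follows the coarse image content
lsb_plane = bit_plane(band, 0)  # plane that LSB embedding overwrites
```

Rendering each plane as a black and white image gives the kind of picture that a visual inspection works from.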
In most images that are free from embedded data the LSB plane still shows an outline of the original image; as more data is embedded in the LSBs, the plane degrades to random noise. The drawback of visual methods is the difficulty of automating the process: a good set of human eyes is still needed to examine the results. A second shortcoming is the time it takes to train a good set of eyes.

Structural methods rely on the actual methods used by the software that embeds the message. Most such programs manipulate the image in known ways and can be detected by checking an image file for these effects. This is most common in some of the early stego programs.

Statistical methods rely on the fact that a hidden message will appear to be more random than the data that it replaces. The simplest statistical test is the χ² (chi-squared) test. It can be applied to the number of times the LSB is 1 (or 0). Low scores indicate the presence of embedded data and high scores indicate that the file has not been altered.

Jessica Fridrich and Miroslav Goljan of SUNY Binghamton state that the battle between Steganography and Steganalysis is never-ending [Fri]. As new steganographic techniques are developed, newer and more sophisticated analysis techniques will be needed to detect the hidden message. Future work in this area will lead to determining the “safe” carrying capacity of steganographic methods. More recently, Davidson and Paul [Dav] used techniques from data mining to locate hidden messages using outlier detection. Their work shows that even small messages are identifiable in a JPEG image using LSB embedding.

4. Hiding Data Using the Most Significant Bits

A file in the 24 bit BMP format of size 800 by 600 pixels holds 3 bands of data per pixel, each 8 bits wide. For this size the image data totals 3*8*800*600 or 11,520,000 bits.
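The bit counts used in this section follow from the image dimensions alone. As a check on the arithmetic, here is a short Python sketch covering the full image, a single bit plane of one band, and a 2000 character message. The helper names are illustrative assumptions and are not part of the original implementation; 8 bits per character is assumed.

```python
def total_image_bits(width, height, bands=3, bits_per_band=8):
    """Total number of bits of pixel data in an uncompressed image."""
    return width * height * bands * bits_per_band


def single_plane_bits(width, height):
    """Bits available when one bit (e.g. the MSB) of one band is used per pixel."""
    return width * height


def message_bits(num_chars, bits_per_char=8):
    """Bits needed for a message, assuming 8 bits per character."""
    return num_chars * bits_per_char


full = total_image_bits(800, 600)    # 11,520,000 bits in a 24 bit 800 x 600 image
plane = single_plane_bits(800, 600)  # 480,000 bits in one bit plane of one band
needed = message_bits(2000)          # 16,000 bits for a 2000 character message
```

Even the smaller 640 x 480 images give single_plane_bits(640, 480) = 307,200 bits, well above the 16,000 bits of the longest message considered.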
When this file is converted to a JPEG file the information in the low order bits is lost. Even the most significant bit of a single band provides 480,000 bits in total. A message of 2000 characters has a total of 16,000 bits, which is much less than the number of available data bits. The proposed technique marks the locations of the message bits in this data field and creates a mask similar to a one-time pad, which has been shown to be impossible to break [Blak].

The image is processed in a column wise fashion, since it is believed that there is more variation in color in the vertical direction than in the horizontal direction. Standard photographic techniques [] tell the photographer to divide the image into three horizontal bands, with the major portion of the subject located in the center band. Our tests use photographs of automobiles that are posted for sale on the eBay auction web site. The photos have the automobile in the middle third, the ground in the bottom band and the sky in the upper band. Processing in a horizontal fashion gives less variation than vertical processing, and since the bits of the hidden message change rapidly we need a correspondingly rapid change in the bits of the image vector.

In order for the process to work, both sides copy an agreed upon image from the eBay website or from any of the numerous sites on the Internet where images may be posted.

The algorithm is as follows:

1. Extract the most significant bit of the red band of the BMP image and store these bits in a bit vector, processing one column at a time until all columns have been used.
2. Initialize a 2000 byte (16,000 bit) array with the message to mask out.
3. Initialize a mask to all zeros, of sufficient length to mask the message.
4. For each character in the hidden message:
      For each bit in the character:
         Extract the bit.
         Scan forward in the image vector until a bit that matches is found.
         Set this bit in the mask to one.
5. Transmit only the mask to the receiver.

To extract the message, the receiver builds the image vector as in step 1 and uses the mask to select the appropriate bits from it. Both sides have to agree beforehand on which images to use; without this information a person intercepting the mask will not be able to read the message.

5. Experimental Results

For this preliminary work it was decided arbitrarily to use the red band of the images. All images were collected from eBay over a period of one week and were of size either 800 x 600 or 640 x 480. The data to be masked were the lyrics of the nine songs on the Rolling Stones' album “Let It Bleed”. Each song lyric file was between 628 and 2080 characters. Figure 1 shows summarized totals for an 800 x 600 image, while figure 2 shows totals for a 640 x 480 image.

File name: 800-1.bmp

Data file   Characters in message   Bits examined in image file
1                    628                      17027
2                   1026                      27412
3                   1436                      38196
4                   1111                      29384
5                    637                      17085
6                   1885                      51284
7                    794                      21274
8                   2080                      57402
9                    730                      20086
Average             1147                      31017

Figure 1

File name: 640-1.bmp

Data file   Characters in message   Bits examined in image file
1                    628                      26598
2                   1026                      42148
3                   1436                      58383
4                   1111                      45374
5                    637                      26157
6                   1885                      76480
7                    794                      32685
8                   2080                      84320
9                    730                      30018
Average             1147                      46907

Figure 2

Will need more here on experimental results.

6. Conclusions and Future Work

As a proof of concept, the experimental results show that images of size 640 x 480 and 800 x 600 provide sufficient search space to mask a message of up to 2000 characters. Future work will attempt to discover a relationship between image type and data holding capacity. The image types can be different kinds of items on the eBay website, with the aim of finding image types that minimize the size of the mask that must be transferred between users.
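The masking and extraction procedure of Section 4 can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' implementation. The function names are hypothetical, the red band is assumed to be given as a list of rows of 8-bit integers, message bytes are assumed to be taken most significant bit first, and the mask is assumed to hold one entry per image bit examined, so that its length corresponds to the number of bits examined reported in Section 5.

```python
def msb_vector(red_band):
    """Step 1: most significant bits of the red band, read one column at a time."""
    rows, cols = len(red_band), len(red_band[0])
    return [(red_band[r][c] >> 7) & 1 for c in range(cols) for r in range(rows)]


def message_to_bits(message):
    """Bytes -> list of bits, most significant bit first (an assumption)."""
    return [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]


def build_mask(image_bits, message):
    """Steps 3-4: mark, in order, an image bit equal to each message bit."""
    mask = []
    pos = 0
    for bit in message_to_bits(message):
        # Scan forward, leaving zeros in the mask, until a matching bit is found.
        while pos < len(image_bits) and image_bits[pos] != bit:
            mask.append(0)
            pos += 1
        if pos == len(image_bits):
            raise ValueError("image vector exhausted before the message was masked")
        mask.append(1)
        pos += 1
    return mask  # len(mask) is the number of image bits examined


def extract_message(image_bits, mask):
    """Receiver side: keep the image bits marked in the mask and repack them."""
    bits = [image_bits[i] for i, marked in enumerate(mask) if marked]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


# Synthetic two-row band (illustration only): column-wise MSBs alternate 0, 1,
# so every message bit is matched within two image bits.
message = b"stego"
cols = 8 * len(message)
band = [[0] * cols, [255] * cols]

image_bits = msb_vector(band)
mask = build_mask(image_bits, message)
assert extract_message(image_bits, mask) == message
print(len(mask), "image bits examined for", len(message), "characters")
```

With a real cover image in place of the synthetic band, the sketch raises an error if the image vector runs out before the whole message has been located. The image itself is never modified; only the mask is produced.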
Measurements of the density of data points in the full Cartesian product of feature domains will be examined as guides to focused density-based steganography. These techniques should allow a dramatic increase in the amount of information that can be effectively hidden in a cover. Horizontal data structuring will be examined as a means of reducing the high cost of these kinds of techniques.

Again more needs to be added here.

7. References

[Agr1] Agrawal, Rakesh and Kiernan, Jerry, “Watermarking Relational Databases”, Proceedings of the 28th International Conference on Very Large Databases (VLDB), 2002.

[Agr2] Agrawal, Rakesh, Haas, Peter J. and Kiernan, Jerry, “A System for Watermarking Relational Databases”, SIGMOD 2003, San Diego, California, USA, 2003.

[Blak] Blakley, G., “One Time Pads are Key Safeguarding Schemes, Not Cryptosystems: Fast Key Safeguarding Schemes (Threshold Schemes) Exist”, Proceedings of the 1980 IEEE Symposium on Security and Privacy, pp. 108-113, Apr. 1980.

[Brassil] Brassil, J., Low, S., Maxemchuk, N. and O'Gorman, L., “Document Marking and Identification using both Line and Word Shifting”, Technical report, AT&T Bell Laboratories, 1994.

[Brown] Brown, Andy, http://www.webattack.com/download/dlstools.shtml

[Dav] Davidson, Ian and Paul, Goutam, “Locating Secret Messages in Images”, KDD’04, August 22-25, 2004, Seattle, Washington, USA.

[Fri] Fridrich, Jessica and Goljan, Miroslav, “Practical Steganalysis – State of the Art”, Proc. SPIE Photonics West, Vol. 4675, Electronic Imaging 2002, Security and Watermarking of Multimedia Contents, San Jose, California, January 2002, pp. 1-13.

[Johns1] Johnson, Neil F. and Jajodia, Sushil, “Exploring Steganography: Seeing the Unseen”, IEEE Computer, vol. 31, no. 2, pp. 26-34, Feb. 1998.

[Johns2] Johnson, Neil F. and Jajodia, Sushil, “Steganalysis: The Investigation of Hidden Information”, IEEE Information Technology Conference, Syracuse, New York, USA, September 1998.

[JSteg] http://www.securityfocus.com/tools/1434

[Pet] Petitcolas, Fabien, http://www.petitcolas.net/fabien/steganography/image_downgrading/

[Pro] Provos, Niels, “Defending Against Statistical Steganalysis”, Proceedings of the 10th USENIX Security Symposium, 2001.

[Wat] Watkins, Jonathan, “Steganography – Messages Hidden in Bits”, http://citeseer.ist.psu.edu/555992.html

[West] Westfeld, Andreas and Pfitzmann, Andreas, “Attacks on Steganographic Systems”, Information Hiding, Third International Workshop, Germany, 1999.