Secure Steganography using the Most Significant Bits

George Hamer
Computer Science Department
South Dakota State University
Brookings, SD 57006 USA
George.Hamer@sdstate.edu
William Perrizo
Computer Science Department
North Dakota State University
Fargo, ND 58102 USA
William.Perrizo@ndsu.nodak.edu
Abstract
The goal of steganography is to embed a hidden message in a cover. Typical covers are audio and image files, and most methods use the least significant bits to hide the message. Steganalysis is the classification of files as stego-bearing or not. Many steganalysis techniques use statistical analysis to classify files. They rely on the fact that hidden messages tend to be stored in the low order bits, which upsets the distribution of ones and zeros in a detectable way. This paper explores using the most significant bits to hide the message without changing either the cover file or its distribution of bits. The authors explore the maximum length of a message that can be found in the high order bits using a masking technique. Instead of changing bits in the cover, we mark the locations of the bits that make up the hidden message and transmit the mask as a file separate from the cover. The paper uses BMP image files to find the message and then converts the images to JPEG for transmission. The receiver then converts the file back to BMP format and extracts the message. Small messages of up to approximately 2000 characters can easily be found and recovered using this method and will not be detected using standard steganalysis techniques.
Keywords
Steganography, Steganalysis, Image processing
1. Introduction
The word steganography, commonly abbreviated stego, is derived from the Greek words steganos (covered) and graphy (writing) [Johns1]. The goal of stego is to hide a message in a way that does not draw the attention of someone examining the cover. The most common carrier files are images, such as jpegs, and audio files, such as mp3s. Some common programs for hiding messages in cover files are reviewed in section 2.
Steganalysis techniques tend to be statistical in nature; they attempt to uncover the hidden message and render it useless [Johns2]. A brief survey of steganalysis techniques is presented in section 3.
In most stego techniques the message is hidden in the low order bits of the cover image or through the manipulation of the file's compression coefficients. In section 4 we present a new algorithm that looks for the message in the high order bits of the cover image. We use jpeg images downloaded from the eBay (www.ebay.com) website and develop a masking technique to locate a message of up to approximately 2000 characters.
Section 5 presents experimental results for the proposed algorithm as implemented, and section 6 lays out conclusions and future work.
2. Steganography
The use of stego can be traced back to the ancient Greeks’ use of wax-covered tablets that hid messages under the wax. During World War II steganography involved invisible inks and microdots. More recently, the spacing in text documents has been altered to represent the message [Brassil]. Current research is focused on using audio and picture files to hide the message. The rest of this section concentrates on image files as the cover.
The simplest technique uses the least significant bit (LSB) of each sample. Most digital files use eight-bit samples, and the LSB of each sample encodes one bit of the hidden message. An 800 x 600 pixel, 24-bit color bitmap image therefore yields 800 x 600 x 3 = 1,440,000 bits for hiding the message, or 180,000 characters. S-Tools is a Windows-based freeware program that hides the message in the LSBs of a bmp file [Brown].
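A minimal sketch of LSB embedding and extraction, assuming the cover has already been read into a flat list of 8-bit samples (bmp parsing is omitted) and that each character is stored as one byte, most significant bit first. The function names are hypothetical, and this is not the S-Tools code.

```python
def embed_lsb(samples: list[int], message: bytes) -> list[int]:
    """Hide each message bit in the least significant bit of one 8-bit sample."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("message too long for this cover")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(samples: list[int], n_chars: int) -> bytes:
    """Read n_chars bytes back from the sample LSBs, most significant bit first."""
    out = bytearray()
    for c in range(n_chars):
        byte = 0
        for s in samples[8 * c: 8 * c + 8]:
            byte = (byte << 1) | (s & 1)
        out.append(byte)
    return bytes(out)


# Capacity of an 800 x 600, 24-bit image: one bit per 8-bit sample.
capacity_bits = 800 * 600 * 3
print(capacity_bits, capacity_bits // 8)  # 1440000 180000
```

Each sample changes by at most one intensity level, which is why LSB embedding is hard to see but easy to destroy.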
Bitmap images are rarely used on the Internet today because of their large size; compressed images in the jpeg format are far more common. Jpeg is a lossy compression format that stores image data using the discrete cosine transform (DCT). Because the quantization of the DCT coefficients discards information from the file, a message stored in the LSBs will most likely be lost.
To overcome this problem, the message is stored in the compression coefficients rather than in the pixel data. To store a one the coefficient is rounded up; to store a zero it is rounded down. The J-Steg program is a freeware implementation of this technique [JSteg]. The limitation of the jpeg format is its reduced hiding capacity: according to work by Jonathan Watkins, a 500 Kb jpeg image has the capacity for a 30 Kb message [Wat].
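The description above does not say how the receiver tells a rounded-up coefficient from a rounded-down one. One common reading, sketched here purely as an assumption, is to round each scaled coefficient to whichever neighbouring integer has the parity of the message bit, so that the receiver recovers the bit from the LSB of the quantized coefficient. The quantizer step `q` and the function names are illustrative; this is not the J-Steg source.

```python
import math


def embed_bit(coefficient: float, q: float, bit: int) -> int:
    """Quantize coefficient / q to a neighbouring integer whose parity equals bit.

    floor(x) and floor(x) + 1 always differ in parity, so exactly one of them
    carries the wanted bit: choosing it means rounding down or rounding up.
    """
    lower = math.floor(coefficient / q)
    return lower if (lower & 1) == bit else lower + 1


def extract_bit(quantized: int) -> int:
    """The receiver reads the bit back as the parity (LSB) of the quantized value."""
    return quantized & 1


coeffs = [12.3, -4.7, 31.9, 7.0]
bits = [1, 0, 0, 1]
quantized = [embed_bit(c, 2.0, b) for c, b in zip(coeffs, bits)]
print(quantized)  # [7, -2, 16, 3]
print([extract_bit(v) for v in quantized])  # [1, 0, 0, 1]
```

Python's `&` on negative integers behaves like two's complement, so the parity test also works for negative coefficients such as -3.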
A more complete list of stego tools and techniques can be found at
http://www.jjtc.com/Security/stegtools.htm.
3. Steganalysis
As mentioned earlier, steganalysis is an attempt to discover and render useless a hidden message. For example, one of the easiest ways to defeat LSB stego is to convert the original bmp file to the jpeg format. A lossy compression method discards most of the least significant bits as redundant, and the message is removed with them. When the file is converted back to a bitmap, the original information is missing.
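Real jpeg compression quantizes DCT coefficients of 8 x 8 blocks rather than individual pixel values. As a much simpler stand-in, chosen only for illustration, the sketch below rounds every sample to a multiple of 8 and measures how many embedded LSBs and how many sample MSBs survive. The helper name and the step size are assumptions.

```python
import random


def coarse_quantize(value: int, step: int) -> int:
    """Round an 8-bit value to the nearest multiple of step, clamped to 0..255.

    This is only a crude stand-in for lossy compression, not real JPEG.
    """
    return max(0, min(255, step * round(value / step)))


random.seed(1)
cover = [random.randrange(256) for _ in range(10_000)]
message_bits = [random.randrange(2) for _ in cover]

# LSB embedding: replace the lowest bit of each sample with a message bit.
stego = [(c & 0xFE) | b for c, b in zip(cover, message_bits)]
degraded = [coarse_quantize(s, 8) for s in stego]

lsb_kept = sum((d & 1) == b for d, b in zip(degraded, message_bits)) / len(stego)
msb_kept = sum((d >> 7) == (s >> 7) for d, s in zip(degraded, stego)) / len(stego)
print(f"message LSBs preserved: {lsb_kept:.1%}")
print(f"sample MSBs preserved:  {msb_kept:.1%}")
```

Every multiple of 8 is even, so almost all quantized samples end in 0 and only the zero bits of the message survive, roughly half of them. An MSB changes only for the few samples just below 128 that are rounded up to 128, which is the property section 4 relies on.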
Early work in this area was performed by Sushil Jajodia, Andreas Pfitzmann, Niels Provos and Andreas Westfeld [West] [Johns2] [Pro]. There are three basic approaches to detecting hidden messages:
• Visual – Since many methods remove part of the image and replace it with the message, a well-trained set of human eyes can in many cases detect that an image has been altered.
• Structural – The format of a data file often changes as hidden information is added, resulting in a detectable pattern.
• Statistical – Patterns in the pixels and their least significant bits can reveal the existence of a message.
Visual methods can involve examining the image by eye or using programs to extract and examine individual bit planes of the image. In most images that are free of embedded data, the LSB plane still shows an outline of the original image; as more data is embedded in the LSB, this outline degrades into random noise.
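Extracting a bit plane, as an inspection tool does before displaying it, takes only a shift and a mask. The sketch below assumes a single band stored as a list of rows of 8-bit values; the helper names and the toy gradient are hypothetical.

```python
def bit_plane(band: list[list[int]], k: int) -> list[list[int]]:
    """Bit k of every sample (k = 0 is the LSB plane, k = 7 the MSB plane)."""
    return [[(value >> k) & 1 for value in row] for row in band]


def to_text(plane: list[list[int]]) -> str:
    """Render a bit plane as text ('#' = 1, '.' = 0) so it can be inspected by eye."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in plane)


# A toy 4 x 8 gradient band; real use would load one band of a bmp image.
gradient = [[16 * x + y for x in range(8)] for y in range(4)]
print(to_text(bit_plane(gradient, 0)))
# ........
# ########
# ........
# ########
```

In this smooth toy band the LSB plane still shows regular structure; after heavy LSB embedding the same plane would look like noise.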
The drawback of visual methods is that they are difficult to automate; a good set of human eyes is still needed to examine the results. A second shortcoming is the time it takes to train a good set of eyes.
Structural methods rely on the specific methods used by the software that created the stego file. Most programs manipulate the image in known ways, which can be detected by checking an image file for these effects. This weakness is most common in early stego programs.
Statistical methods rely on the fact that a hidden message appears more random than the data it replaces. The simplest statistical test is the χ² (chi-squared) test, which compares how often the LSB is 1 (or 0) with what would be expected in an unaltered image. Low scores indicate the presence of embedded data, while high scores show that the image has not been altered.
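A sketch of the pairs-of-values form of the χ² test described by Westfeld and Pfitzmann [West]: values 2i and 2i + 1 differ only in their LSB, so random LSB embedding pushes their counts toward equality and the statistic toward zero. The function name and the toy histograms are illustrative; converting the statistic into a probability needs the χ² distribution's survival function (for example `scipy.stats.chi2.sf`), omitted here to stay dependency-free.

```python
from collections import Counter


def pov_chi_square(samples: list[int]) -> tuple[float, int]:
    """Chi-squared statistic over pairs of 8-bit values (2i, 2i + 1).

    Returns (statistic, degrees of freedom). A small statistic means the
    counts within each pair are nearly equal, which is consistent with
    embedded data.
    """
    hist = Counter(samples)
    statistic, categories = 0.0, 0
    for even in range(0, 256, 2):
        observed = hist[even]
        expected = (hist[even] + hist[even + 1]) / 2
        if expected > 0:
            statistic += (observed - expected) ** 2 / expected
            categories += 1
    return statistic, max(categories - 1, 0)


cover_like = [10] * 90 + [11] * 10 + [50] * 30 + [51] * 70
stego_like = [10] * 50 + [11] * 50 + [50] * 50 + [51] * 50
print(pov_chi_square(cover_like))  # (40.0, 1)
print(pov_chi_square(stego_like))  # (0.0, 1)
```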
Jessica Fridrich and Miroslav Goljan of SUNY Binghamton state that the battle between steganography and steganalysis is never-ending [Fri]. As new steganographic techniques are developed, newer and more sophisticated analysis techniques will be needed to detect the hidden message. Future work in this area will help establish the “safe” carrying capacity of steganographic methods.
More recently, Davidson and Paul [Dav] used data mining techniques to locate hidden messages through outlier detection. Their work shows that even small messages hidden in a jpeg image using LSB embedding are identifiable.
4. Hiding Data Using the Most Significant Bits
An 800 by 600 pixel image in the 24-bit bmp format contains 3 bands of data, each 8 bits wide. This yields a total of 3*8*800*600, or 11,520,000, bits. When the file is converted to a jpeg file, the low order bits are removed. However, even the most significant bit of a single band provides 480,000 bits. A message of 2000 characters has a total of 16,000 bits, far fewer than the number of available data bits.
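The capacity figures above, written out as a worked calculation with 8 bits per character:

```latex
\begin{aligned}
\text{all bits} &= 3 \times 8 \times 800 \times 600 = 11{,}520{,}000 \\
\text{MSBs of one band} &= 800 \times 600 = 480{,}000 \\
\text{message bits} &= 2000 \times 8 = 16{,}000 \\
\frac{16{,}000}{480{,}000} &= \frac{1}{30} \approx 3.3\%
\end{aligned}
```

Even the single MSB band is thirty times larger than the longest message considered.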
The proposed technique records the locations of the message bits in this data field in a mask, similar to a one-time pad, which has been shown to be impossible to break [Blak]. The image is processed column by column, since we believe there is more color variation in the vertical direction than in the horizontal direction.
Standard photographic techniques [] tell photographers to divide an image into three horizontal bands, with the major portion of the image located in the center band. Our tests use photographs of automobiles posted for sale on the eBay auction website. These photos have the automobile in the middle third, the ground in the bottom band and the sky in the upper band. Processing horizontally therefore encounters less variation than processing vertically, and since the bits of the hidden message change rapidly, we need a sequence of image bits that also changes rapidly.
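The claim about scan direction can be checked on any image by counting how often consecutive MSBs differ under each scan order. The sketch assumes the red band is a list of rows of 8-bit values; the function name is hypothetical, and the toy "sky / car / ground" band is invented purely to illustrate horizontal banding, not taken from the eBay data.

```python
def msb_transitions(band: list[list[int]], column_wise: bool) -> int:
    """Count changes between consecutive MSBs for one scan order of the band."""
    height, width = len(band), len(band[0])
    if column_wise:
        order = [band[r][c] for c in range(width) for r in range(height)]
    else:
        order = [band[r][c] for r in range(height) for c in range(width)]
    bits = [value >> 7 for value in order]
    return sum(a != b for a, b in zip(bits, bits[1:]))


# Toy band: two rows of bright sky, two of dark car body, two of lighter ground.
band = [[200] * 6, [200] * 6, [60] * 6, [60] * 6, [150] * 6, [150] * 6]
print(msb_transitions(band, column_wise=False))  # 2
print(msb_transitions(band, column_wise=True))   # 12
```

With uniform horizontal bands, a row-wise scan crosses each band boundary once, while a column-wise scan crosses it once per column.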
For the process to work, both sides copy an agreed-upon image from the eBay website or from any of the numerous sites on the Internet where images are posted.
The algorithm is as follows:
1. Extract the most significant bit of the red band of the bmp image and store it in a bit vector, processing one column at a time until all columns have been used.
2. Initialize a 2000 byte (16,000 bit) array with the message to be masked.
3. Initialize a mask of all zeros of sufficient length to mask the message (at most the length of the image bit vector).
4. For each character in the hidden message:
     for each bit in the character:
       extract the bit
       scan forward in the image vector until a bit with the same value is found
       set the mask bit at that position to one
       resume the next scan at the following position
5. Transmit only the mask to the receiver.
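A minimal Python sketch of steps 1 to 4, assuming the red band has already been decoded from the bmp file into a list of rows of 8-bit values and that each character is one byte (Latin-1), most significant bit first. The function names are hypothetical, and returning the scan position is only one plausible reading of the "number of bits examined" reported in section 5.

```python
def msb_vector(band: list[list[int]]) -> list[int]:
    """Step 1: MSB of each sample, read one column at a time, top to bottom."""
    height, width = len(band), len(band[0])
    return [band[r][c] >> 7 for c in range(width) for r in range(height)]


def message_bits(message: str) -> list[int]:
    """Step 2: the message as a bit list, 8 bits per character, MSB first."""
    data = message.encode("latin-1")
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def build_mask(message: str, image_vector: list[int]) -> tuple[list[int], int]:
    """Steps 3 and 4: mark, for each message bit, the next image bit with the same value.

    Returns the mask (1 = position used) and the number of image bits examined.
    The cover image itself is never modified.
    """
    mask = [0] * len(image_vector)
    pos = 0
    for bit in message_bits(message):
        while pos < len(image_vector) and image_vector[pos] != bit:
            pos += 1
        if pos == len(image_vector):
            raise ValueError("image vector exhausted before the message was masked")
        mask[pos] = 1
        pos += 1  # the next message bit must come from a later position
    return mask, pos


band = [[200, 30, 140, 90], [10, 250, 60, 180], [130, 20, 220, 40], [70, 160, 100, 210]]
vector = msb_vector(band)
mask, examined = build_mask("A", vector)
print(examined)  # 14
```

Because every match comes from a strictly later position, the marked positions are in message order, so the receiver only has to read them from left to right.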
To extract the message, the receiver repeats step 1 on the agreed image and uses the mask to extract the appropriate bits from the image vector. Both sides must agree beforehand on which images to use; without this information, a person intercepting the mask will not be able to read the message.
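Given the rebuilt image vector as a list of bits, extraction is a filter over the mask. The sketch assumes the same 8-bit, MSB-first character encoding as the sender; the function name and the small example vector are hypothetical.

```python
def extract_message(mask: list[int], image_vector: list[int]) -> str:
    """Read the image bits at marked positions, in order, and regroup them into characters."""
    bits = [bit for bit, marked in zip(image_vector, mask) if marked]
    if len(bits) % 8:
        raise ValueError("mask does not select a whole number of characters")
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return data.decode("latin-1")


vector = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
mask = [0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
print(extract_message(mask, vector))  # A
```

Without the agreed image the mask alone is just a sparse bit string, which is the basis of the security argument above.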
5. Experimental Results
For this preliminary work the red band of the images was chosen arbitrarily, and all images were collected from eBay over a period of one week. All images collected were either 800 x 600 or 640 x 480 in size. The data to be masked were the lyrics of the nine songs on the Rolling Stones’ album “Let It Bleed”. Each song lyric file was between 628 and 2080 characters long.
Figure 1 shows summarized totals for an 800 x 600 image, while figure 2 shows totals for a 640 x 480 image.
File name: 800-1.bmp

Data file   Number of characters in message   Number of bits examined in image file
1           628                               17027
2           1026                              27412
3           1436                              38196
4           1111                              29384
5           637                               17085
6           1885                              51284
7           794                               21274
8           2080                              57402
9           730                               20086
Average     1147                              31017

Figure 1
File name: 640-1.bmp

Data file   Number of characters in message   Number of bits examined in image file
1           628                               26598
2           1026                              42148
3           1436                              58383
4           1111                              45374
5           637                               26157
6           1885                              76480
7           794                               32685
8           2080                              84320
9           730                               30018
Average     1147                              46907

Figure 2
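Assuming that “number of bits examined” counts the positions of the one-band MSB vector that were scanned, the two tables can be summarized as image bits consumed per message bit (ratio of the two averages) and as the share of the MSB vector used by the longest message (2080 characters):

```latex
\begin{aligned}
800 \times 600:&\quad \frac{31\,017}{8 \times 1147} \approx 3.38, \qquad \frac{57\,402}{800 \times 600} \approx 12.0\% \\
640 \times 480:&\quad \frac{46\,907}{8 \times 1147} \approx 5.11, \qquad \frac{84\,320}{640 \times 480} \approx 27.4\%
\end{aligned}
```

Under that reading, even the longest lyric file uses well under a third of the available MSB vector in either image.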
6. Conclusions and Future Work
As a proof of concept, the experimental results show that images of size 640 x 480 and 800 x 600 provide sufficient search space to mask a message of up to 2000 characters.
Future work will attempt to discover a relationship between image type and data-holding capacity. These image types can be of different items on the eBay website, in an attempt to find image types that minimize the size of the mask that must be transferred between users.
Measurements of the density of data points in the full Cartesian product of feature domains will be examined as guides to focused, density-based steganography. These techniques should allow a dramatic increase in the amount of information that can be effectively hidden in a cover. Horizontal data structuring will be examined as a means of reducing the high cost of these kinds of techniques.
7. References
[Agr1] Agrawal, Rakesh and Kiernan, Jerry, “Watermarking Relational Databases,” Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), 2002.
[Agr2] Agrawal, Rakesh, Haas, Peter J. and Kiernan, Jerry, “A System for Watermarking Relational Databases,” SIGMOD 2003, San Diego, California, USA, 2003.
[Blak] Blakley, G., “One Time Pads are Key Safeguarding Schemes, Not Cryptosystems: Fast Key Safeguarding Schemes (Threshold Schemes) Exist,” Proceedings of the 1980 IEEE Symposium on Security and Privacy, pp. 108-113, April 1980.
[Brassil] Brassil, J., Low, S., Maxemchuk, N. and O'Gorman, L., “Document Marking and Identification using both Line and Word Shifting,” Technical report, AT&T Bell Laboratories, 1994.
[Brown] Brown, Andy, http://www.webattack.com/download/dlstools.shtml
[Dav] Davidson, Ian and Paul, Goutam, “Locating Secret Messages in Images,” KDD’04, August 22-25, 2004, Seattle, Washington, USA.
[Fri] Fridrich, Jessica and Goljan, Miroslav, “Practical Steganalysis – State of the Art,” Proc. SPIE Photonics West, Vol. 4675, Electronic Imaging 2002, Security and Watermarking of Multimedia Contents, San Jose, California, January 2002, pp. 1-13.
[Johns1] Johnson, Neil F. and Jajodia, Sushil, “Exploring Steganography: Seeing the Unseen,” IEEE Computer, vol. 31, no. 2, pp. 26-34, February 1998.
[Johns2] Johnson, Neil F. and Jajodia, Sushil, “Steganalysis: The Investigation of Hidden Information,” IEEE Information Technology Conference, Syracuse, New York, USA, September 1998.
[JSteg] http://www.securityfocus.com/tools/1434
[Pet] Petitcolas, Fabien, http://www.petitcolas.net/fabien/steganography/image_downgrading/
[Pro] Provos, Niels, “Defending Against Statistical Steganalysis,” Proceedings of the 10th USENIX Security Symposium, 2001.
[Wat] Watkins, Jonathan, “Steganography – Messages Hidden in Bits,” http://citeseer.ist.psu.edu/555992.html
[West] Westfeld, Andreas and Pfitzmann, Andreas, “Attacks on Steganographic Systems,” Information Hiding, Third International Workshop, Germany, 1999.