Digital Steganography: A Symmetric Key Algorithm By: Joshua C. Clark

Cryptography is the science of passing information between parties such that third parties
are unable to comprehend the information. Cryptography has been practiced as far back
as ancient Egypt and Greece, with early forms including imbedding messages using nontraditional hieroglyphic characters and hiding secrets under wax coatings on stone tablets.
Cryptography has since evolved to utilize technology and computer software with the digital
advancements of the 21st century. The purpose of this paper is to discuss the methodology and
applications of Digital Steganography as well as to propose an algorithm that offers greater
information security than existing freeware programs.
Before defining steganography and offering early, non-digital examples, it is
necessary to address the goals of a cryptographic scheme. In cryptography, Alice and Bob are
two parties trying to communicate, while Eve and Mallory are separate parties attempting to
eavesdrop on and tamper with the communications.[i] This leads to the objectives of an effective
cryptographic system: Authenticity, Privacy/Confidentiality, Integrity, and Non-repudiation.[ii]
These requirements are achieved by utilizing secret-key (symmetric) or public-key (asymmetric)
encryption processes followed by a corresponding decryption process. In the example of Alice and
Bob, Alice would encrypt her plaintext message intended for Bob using either Bob's public key or
a shared secret key and transfer the resulting ciphertext. Upon receiving the ciphertext, Bob
would apply the matching key (his private key in the public-key case, or the same shared secret
key in the symmetric case) to decipher the message and obtain
the original plaintext. The figures below provide examples of public-key (left) and private-key
(right) encryption/decryption schemes. In evaluating a cryptographic scheme, one must assume
that Eve and Mallory have the ciphertext and know the process by which the data was
encrypted.[iii] Therefore, it is critical that private keys remain secret and out of the possession of
third parties; otherwise the original plaintext is easily obtained.
[Figure: public-key (left) and private-key (right) encryption/decryption schemes] [iv]
Steganography is a branch of cryptography that involves hiding information “in plain
sight”. While cryptography strives to make messages unreadable by third parties, steganography
attempts to hide and obscure the data from unsuspecting parties.[v] Early forms of steganography
included writing messages on the shaven heads of Greek slaves and allowing hair to grow over
the message. Another common steganographic scheme, popular with children, is the use of
invisible ink (lemon based ink) that can be revealed by applying heat to the page.[vi] With the
advancement of technology came the adaptation of steganography to the digital realm. Today,
digital steganography is highly applicable in digital watermarking, the process of applying a
watermark to a digital image, audio file, logo, or other form of digital data. Similar to
traditional watermarks, a digital watermark allows one to mark intellectual property that is
presented in a digital format. Below is an example of digital watermarking. The first figure is a
photograph of Professor Joyner obtained from his webpage. The second image is the result of
reading the image into a hexadecimal editor (Frhed). The mathematics behind this application
will come in the following paragraphs. For now, observe the far right columns of the hex editor.
The digital watermark (Kodak Company.DC210 Zoom v03.10) should be easily recognizable
amidst the rest of the hexadecimal output for the image.
This highlights the concept of imbedding data within an image that is undetectable to simple
observation. Though steganography can be applied across a spectrum of digital data, my project
focuses on imbedding data within images. The upcoming paragraphs will address the storage of
images on a computer followed by an analysis of existing steganographic freeware and conclude
with my developed algorithm.
Images are stored as a grid of pixels, whose dimensions can be found by right-clicking on an
image and selecting the properties tab. The pixel count is listed similar to the dimensions of a
matrix. In the case of an MxN pixel image, each entry of the matrix represents an individual pixel
of the image. An image can be read into a computational program such as MATLAB, which
will convert the image into a matrix. The specifics of loading an image into MATLAB will be
addressed later; for now, simply note that an image can be represented by a matrix. The matrix has the
same row and column dimensions as the image’s pixel format (MxN). However, the matrix
gains a third depth dimension with fixed value 3, so the matrix that corresponds to an image will
have dimensions MxNx3. This is due to images being displayed with three color bands: red,
green and blue (RGB). All colors are a combination of these three fundamental colors, so each
entry in a specific color band represents the intensity of that fundamental color.
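The MxNx3 layout described above can be sketched in MATLAB. This is only an illustration: the tiny 2x3 image is built in code (a hypothetical stand-in for an image loaded with imread) so that no image file is needed.

```matlab
% Hypothetical 2x3 RGB image built in code instead of being read from a file.
X = zeros(2, 3, 3, 'uint8');
X(:, :, 1) = 255;                                % every pixel has full red intensity
X(1, 2, :) = reshape([0 128 255], 1, 1, 3);      % overwrite one pixel with R=0, G=128, B=255

[M, N, depth] = size(X);                         % M rows, N columns, depth of 3 color bands

% Each color band is an ordinary MxN matrix of intensities.
R = X(:, :, 1);
G = X(:, :, 2);
B = X(:, :, 3);

fprintf('Image is %d x %d x %d\n', M, N, depth);
```

Splitting the bands this way mirrors how a color image is treated as three stacked intensity matrices.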
Now that the connection between images and matrices has been explained, I will continue
with specifics of how computers store data. All information on a computer is stored in binary
format as 1’s and 0’s. A bit of data refers to a single 1 or 0 and a byte is a combination of 8
bits. A byte is generally split in half as two 4 bit pieces to allow conversion to another standard
base, hexadecimal. Though a computer does not store information in base hex, hexadecimal is a
convenient shorthand for binary because each hex character corresponds to exactly 4 bits.
A character can be converted to hexadecimal with an ASCII
conversion table. In this conversion, each character is represented by a pair of
hexadecimal characters, which range from 0-F. Recall that a byte is split in half as two 4 bit
pieces, which means there is a total of 2^4 (16) distinct options for each half byte of data, or 2^8
(256) for an entire byte. Because base hex has a total of 16 options for each entry (0-F), 4 bits of
data convert to a single hex character and a pair of hex characters convert to a whole byte. See
below for a demonstrative conversion from a character input to binary output.
Character Input: ‘H’
ASCII Conversion (hexadecimal): ‘48’
Split Hex Characters: ‘4’ AND ‘8’
Convert Hex to Binary: ‘0100’ AND ‘1000’
Combine for Binary Output: ‘01001000’
Therefore, the letter ‘H’ is stored on the computer as ‘01001000’.
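The same conversion can be reproduced with a few lines of MATLAB. This is a minimal sketch using the built-in dec2hex, hex2dec, and dec2bin functions; the variable names are chosen here for illustration only.

```matlab
ch = 'H';
code = double(ch);                      % ASCII code of 'H' in decimal
hexStr = dec2hex(code, 2);              % two hexadecimal characters for one byte

% Convert each hex character to its own 4 bit piece, then join the halves.
hi = dec2bin(hex2dec(hexStr(1)), 4);
lo = dec2bin(hex2dec(hexStr(2)), 4);
byteStr = [hi lo];

fprintf('%s -> hex %s -> binary %s\n', ch, hexStr, byteStr);
```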
Having addressed the concept of data storage in binary on a computer, I will revisit the concept
of an image being stored as a matrix. Each color intensity (red, green, or blue) of each pixel is
stored on the computer in a byte of data. This means there are 2^8 possible values for each of the
red, green, and blue color intensities of a pixel. It should make sense, then, that each intensity
value can range anywhere from 0 to 255 (for a total of 2^8 = 256 values).
Now I will begin describing the process by which information is imbedded into an image.
Altering the last bit of a color byte changes that intensity by at most 1 of its 256 possible levels.
Altering the last bit of each byte in an image (switching from 1 to 0 or vice-versa) is effectively
undetectable to the human eye. This process is referred to as LSB imbedding, in which data is
hidden in the Least Significant Bits of an image. Consider the following example of imbedding
the letter ‘F’ into a 4x2 arrangement of arbitrarily selected binary pixels.
‘F’ Binary Conversion:
Hexadecimal- ‘46’
4 bit Binary- ‘0100’ AND ‘0110’
Binary Byte- ‘01000110’
Pixels:
10110001 01110010
11101110 10110110
11111111 11001100
00000000 01100111
Working from left to right and top to bottom through the pixel matrix, I will set the least
significant bit (LSB) of each pixel to the corresponding bit of the binary representation of ‘F’.
Output Pixel Matrix:
10110000 01110011
11101110 10110110
11111110 11001101
00000001 01100110
The pixels now have the letter ‘F’ imbedded into their least significant bits. As seen above, it is
not necessary to change the LSB for every byte, only those that do not match the corresponding
bits for the message being imbedded.
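The ‘F’ example can be checked with a short MATLAB sketch. It clears the LSB of each example byte and adds the message bit, which is one common way to write LSB imbedding and is not necessarily how any particular program implements it.

```matlab
% The eight example pixel bytes, read left to right and top to bottom.
pixels = uint8(bin2dec(['10110001'; '01110010'; '11101110'; '10110110'; ...
                        '11111111'; '11001100'; '00000000'; '01100111']));

% Bits of 'F', most significant bit first, as a column vector.
bits = uint8(dec2bin(double('F'), 8) - '0')';

% Clear each least significant bit, then add the message bit.
stego = pixels - mod(pixels, 2) + bits;

disp(dec2bin(double(stego), 8));        % one output byte per row
changed = sum(stego ~= pixels);         % bytes whose LSB had to be flipped
```

Only the bytes whose LSB differs from the message bit are modified, and no byte changes by more than 1.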
Next, I will discuss a freeware program, EasyBMP, which is publicly available and
easily hides messages within images. While the LSB example from above simply imbedded the
message by matching the LSB for each byte with the corresponding bit from the message, the
EasyBMP program utilizes modular arithmetic to imbed in the LSB. Below is the algorithm
used by EasyBMP to imbed data (obtained from the EasyBMP webpage):
Let N (a byte) be a number from 0 to 255
Let r1,g1,b1,a1, r2,g2,b2,a2 ∈ {0,1} such that
N = r1 + 2*g1 + 2^2*b1 + 2^3*a1 + 2^4*r2 + 2^5*g2 + 2^6*b2 + 2^7*a2
If (R1,G1,B1,A1) and (R2,G2,B2,A2) are adjacent pixels to be overwritten, then the new
pixels are obtained by:
(R1,G1,B1,A1) - (R1,G1,B1,A1)%2 + (r1,g1,b1,a1)
(R2,G2,B2,A2) - (R2,G2,B2,A2)%2 + (r2,g2,b2,a2)
Using the example of two image pixels of (255,255,255,0) and (255,255,255,0) and a message of
N=18:
N = 18
r1 = N % 2 = 18 % 2 = 0
T = r1 = 0
g1 = (N - T)/2 % 2 = (18 - 0)/2 % 2 = 9 % 2 = 1
T = T + 2 * g1 = 0 + 2*1 = 2
b1 = (N - T)/4 % 2 = (18 - 2)/4 % 2 = 4 % 2 = 0
T = T + 4 * b1 = 2 + 4*0 = 2
a1 = (N - T)/8 % 2 = (18 - 2)/8 % 2 = 2 % 2 = 0
T = T + 8 * a1 = 2 + 8*0 = 2
r2 = (N - T)/16 % 2 = (18 - 2)/16 % 2 = 1 % 2 = 1
T = T + 16 * r2 = 2 + 16*1 = 18
g2 = (N - T)/32 % 2 = (18 - 18)/32 % 2 = 0 % 2 = 0
T = T + 32 * g2 = 18 + 32*0 = 18
b2 = (N - T)/64 % 2 = (18 - 18)/64 % 2 = 0 % 2 = 0
T = T + 64 * b2 = 18 + 64*0 = 18
a2 = (N - T)/128 % 2 = (18-18)/128 % 2 = 0 % 2 = 0
Therefore, the new pixels are:
(255,255,255,0) - (255,255,255,0) % 2 + (0,1,0,0)
= (255,255,255,0) - (1,1,1,0) + (0,1,0,0)
= (254,255,254,0)
(255,255,255,0) - (255,255,255,0) % 2 + (1,0,0,0)
= (255,255,255,0) - (1,1,1,0) + (1,0,0,0)
= (255,254,254,0)
[vii]
The ai’s represent alpha channels, which are insignificant to the imbedding of messages for any
of the numerical or picture examples in this paper. They were included in the above algorithm to
ensure the message was properly written into the LSB of the RGB bytes for the adjacent pixel
example.
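The N = 18 example above can be reproduced with a short MATLAB sketch. This is an illustration of the arithmetic only, not EasyBMP's own source code; bitget is used here as a shortcut for the repeated divide-and-mod steps.

```matlab
N = 18;                                  % message byte from the example

% Bits of N, least significant first: [r1 g1 b1 a1 r2 g2 b2 a2]
bits = bitget(N, 1:8);

p1 = [255 255 255 0];                    % first pixel  (R, G, B, A)
p2 = [255 255 255 0];                    % second pixel (R, G, B, A)

% Clear each channel's LSB, then add the message bit.
new1 = p1 - mod(p1, 2) + bits(1:4);
new2 = p2 - mod(p2, 2) + bits(5:8);

% Reading the message back: gather the LSBs and weight them by powers of 2.
recovered = sum(mod([new1 new2], 2) .* 2.^(0:7));

disp(new1); disp(new2); disp(recovered);
```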
Having covered the modular arithmetic by which EasyBMP imbeds a message
into an image, I will explain how the program chooses pixels for imbedding. The program uses
an extremely simple method for selecting pixels: it starts in the upper left corner and works from
left to right by row. In other words, the EasyBMP program views the image as an MxN matrix,
imbeds in each column entry (Ni) for a given row (Mj), and progresses to the next row (Mj+1)
after all column entries (Ni’s) have been imbedded for the given row (Mj). The issue with this
method of imbedding is its susceptibility to detection given an image of solid color. Below is an
example of a pure white source image and the output image after using the Paint Bucket option
in the program Paint:
Placing thick borders around the two pictures, it is easy to see in the second picture that the
imbedded message lies at the top of the picture. Knowing this, it is easy to extract the hidden
message by pulling the LSB for each pixel and compiling the binary back into ASCII character
output to obtain the hidden message.
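The extraction described above can be sketched in MATLAB under simplifying assumptions: a hypothetical all-white, single-channel 4x8 image stands in for a real bitmap, and the message ‘Hi’ is imbedded one bit per pixel in row order. EasyBMP itself spreads the bits across the color channels of each pixel, so this only illustrates why a fixed top-left ordering is easy to reverse.

```matlab
msg = 'Hi';
% Message as a stream of bits, most significant bit of each character first.
bits = reshape((dec2bin(double(msg), 8) - '0')', 1, []);

img = 255 * ones(4, 8, 'uint8');         % hypothetical all-white single-channel image

% MATLAB indexes down columns, so transpose to walk left to right, top to bottom.
rowMajor = img';
idx = 1:numel(bits);
rowMajor(idx) = rowMajor(idx) - mod(rowMajor(idx), 2) + uint8(bits);
stego = rowMajor';

% Attacker's view: no key is needed, only the same fixed ordering.
flat = stego';
lsb = mod(double(flat(1:numel(bits))), 2);
recovered = char(bin2dec(char(reshape(lsb, 8, [])' + '0')))';

disp(recovered);
```

Because the ordering is fixed and public, anyone who suspects LSB imbedding can run the second half of this sketch without any secret.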
After analyzing the methodology by which EasyBMP imbeds a message in images, it
should be clear that though the message is hidden from plain sight, an analyst with experience in
digital steganography could easily extract the message. This is primarily due to the simple
indexing procedure of imbedding the message from left to right and top to bottom. Because
EasyBMP has no layer of randomness or security in the selection of pixels for the imbedding, it
is easily susceptible to attacks. Removing this weakness was the primary goal of my algorithm: a
program that reads an image and requires the user to input a message to be hidden in the image.
Unlike EasyBMP, I constructed my code such that there is a layer of randomness which provides
security to the cryptographic scheme. Referring back to the example of Alice and Bob, the code
by which the image is encrypted is no longer required to remain secret from attackers. Rather,
only the key for selecting pixels must remain secret to maintain a secure system. Therefore,
when considering the following steganographic algorithm, compare it to a symmetric scheme in
which both the sender (Alice) and receiver (Bob) use the same key to encrypt and decrypt.
The following are the scripts for my algorithm for encrypting/decrypting a message
inside an input BMP format image.
%ENCRYPTION CODE:
%Must first load a BMP format image into the variable X that will store the
%encrypted message.
%EXAMPLE:
%X=imread('image.bmp','BMP')
pt=input('Please enter plaintext message: ','s');
len=length(pt);
pt2=dec2bin(pt,8);      %one row of 8 bits per character of the message
key=inputdlg('Please input key separated by spaces or commas: ');
key=str2num(key{1});
x=key(1,1);             %first key value: seed for the random number generator
y=key(1,2);             %second key value: number of random draws to discard
%Separate the red, green, and blue color bands
R=X(:,:,1);
G=X(:,:,2);
B=X(:,:,3);
[m,n]=size(R);
%Rewrite each intensity as a number whose decimal digits are its binary
%digits (e.g. 255 becomes 11111111), so the last digit is the LSB
R2=dec2bin(R);
G2=dec2bin(G);
B2=dec2bin(B);
R2=reshape(str2num(R2),m,n);
G2=reshape(str2num(G2),m,n);
B2=reshape(str2num(B2),m,n);
%Seed the generator and discard the first y random numbers
rng(x);
for u=1:y
    rand;
end
%For each message bit, draw a random row, column, and color band, then make
%the LSB at that location match the bit.
%NOTE: the same location can be drawn twice, in which case the later bit
%overwrites the earlier one.
for a=1:len
    for b=1:8
        rand1=randi(m,1,1);
        rand2=randi(n,1,1);
        rand3=randi(3,1,1);
        ptind=str2num(pt2(a,b));
        if ptind==0
            %message bit is 0: subtract 1 if the stored value is odd
            if rand3==1
                if round(R2(rand1,rand2)/2)~=R2(rand1,rand2)/2
                    R2(rand1,rand2)=R2(rand1,rand2)-1;
                end
            elseif rand3==2
                if round(G2(rand1,rand2)/2)~=G2(rand1,rand2)/2
                    G2(rand1,rand2)=G2(rand1,rand2)-1;
                end
            else
                if round(B2(rand1,rand2)/2)~=B2(rand1,rand2)/2
                    B2(rand1,rand2)=B2(rand1,rand2)-1;
                end
            end
        else
            %message bit is 1: add 1 if the stored value is even
            if rand3==1
                if round(R2(rand1,rand2)/2)==R2(rand1,rand2)/2
                    R2(rand1,rand2)=R2(rand1,rand2)+1;
                end
            elseif rand3==2
                if round(G2(rand1,rand2)/2)==G2(rand1,rand2)/2
                    G2(rand1,rand2)=G2(rand1,rand2)+1;
                end
            else
                if round(B2(rand1,rand2)/2)==B2(rand1,rand2)/2
                    B2(rand1,rand2)=B2(rand1,rand2)+1;
                end
            end
        end
    end
end
%Convert each band from binary digits back to decimal intensities
R_trans=reshape(R2,m*n,1);
R_bin=num2str(R_trans);
R_encoded=bin2dec(R_bin);
R_final=reshape(R_encoded,m,n);
G_trans=reshape(G2,m*n,1);
G_bin=num2str(G_trans);
G_encoded=bin2dec(G_bin);
G_final=reshape(G_encoded,m,n);
B_trans=reshape(B2,m*n,1);
B_bin=num2str(B_trans);
B_encoded=bin2dec(B_bin);
B_final=reshape(B_encoded,m,n);
%Recombine the three bands and export the result as a BMP image
X_encoded(:,:,1)=R_final;
X_encoded(:,:,2)=G_final;
X_encoded(:,:,3)=B_final;
X_enc=uint8(X_encoded);
imwrite(X_enc,'secret.bmp','BMP')
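%DECRYPTION CODE
%Uses the BMP Image 'secret.bmp' to extract hidden message
Y=imread('secret.bmp','BMP');
key=inputdlg('Please input key separated by spaces or commas: ');
key=str2num(key{1});
x=key(1,1);             %seed for the random number generator
y=key(1,2);             %number of random draws to discard
[m,n,o]=size(Y);
%Number of bits to read (100 characters). The message length is not stored
%in the image, so characters past the end of the real message are noise.
ct_length=800;
%Reproduce the random number stream used during encryption
rng(x);
for u=1:y
    rand;
end
%Revisit the same (row, column, color band) locations in the same order
for i=1:ct_length
    rand1=randi(m,1,1);
    rand2=randi(n,1,1);
    rand3=randi(3,1,1);
    trans(i)=Y(rand1,rand2,rand3);
end
%Keep only the least significant bit of each value read
trans2=dec2bin(trans,8);
for i=1:ct_length
    ct(i)=trans2(i,8);
end
%Group the bits into bytes and convert each byte to an ASCII character
for j=1:length(ct)/8
    ct2(j,:)=ct(8*(j-1)+1:j*8);
end
MESSAGE=char(bin2dec(ct2))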
Beginning with the first script, for encryption, the user will select a BMP image that
will be used to mask the hidden message. A BMP format must be used throughout the
encryption and decryption process because it preserves intensity values exactly. Though JPEG images
are the most readily downloaded images, JPEG compression is lossy and alters intensity values
when an image is saved. This
makes imbedding a message difficult since the intensity values must remain unchanged to
preserve the encrypted message. This issue is easily solved by converting any target images
from JPEG to BMP format with free online converters.[viii] Next, the user will be asked to input
the plaintext message and the key to be used. The original 3-dimensional matrix is separated
into three matrices for the red, green, and blue color intensities. The key of the code,
which is a two number user input, utilizes ‘rng’, a MATLAB command that seeds the random
number generator such that the numbers generated by any of the ‘rand’ commands will follow a
fixed sequence. This means that pixels can be selected at random from the base image and
undergo LSB imbedding, with the pattern being easily recovered by knowing the ‘rng’ value.
However, beyond simply knowing the rng value, an attacker would also need to know the
number of “wasted” randomly generated numbers. The code uses the second key input to skip
through a portion of random numbers before actually imbedding data, which adds another level
of security. Understanding this, if a sender and receiver know the key (the value within rng( )
and the number of “wasted” ‘rand’ commands), then they can easily imbed a message in random
locations that can only be recovered by using the same key or a brute force attack on all of the
rng( ) combinations of possible pixel arrangements. The latter would be a computationally costly
attack that I will analyze in a few paragraphs. After establishing the key, the program
begins imbedding the binary representation of the character string input into the binary matrix
representation of the image. A series of conversions between ASCII, Decimal, and Binary can
be observed in the code, which is due to the manner in which MATLAB indexes the different
formats. The specifics of these conversions are unimportant; it is enough to understand that the
character input is converted to binary and imbedded using the LSB process described at the
beginning of this paper. Finally, the matrices that correspond to the red, green, and
blue color intensities are combined back into a single “cube” matrix that can be compiled back
into an image and exported as a BMP format.
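The role of the two-part key can be illustrated with a short MATLAB sketch. The key values and the 400-row image height are hypothetical; the point is only that the same seed and the same number of discarded draws reproduce the same sequence of random indices.

```matlab
key = [42 7];                            % hypothetical key: [seed, number of discarded draws]

% Sender: seed the generator, discard key(2) draws, then pick row indices.
rng(key(1));
for u = 1:key(2)
    rand;
end
senderRows = randi(400, 1, 5);

% Receiver: repeating the same two steps reproduces the same indices.
rng(key(1));
for u = 1:key(2)
    rand;
end
receiverRows = randi(400, 1, 5);

disp(isequal(senderRows, receiverRows));
```

Changing either part of the key starts the index sequence from a different point in the generator's stream, which is what the receiver must match.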
After Bob receives the exported BMP image, referred to as ‘secret.bmp’, the message can
be decrypted by running the appropriate algorithm. The decryption algorithm requires the
receiver to input the key to set the ‘rng’ to the same value as for the encryption. From there, the
program begins generating the random numbers necessary to extract the LSB from the indexed
pixels. Finally, these LSB’s are compiled into bytes and converted back to ASCII, allowing the
receiver to decrypt the ‘secret.bmp’ image and obtain the imbedded message.
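The full round trip can be sketched compactly in MATLAB. This is a simplified illustration rather than the scripts above: it works directly on uint8 intensities, uses a hypothetical solid-gray cover image and key, and also counts how often the same location is drawn twice, since a repeated location overwrites an earlier bit.

```matlab
msg   = 'SECRET';
key   = [2013 25];                           % hypothetical key: [seed, discarded draws]
cover = 200 * ones(40, 60, 3, 'uint8');      % hypothetical solid-gray cover image
[m, n, ~] = size(cover);
bits = reshape((dec2bin(double(msg), 8) - '0')', 1, []);

% Sender: seed, discard, then write one message bit per random location.
stego = cover;
used = false(m, n, 3);
collisions = 0;
rng(key(1));
for u = 1:key(2), rand; end
for k = 1:numel(bits)
    r = randi(m); c = randi(n); ch = randi(3);
    collisions = collisions + used(r, c, ch);    % location already holds a bit
    used(r, c, ch) = true;
    v = stego(r, c, ch);
    stego(r, c, ch) = v - mod(v, 2) + bits(k);   % clear the LSB, add the message bit
end

% Receiver: same key, same draws, read the LSB at each location.
out = zeros(1, numel(bits));
rng(key(1));
for u = 1:key(2), rand; end
for k = 1:numel(bits)
    r = randi(m); c = randi(n); ch = randi(3);
    out(k) = mod(double(stego(r, c, ch)), 2);
end
recovered = char(bin2dec(char(reshape(out, 8, [])' + '0')))';

fprintf('recovered: %s (collisions: %d)\n', recovered, collisions);
```

When no location is drawn twice, the recovered text equals the input. One possible variation, not part of the algorithm above, is to draw locations without replacement (for example with randperm) so that collisions cannot occur.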
I will now assess the security of the algorithm by looking at the key space. I propose
that it would be infeasible to attack this algorithm with a brute force attack, due to the size of the
key space. The original scripts I wrote were a one key system that only required an input of
the ‘rng’ value. However, the MATLAB ‘rng’ command accepts seeds only within a fixed range of about 2^32 values.
This means that should the above scripts be intercepted by an attacker along with an encrypted
photo, then modifying the code to be prefaced with a for loop that indexes the possible ‘rng’
values would lead to a brute force crack. Though the ‘rng’ values form a large key space that would
require a long computation time to search, it could be accomplished. On the other hand, including
an added layer of randomness to make the beginning generator value unknown to an attacker
greatly increases the key space. Without being able to determine the starting point of the LSB
values and being able to follow a simple sequence, the only way to crack the message is to
extract the LSB for every pixel and analyze all possible combinations. Below is a symbolic and
numerical example of an attempt to perform a brute force attack on my algorithm:
Given: MxNx3 matrix image
Key Space: Multiply the rows by columns by depth = (M)(N)(3)
Combinations: ((M)(N)(3))!
For a standard 400x600 pixel image, the computations become:
Given: 400x600x3 matrix image
Key Space: (400)(600)(3) = 720,000
Combinations: (720,000)!
However, this should be rescaled due to the fact that an attacker could use detection software to
identify the location of altered pixels. An example image is provided below to demonstrate
seeing altered pixels in a pure white image by using a black bucket fill command in Paint.
Assuming an attacker is able to identify the location of the LSB alterations for a 50 ASCII
character message, the computations would become:
Number of altered pixels: 50*8 (8 bits are required for every ASCII character) = 400 pixels.
Possible combinations: 400!
Therefore, it is reasonable to conclude that a brute force attack would be infeasible due to the
number of combinations and the inability to determine the stored sequence of data without knowing the
two part key.
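To give a sense of scale for the counts quoted above, the factorials can be compared on a logarithmic scale in MATLAB. This sketch only evaluates the magnitudes of the numbers already stated (2^32 seeds, 720,000! and 400!); it uses gammaln because the factorials themselves overflow double precision.

```matlab
% log10 of each count; gammaln(k + 1) equals log(k!) without overflow.
log10_seeds   = 32 * log10(2);                     % 2^32 possible seed values
log10_full    = gammaln(720000 + 1) / log(10);     % 720,000!
log10_reduced = gammaln(400 + 1) / log(10);        % 400!

fprintf('2^32     is about 10^%.1f\n', log10_seeds);
fprintf('720000!  is about 10^%.0f\n', log10_full);
fprintf('400!     is about 10^%.1f\n', log10_reduced);
```

Even the reduced count of 400! is a number with more than 860 decimal digits, far beyond the roughly ten digits of the seed range alone.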
In summary, freeware versions of steganography are perfectly capable of imbedding
secret messages “in plain sight.” However, analysts with a working knowledge of LSB
imbedding and detection software would be capable of extracting the message. My proposed
algorithm introduces an element of security by incorporating a key for encryption. The
encryption relies upon random pixel selection and LSB imbedding, but does so in such a manner
that knowing the key makes it easy to extract the imbedded message. The key sets the random
number generator as well as the added security of “wasting” a number of the random
computations. This ensures that only a sender and receiver with the key are able to extract an
imbedded message from the source image. Can you spot the message?
Input Message: ‘SECRET MESSAGE!’
Output MESSAGE = SECRET MESSAGE! (MATLAB prints the result as a column, one character per line)
Works Cited
Douglas Withers (Advisor), interview by Joshua Clark, Chauvenet Hall, "JPEG vs. BMP Image Formatting," March 27, 2013.
Kessler, Gary. "An Overview of Cryptography." Last modified April 3, 2013. Accessed April 10, 2013. http://www.garykessler.net/library/crypto.html
Macklin, Paul. EasyBMP Project, "EasyBMP Code Sample: Steganography." Last modified February 20, 2011. Accessed April 10, 2013. http://easybmp.sourceforge.net/steganography.html
University of Virginia CS Department, "Hiding Information Perfectly: A Short History of Steganography." Accessed April 4, 2013. http://www.cs.virginia.edu/~wm2a/HistorialOverview.html
[i] Overview of Cryptography
[ii] Ibid
[iii] Ibid
[iv] Ibid
[v] History of Steganography
[vi] Ibid
[vii] EasyBMP
[viii] Professor Withers Interview