mp3 stegonography - Terminally Incoherent

MP3 STEGANOGRAPHY
Applying Steganography to Music Captioning
Lukasz Grzegorz Maciak
Micheal Alexis Ponniah
Renu Sharma
TABLE OF CONTENTS
PROJECT GOAL
WHY MP3?
THE ANATOMY OF AN MP3
  MP3 Encoding
  MP3 File Structure
  MP3 Headers
STEGANOGRAPHY ON MP3 FILES
  Non Steganographic Methods
    ID3v2 Comment Frame
    Lyrics3 Standard
    Non Steganographic Methods Summary
  Encode Time Steganography
    Low Bit Encoding
    Phase Encoding
    Spread Spectrum
    Echo Data Hiding
    Flaws of Encode Time MP3 Steganography
    Existing Implementations
  Post Encoding MP3 Steganography
    Unused Header Bit Stuffing
    Padding Byte Stuffing
DESIGNING STEGANOGRAPHIC SOFTWARE FOR MP3 FILES
  Specifications
  Steganographic Module – Implementation Notes
  StegIO Header Algorithm
  Steganographic Module - Padding Byte Stuffing
  Steganographic Module Usage
  Steganographic Module – Implementation Issues
IMPLEMENTING GRAPHICAL USER INTERFACE FRONT END
JAVA and MP3:
  Java Sound API
  Java Media Framework (JMF)
JAVA BASED MP3 PLAYER
  MP3 Player Implementation:
  MP3 Player Implementation details:
  MP3 Player Screenshots
FUTURE WORK
REFERENCES
FIGURES
EQUATIONS
PROJECT GOAL
The goal of this project is to embed textual information into a popular media format using
steganography. It can be assumed that the text is relatively short when compared to the
media file. A good example of this is the relationship between a recorded song and its
lyrics. The audio file containing the recording is much larger than the song lyrics stored
as a plain ASCII file. Therefore it is probably safe to assume that the smaller file could
be steganographically embedded into the larger one without impacting the quality.
A similar argument could be made about video data and closed captioning information.
This project concentrates on the song/lyrics relationship in order to create a
steganographically driven karaoke machine. The song lyrics will be seamlessly
embedded into an audio file, and then displayed on the screen when the file is played.
This research will include the implementation of a steganographic algorithm for encoding data
inside audio files, as well as a technique to dynamically extract that data and play it back.
WHY MP3?
The MP3 format is a very good target for this research because it is currently one of the
most popular music encodings. Potential users of karaoke software are most likely to pick
the mp3 format above any other audio encoding on the market. Because the end goal of
this project is to create a usable piece of software, catering to the tastes and needs of the
end users seems to be a good idea.
Furthermore, mp3 is an open standard, which means that it is well documented and
accessible. Thus uncovering the inner workings of this format does not pose any legal
threat to the researchers. On the other hand, choosing a proprietary, closed format such
as Windows Media Audio (WMA) could put the researchers in legal jeopardy.
Doubtlessly any steganographic research will rely heavily on exploiting certain properties
of the data format chosen as the information carrier. This project is no different. However,
the actual research process, data examination and implementation steps could be
replicated for other media to create analogous solutions.
THE ANATOMY OF AN MP3
The implementation proposed in this paper relies heavily on the specific properties of the
mp3 data format. Therefore it is only logical to start the discussion by reviewing the
structure of these media files.
The mp3 format is designed to store audio data, which is different from the visual
information stored in images. Therefore image steganography techniques may not always
work with audio data. Furthermore, unlike some image data formats, mp3 files are
compressed and encoded in a very storage-conscious way. Thus they are not the best host
files for steganographic data.
MP3 Encoding
MP3 is a lossy data format which aims to preserve the sound quality while minimizing
storage space. The encoding process takes into account the properties of the human auditory
system. For example, humans cannot hear frequencies below 20Hz or above 20kHz.
Furthermore, the human ear is often unable to distinguish between two or more notes with
similar frequencies when they are played together. Thus an mp3 encoder can safely discard any
sounds with frequencies outside the audible range, and needs to store only a single one
out of a group of similar sounding notes. This is of course not a trivial process. Mp3
encoders employ complex psychoacoustic modeling to perceptually optimize the
data. [1]
The encoding process is very complex and involves both perceptual optimization and
more conventional data compression methods. Figure 1 shows a conceptual model of
an mp3 encoding algorithm:
Figure 1
Explaining the actual encoding procedure is out of the scope of this paper. However, there
are several tasks performed by the encoder that may be of some interest to a
steganography researcher.
The mp3 encoder breaks the audio data into small fragments called frames. Each frame
represents a fraction of a second. The size of the frame depends on the audio resolution, or
bit rate. The most convenient way (algorithmically) to do this is to assume a constant bit rate
throughout the recording, thus forcing the same size onto all frames. However, music is not
structured this way. Very often dynamic sequences including vocals and many
instruments playing at the same time are interwoven with very simple melodic tracks.
Therefore using a constant bit rate (CBR) is not always economical. The MP3 specification
allows the data to be stored in a variable bit rate (VBR) format, which means that the
audio frames are not all the same size. [1]
Each frame is perceptually analyzed using the psychoacoustic model. The frequencies
that are not audible are discarded, or allocated a minimal number of bits. The exact inner
workings of this procedure are complex and beyond the scope of this paper.
Once perceptual optimization is done, the data is compressed using Huffman coding. This
is a lossless algorithm, so the audio information is preserved while the storage
space decreases. [1] This is an important fact for a steganographer.
Because of the nature of the compression algorithm, Huffman coded data cannot be easily
modified. Huffman coded data is stored using variable length bit strings that are matched
against a lookup table. The most frequently used values are encoded with the shortest
possible strings, while the rare ones are coded with longer strings. Thus it is possible that
certain values have two or three bit codes. [4] Inverting a single bit can therefore
completely change the value of the coded data.
Furthermore, the data cannot be easily divided into bytes, words, etc. The least
significant bit of a given byte may actually be the most significant bit of a Huffman
coded value. Therefore least significant bit substitution cannot be easily done on
Huffman coded data.
The compressed audio data is then reassembled. Each frame is prepended with a header
which stores information about the bit rate, sample rate and other meta-data. [1]
MP3 File Structure
MP3 files are therefore composed of short data frames, each preceded by a header. An MP3 file
can also contain meta-data tags. There are two types of these tags. ID3v1 is the
older format, which is appended at the end of the file. This tag is always 128 bytes long
and it contains seven fields which specify the artist name, song title, album, genre,
etc. Because of its static size and lack of flexibility, this tag type is slowly being replaced by
the more advanced ID3v2 standard. [6]
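Because an ID3v1 tag has a fixed size and position, a tool that scans the audio frames can detect it and stop before it. The following is a minimal sketch, assuming the usual ID3v1 layout in which the last 128 bytes begin with the marker "TAG" followed by a 30 byte title field. The class and method names are hypothetical and are not part of the project code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: detects an ID3v1 tag at the end of a file image. */
final class Id3v1Probe {
    static final int TAG_SIZE = 128;

    /** Returns true when the last 128 bytes start with the marker "TAG". */
    static boolean hasId3v1(byte[] file) {
        if (file.length < TAG_SIZE) {
            return false;
        }
        int start = file.length - TAG_SIZE;
        return file[start] == 'T' && file[start + 1] == 'A' && file[start + 2] == 'G';
    }

    /** Reads the 30 byte title field that follows the marker (caller checks hasId3v1 first). */
    static String title(byte[] file) {
        int start = file.length - TAG_SIZE + 3;
        // trim() drops the zero or space padding that fills the unused part of the field
        return new String(file, start, 30, StandardCharsets.ISO_8859_1).trim();
    }
}
```

When the tag is present, the audio data ends 128 bytes before the end of the file, so a frame scanner should not look for headers past that point.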
The newer, more flexible ID3v2 tags are pre-pended to the file. Their structure is almost
as flexible as the structure of the mp3 file itself. ID3v2 tags are composed of their own
frames which store various bits of information. This might be the standard character
strings such as artist name and song title or more advanced information about the way the
file was encoded. ID3v2 tags can be used to provide useful hints to the decoder. As an
example, equalization curves are often stored in ID3v2 tags. There is no set size limit on
ID3v2 tags so in theory they can grow indefinitely. [5]
MP3 files in circulation can include either tag type. There is no clear preference, so a
steganographer has to be prepared to deal with information tags present either before or
after the audio data stream. However, it is logical to assume that ID3v1 tags will become
increasingly rare in the future. Figure 2 below shows a conceptual model of an mp3 file:
Figure 2
Due to their extensibility the ID3v2 tags would be an interesting target for embedding
information; however, they are not guaranteed to be present in every mp3 file. Thus the
best approach is to embed the data into the data frames. Before discussing steganographic
methodology, however, it would be best to take a closer look at the data frame.
MP3 Headers
As was mentioned before, MP3 files can be encoded with a variable bit rate (VBR),
which makes the frames vary in size. Since the frame sizes are not obvious, it is
necessary to be able to identify where a frame starts and where it ends. This is not as
difficult as it would first appear. Each frame is prepended with a frame header. All
headers are very similar in structure and content. In fact, they will often be identical.
Thus, identifying an mp3 header is just a matter of pattern matching.
Each header starts with a 12 bit block called the Sync block (see Figure 3). The Sync is a
string of ones which is supposed to help the decoder home in on a header. Therefore, to
find a frame one simply needs to detect 12 consecutive bits set to 1. (Table 1 lists the
sync as 11 bits because it counts the twelfth bit as part of the 2 bit version field.)
Figure 3
However, this pattern is not necessarily unique to a header. In fact, this pattern can
easily occur inside any longer data block. There are a few other checks that can be performed
to identify a 4 byte data block as a header:

The Layer field cannot be 00

The Bit-rate field cannot be 0000 or 1111

The Frequency field cannot be 11
A 4 byte block which starts with the Sync and does not violate the conditions listed above
is probably a header. [7]
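The checks listed above can be sketched as a single predicate over a 4 byte candidate. This is only an illustration, not the project's Header class: the class and method names are hypothetical, and the bit offsets assume the field order given in Table 1 with a 12 bit sync.

```java
/** Hypothetical sketch of the header plausibility checks. */
final class HeaderCheck {
    /** Returns true when the 4 bytes start with the sync and pass the field checks. */
    static boolean looksLikeHeader(int b0, int b1, int b2, int b3) {
        // Pack the four bytes into one 32 bit word, most significant byte first
        int h = (b0 & 0xFF) << 24 | (b1 & 0xFF) << 16 | (b2 & 0xFF) << 8 | (b3 & 0xFF);
        boolean sync = (h >>> 20) == 0xFFF;     // 12 consecutive ones
        int layer = (h >>> 17) & 0x3;           // 00 is not allowed
        int bitRateIndex = (h >>> 12) & 0xF;    // 0000 and 1111 are not allowed
        int frequencyIndex = (h >>> 10) & 0x3;  // 11 is not allowed
        return sync
                && layer != 0
                && bitRateIndex != 0x0
                && bitRateIndex != 0xF
                && frequencyIndex != 0x3;
    }
}
```

A block that passes is still only "probably" a header, so a scanner can gain confidence by checking that another plausible header follows at the computed frame boundary.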
Figure 4 shows an alternative view of the mp3 header in which the fields are marked with
characters. Table 1 provides brief explanations of each field.
Figure 4
Table 1
Position  Length (bits)  Purpose
A         11             Frame sync
B         2              MPEG audio version (MPEG-1, 2, etc.)
C         2              MPEG layer (Layer I, II, III, etc.)
D         1              Protection (if on, then checksum follows header)
E         4              Bitrate index (lookup table used to specify bitrate for this MPEG version and layer)
F         2              Sampling rate frequency (44.1kHz, etc., determined by lookup table)
G         1              Padding bit (on or off, compensates for unfilled frames)
H         1              Private bit (on or off, allows for application-specific triggers)
I         2              Channel mode (stereo, joint stereo, dual channel, single channel)
J         2              Mode extension (used only with joint stereo, to conjoin channel data)
K         1              Copyright (on or off)
L         1              Original (off if copy of original, on if original)
M         2              Emphasis (respects emphasis bit in the original recording; now largely obsolete)
Frame size is a function of bit-rate and sampling frequency. The size of a given frame in
bytes can be obtained using the following equation:
Equation 1
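For MPEG-1 Layer III the frame size is commonly computed as 144 times the bit rate divided by the sampling frequency, rounded down, plus one byte when the padding bit is set. The sketch below assumes that form, which may differ in notation from Equation 1. The class and method names are hypothetical.

```java
/** Hypothetical sketch: frame length in bytes for MPEG-1 Layer III. */
final class FrameSize {
    /**
     * @param bitRate    bit rate in bits per second, e.g. 128000
     * @param sampleRate sampling frequency in Hz, e.g. 44100
     * @param padded     value of the padding bit (field G)
     */
    static int bytes(int bitRate, int sampleRate, boolean padded) {
        // Integer division rounds down, which matches the unpadded frame length
        return 144 * bitRate / sampleRate + (padded ? 1 : 0);
    }
}
```

At 128 kbps and 44.1kHz the quotient is about 417.96, so unpadded frames are 417 bytes and padded frames are 418 bytes. At 48kHz the quotient is exactly 384 and no padding is needed.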
Most of the fields which are not involved in the size calculation are inessential. In fact,
some of them are very rarely used at all. However, field G, the Padding bit, is very
significant. Sometimes a frame needs to be padded with an extra byte to even out
the bit rate. This means that some frames contain 1 byte of useless data. This
property can be easily exploited by a steganographer.
STEGANOGRAPHY ON MP3 FILES
Having reviewed the mp3 format and its properties in detail, it is now possible to discuss
the actual steganographic approaches to storing data in audio files. The available methods
can be divided into three categories:

Non-steganographic methods

Encode time mp3 steganography

Post encoding mp3 steganography
Non Steganographic Methods
Embedding lyrics into audio files is not a new idea. There are several methods to embed
textual information in mp3 files. This can be done by using either the ID3v2 comment
frame or the special extension of ID3 called Lyrics3.
ID3v2 Comment Frame
The ID3v2 tags have limited support for embedding song lyrics. As per section 4.11 of
the specification document, the comment frame “is intended for any kind of full text
information that does not fit in any other frame.” [8]
This means that lyrics can be easily embedded into and extracted from the comment
frame. This method is not very efficient because there are no guidelines on how to format
the lyrics, how to split them among text frames, etc. Furthermore, not all mp3
files use ID3v2 tags, so this is not a universal approach.
Lyrics3 Standard
To overcome the shortcomings of the ID3v2 Comment Frame the ID3 standard was extended
with the Lyrics3 specification. This document describes how to add an extra block
between the audio data and the ID3 tag which contains the song lyrics accompanied by
meta-data. [9]
Figure 5
Figure 5 shows a possible way to embed a Lyrics3 block into an mp3 file. As shown in the
picture, the lyrics block always starts with the literal string LYRICSBEGIN and ends with
LYRICSEND. The Lyrics3 spec also includes fields specifying the author of the lyrics
(i.e. the person who wrote them down), a link to an album cover, and timing information
for when each line of text should be displayed on the screen. [9]
There are many existing plug-ins for popular mp3 players that work with this standard.
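Since the block is delimited by two literal strings, locating it is a simple search. The following is a minimal sketch, assuming the whole file is held in memory and that only the text between LYRICSBEGIN and LYRICSEND is wanted. The class and method names are hypothetical, and the additional Lyrics3 fields are not parsed.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: pulls the text between the two Lyrics3 markers. */
final class Lyrics3Probe {
    private static final String BEGIN = "LYRICSBEGIN";
    private static final String END = "LYRICSEND";

    /** Returns the delimited text, or null when no complete block is found. */
    static String extract(byte[] file) {
        // ISO-8859-1 maps every byte to exactly one char, so indices line up with the bytes
        String image = new String(file, StandardCharsets.ISO_8859_1);
        int begin = image.lastIndexOf(BEGIN);
        if (begin < 0) {
            return null;
        }
        int end = image.indexOf(END, begin);
        if (end < 0) {
            return null;
        }
        return image.substring(begin + BEGIN.length(), end);
    }
}
```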
Non Steganographic Methods Summary
Unfortunately neither of the methods described above can be used in
this project. While both are valid and successfully implemented approaches,
they have nothing to do with steganography. The main focus of this project is
to develop a steganographic method of embedding lyrics into music files.
Encode Time Steganography
As the name suggests, encode time mp3 steganography is conducted during the mp3
encoding process. These methods require the researcher to either implement their own
encoder, or modify an existing one to introduce steganographic data into the audio stream
after the psychoacoustic optimization, but before Huffman compression.
Encode time mp3 steganography methods include:

Low Bit Encoding

Phase Coding

Spread Spectrum Coding

Echo Data Hiding
Low Bit Encoding
Low bit encoding, or Least Significant Bit (LSB) encoding, replaces
the least significant bit of a host file value with a bit of the steganographic data. This
method assumes that the alteration introduces only a minute difference into the host
file which would be hard to detect. This is not necessarily true for mp3 files. Since an mp3 file is
composed of bit field headers (where flipping a single bit can, for example, cause the
decoder to misinterpret the frame size) and Huffman coded data, it is nearly
impossible to do real LSB steganography on it. [2]
Thus the only possible approach is to conduct LSB substitution at encode time, on
psychoacoustically compressed data before it reaches the Huffman compression stage. Since
Huffman compression is lossless, the embedded information will be
preserved. The embedded information can then be recovered by decompressing the Huffman
code of each frame and extracting the least significant bits.
While effective, this method can introduce significant levels of noise into the music data.
This is unacceptable, as this project aims to introduce as little noise as possible.
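The substitution itself is easy to illustrate outside of an encoder. The sketch below works on a plain array of 16 bit values that merely stands in for the data an encoder would hold before the Huffman stage. It is an assumption made for illustration and does not touch real mp3 data. The class and method names are hypothetical.

```java
/** Hypothetical sketch of least significant bit substitution on 16 bit values. */
final class LsbDemo {
    /** Returns a copy of the samples with the message bits written into the low bits. */
    static short[] embed(short[] samples, byte[] message) {
        if (message.length * 8 > samples.length) {
            throw new IllegalArgumentException("message does not fit into the host data");
        }
        short[] out = samples.clone();
        for (int i = 0; i < message.length * 8; i++) {
            // Take the message bits from the most significant bit of each byte downwards
            int bit = (message[i / 8] >> (7 - i % 8)) & 1;
            out[i] = (short) ((out[i] & ~1) | bit);
        }
        return out;
    }

    /** Rebuilds a message of the given length from the low bits of the samples. */
    static byte[] extract(short[] samples, int length) {
        byte[] message = new byte[length];
        for (int i = 0; i < length * 8; i++) {
            message[i / 8] |= (samples[i] & 1) << (7 - i % 8);
        }
        return message;
    }
}
```

Each host value changes by at most one unit, and one host value is consumed per message bit, so a short lyrics file already needs tens of thousands of values.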
Phase Encoding
Phase encoding is a much more complex method than the simple LSB encoding. Phase
encoding “works by substituting the phase of an initial audio segment with a reference
phase that represents the data. The phase of subsequent segments is adjusted in order to
preserve the relative phase between segments”. [10]
The IBM researchers claimed to achieve a capacity that “varied from 8 bps to
32 bps, depending on the sound context”. [10]
Similarly to LSB, this method requires manipulation of the data before Huffman encoding. It
is supposed to introduce much less noise into the recording.
Spread Spectrum
Spread spectrum encoding aims to spread the encoded data across the available
frequencies. The text is modulated using a pseudorandom noise sequence, which then
becomes the key. The modulated data is then attenuated and added to the original file as
additive random noise. [10]
Because the noise is spread so widely, it is possible that it will be almost inaudible to the
human ear. However, due to the complexity of implementing this method, it was beyond
the scope of this project.
Echo Data Hiding
Text can be embedded in audio data by introducing an echo to the original signal. The
data is then hidden by varying three parameters of the echo: initial amplitude, decay rate,
and offset. It is possible to time the echo in such a way that the human ear will not be able
to distinguish between the two signals, and registers the echo as some added resonance.
However, doing so is not trivial. [10]
There is no set rule for how long the delay between the signal and the echo should be for the
two to blend together. According to the IBM research, the best delay for the coded echo is
1/1000 of a second for most listeners. [10]
Flaws of Encode Time MP3 Steganography
Encode time steganography is possibly the best method to hide data in audio files. It
provides the best diffusion and confusion rates and is optimal for data hiding. However, it
has some important flaws.
To implement one of the encode time methods a researcher needs to either create a new mp3
encoder or modify an existing one. The decoder also needs to be modified to extract the
embedded data at decode time. This makes the encode time methods difficult and time
consuming to implement.
In addition, encode time methods are inextricably tied to a single implementation of an mp3
encoder/decoder. This creates portability and extensibility issues, and prevents
researchers from using proprietary or non-extensible encoding algorithms.
Furthermore, to encode textual data one has to have access to a non-mp3 source audio file
in the form of a wav file. One should not assume that users of the lyric embedding
software will have music stored in a source format. Most of the time music collectors
collect mp3s, not wavs. Requiring users to have a collection of wav files readily
available defeats the purpose of this project. Ideally, a user should be able to obtain a
random mp3 file and embed lyrics inside of it without the need for a source file.
Existing Implementations
The MP3Stego software created by Fabien Petitcolas is a great example of encode time
steganography. His algorithm manipulates the data at a very early stage of encoding,
before the audio data is optimized. He included built-in checks which prevent the
introduced noise from exceeding the psychoacoustic thresholds computed by the mp3 encoder.
His algorithm resembles the phase encoding methodology. [11]
This implementation is however limited, as it requires the user to have a 16 bit mono wav
source file encoded with 44100Hz pulse code modulation. It does not allow the user to
encode stereophonic wav files. [2]
It is difficult to expect anyone to have source files which fit this specification. Thus
Petitcolas's MP3Stego software is not a good model for this project.
Post Encoding MP3 Steganography
There is very little research done in the area of post encoding steganography. This is
probably due to the fact that post encoding methods cannot achieve very good diffusion
of steganographic data. The mp3 file format is not very flexible and, as was mentioned
above, writing over random bytes can irreversibly corrupt audio data.
However, for the purpose of this project this seems to be the right approach. Since lyrics
do not need to be hidden, diffusion and confusion are not a concern. Post encoding
methods usually exploit the quirks of the mp3 file structure, and hide data in blocks that
do not belong to actual audio data sequences. This makes audio corruption virtually
impossible.
There are at least two possible approaches to post encoding steganography:

Unused Header Bit Stuffing

Padding Byte Stuffing
Unused Header Bit Stuffing
The mp3 frame headers are composed of the various fields explained in Table 1. Some of
these fields are very rarely used. Good examples are the Private bit, the Copyright bit, the
Original bit, and the Emphasis field. Most mp3 players completely ignore these fields. It is
also safe to assume that their value is not essential to the integrity of the audio frame.
Changing one of these fields may at worst cause the player to misinterpret copyright
information about the given frame.
Therefore each audio frame contains at least 4 bits that can be used for embedding data.
Koso et al. proposed using this method to implement digital watermarking, but
embedding lyrics would be a logical extension of this approach. At 4 bits per frame, and
roughly 38 frames per second of 44.1kHz audio, one is able to achieve a decent capacity
of about 19 bytes per second. [12]
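A minimal sketch of this idea follows. It is not the scheme from [12]: as a conservative assumption it uses only the three single bit flags (Private, Copyright, Original) and leaves the 2 bit Emphasis field alone, so it stores 3 bits per header rather than 4. Bit positions follow Table 1, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: hides 3 bits in the Private, Copyright and Original flags. */
final class HeaderBits {
    /** Writes the low 3 bits of value into a 4 byte frame header, in place. */
    static void write(byte[] header, int value) {
        // Private bit is the lowest bit of the third header byte
        header[2] = (byte) ((header[2] & 0xFE) | ((value >> 2) & 1));
        // Copyright and Original are bits 3 and 2 of the fourth header byte
        header[3] = (byte) ((header[3] & 0xF3) | ((value & 0x3) << 2));
    }

    /** Reads the 3 bits back from the header. */
    static int read(byte[] header) {
        return ((header[2] & 1) << 2) | ((header[3] >> 2) & 0x3);
    }
}
```

The sync, bit rate, sampling frequency, padding, channel mode and emphasis fields are untouched, so the frame size calculation is not affected.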
Padding Byte Stuffing
Some mp3 audio frames are padded with an “empty” byte to even out the bit rate. On
average, mp3 files tend to have hundreds of frames which need to be padded. Since the
padding bytes do not carry any audio information they are a good target for data stuffing.
Padding byte stuffing is an attractive method because it is relatively straightforward to
implement and has good average storage capacity. It is possible to encode information at
1 byte per frame as long as padding bytes are available. Since lyrics files are usually
small, there should usually be enough padding bytes to contain the whole text.
Surprisingly, there is very little research dealing with padding byte stuffing. In fact, there
is very little documentation available regarding the nature of padding bytes, their location
in the frame and similar information. It seems to be a new avenue for steganographic
research.
Thus this project proposes to use padding byte stuffing to embed song lyrics into mp3
files.
DESIGNING STEGANOGRAPHIC SOFTWARE FOR MP3 FILES
Specifications
After considering all available options, the following set of specifications and
requirements was agreed upon by the whole research team:
1. The project must be platform independent and portable.
2. The implementation should not be tied to a single mp3 decoder.
3. As per 1 and 2, Post Encoding Steganography should be used.
4. The software should use Padding Byte Stuffing as the primary steganographic
method.
5. A graphical mp3 player interface with a text box for displaying lyrics should be
part of the implementation.
6. The lyrics should be synchronized with the song.
7. There should be a clear separation between the steganographic layer and the display
layer.
Java was chosen as the primary language for implementing the software due to the
platform independence requirement. This was also the language the research team was
familiar with.
As per requirement 7, the project was divided into two distinct parts: the steganographic
layer and the presentation layer.
Steganographic Module – Implementation Notes
The steganographic layer was implemented as a stand-alone application. The mp3stego.jar
file can be downloaded from
http://www.csam.montclair.edu/~maciakl/stuff/projects/security/index.php
The application performs two functions: embedding text into mp3 files and extracting
embedded text. To conserve space the text is compressed using the LZW algorithm.
The LZW algorithm is a dictionary based compression method. “The dictionary starts off with 256
entries, one for each possible character (single byte string). Every time a string not
already in the dictionary is seen, a longer string consisting of that string appended with
the single character following it in the text, is stored in the dictionary. The output consists
of integer indices into the dictionary. These initially are 9 bits each, and as the dictionary
grows, can increase to up to 16 bits.” [14]
Figure 6
Figure 6 shows an example application of the LZW algorithm. In the example above the
savings are 12.5%. Longer texts will usually compress better than short ones. In
fact, documents of significant length can even be compressed by a factor of 2 or more.
[15]
For song lyrics (which are usually relatively short text documents below 5kB) an
average compression rate of 20 to 30% was achieved.
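The dictionary mechanism described above can be sketched in a few lines. This is not the implementation by Cheok Yan Cheng used in the project. It is a simplified illustration that assumes single byte (ASCII) input and returns the dictionary indices as a list of integers instead of packing them into 9 to 16 bit codes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of LZW over single byte characters. */
final class LzwSketch {
    static List<Integer> compress(String text) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dictionary.put(String.valueOf((char) i), i);   // one entry per byte value
        }
        List<Integer> codes = new ArrayList<>();
        String current = "";
        for (char c : text.toCharArray()) {
            String extended = current + c;
            if (dictionary.containsKey(extended)) {
                current = extended;                        // keep growing the match
            } else {
                codes.add(dictionary.get(current));        // emit the longest known string
                dictionary.put(extended, dictionary.size());
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) {
            codes.add(dictionary.get(current));
        }
        return codes;
    }

    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) {
            return "";
        }
        List<String> dictionary = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            dictionary.add(String.valueOf((char) i));
        }
        String previous = dictionary.get(codes.get(0));
        StringBuilder text = new StringBuilder(previous);
        for (int code : codes.subList(1, codes.size())) {
            // A code equal to the dictionary size refers to the entry being built right now
            String entry = code < dictionary.size()
                    ? dictionary.get(code)
                    : previous + previous.charAt(0);
            text.append(entry);
            dictionary.add(previous + entry.charAt(0));
            previous = entry;
        }
        return text.toString();
    }
}
```

Repetitive text such as a chorus benefits most, because each repetition is matched against longer and longer dictionary entries.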
The java based implementation of lzw used in this project was developed by Cheok Yan
Cheng. [12]
Figure 7 shows the UML diagram of a proposed implementation. Below is a short
overview of the most important classes.
mpstego.mp3.Header – can be initialized with an integer array of size 4 which
represents a header read from the file. The isValid() method can be used to test if the
array represents a valid mp3 header. If it does, this class allows one to calculate the size
of the frame described by this header, and availability of a padding byte.
mpstego.file.text.lzw.Compress – implementation of the lzw compression algorithm
developed by Cheok Yan Cheng. It will generate a compressed lzw file that can be used
for embedding in an audio file.
Figure 7
mpstego.file.text.lzw.Decompress – implementation of the LZW decompression algorithm
developed by Cheok Yan Cheng. It reads an lzw file and generates a plain text ASCII file.
mpstego.file.mp3.StegIO – is the input/output class. The mp3 file is edited in place
using the java.io.RandomAccessFile library. StegIO loops through the file, reading 4 byte
segments and tries to match a header using the Header class. Once a header is identified,
StegIO seeks to the padding byte location and either reads or writes out a byte. StegIO
also obtains the lyrics from a text file using java.io.FileInputStream.
StegIO has a dual role of reading and writing data into the file. Initially, the plan was to
split these roles among two different classes. However, the process of identifying headers
and searching for padding bytes is quite complicated. The same algorithm must be used
in both cases. It would be a bad design choice to replicate the code in two different
classes. Thus reading and writing was integrated in a single class.
Figure 8 and Figure 9 show the flow of control. The Main class calls either
Compress or Decompress to do the LZW compression or decompression, and calls the StegIO
class to do the actual embedding or extraction. This is a very simple, and yet powerful design.
Figure 8
Figure 9
StegIO Header Algorithm
The StegIO class uses a fairly straightforward (but not trivial) algorithm to find padding
bytes. The procedure used to accomplish this can be expressed in the pseudo code in
Figure 10.
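calculate the length of the message in bytes
prepend the length (as a 4 byte integer) to the message
put the message on a byte queue
while (the queue is not empty && the end of file is not reached)
{
    Header = read 4 bytes from the file
    if (Header is not valid)
    {
        seek back 3 bytes            // slide the search window by one byte
        continue
    }
    if (Header contains padding byte)
    {
        seek to the last byte of the frame
        pop a byte from the queue
        write the popped byte into the file
    }
    seek to the start of the next frame
}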
Figure 10
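To read the data back, the process is reversed as shown in the pseudo code in Figure 11.
length = unknown
counter = 0
while ((length is unknown || counter < length + 4) && the end of file is not reached)
{
    Header = read 4 bytes from the file
    if (Header is not valid)
    {
        seek back 3 bytes            // slide the search window by one byte
        continue
    }
    if (Header contains padding byte)
    {
        seek to the last byte of the frame
        read a byte from the file
        push the read byte onto queue
        counter = counter + 1
        if (counter == 4)
        {
            length = to_integer(pop 4 bytes from queue)
        }
    }
    seek to the start of the next frame
}
Figure 11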
Stegonographic Module - Padding Byte Stuffing
Using the padding byte stuffing method to embed the lyrics into a mp3 file poses a
significant problem. How to store compressed text in a way that is easy to extract later?
For example, how does extraction algorithm know that it has read the last byte of lyrics?
One could assume that padding bytes always hold the value of 0 but that does not have to
be always true.
Another method would be to put a stop marker as the last byte of encrypted text.
However, since text is stored as compressed binary file, it is not always possible what
kind of characters will it contain. Since lzw tokens can be from 9 to 16 bits, some data
bytes can contain unpredictable combinations.
Thus another method of marking the end of the lyrics is needed. For the purpose of this
project, the length of the lyrics in bytes is prepended to the textual message. It is
stored as a 4-byte integer, so the first 4 padding bytes always contain the length of the
lyrics data.
This makes it possible to encode up to 2^32 bytes of data in a given mp3 file. Fortunately,
no mp3 file can possibly contain that many padding bytes, so this limit is perfectly
acceptable.
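The length prefix described above can be illustrated with a short Java sketch. The class
and method names are hypothetical, and the byte order (big-endian) is an assumption, since
the layout of the integer is not specified here. Note also that a Java int is signed, so a
sketch like this can only describe messages of up to 2^31 - 1 bytes.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: frames a payload with a 4-byte length prefix and reads it back. */
final class LengthPrefixedMessage {
    private LengthPrefixedMessage() {}

    /** Returns the 4-byte big-endian length followed by the payload bytes. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /**
     * Reads the length from the first 4 bytes and returns exactly that many
     * payload bytes. Any bytes after the payload are ignored, so no stop
     * marker is needed.
     */
    static byte[] unframe(byte[] stream) {
        if (stream.length < 4) {
            throw new IllegalArgumentException("missing length prefix");
        }
        ByteBuffer buffer = ByteBuffer.wrap(stream);
        int length = buffer.getInt();
        if (length < 0 || length > buffer.remaining()) {
            throw new IllegalArgumentException("bad length: " + length);
        }
        byte[] payload = new byte[length];
        buffer.get(payload);
        return payload;
    }
}
```

Because the reader knows the length in advance, the payload may contain any byte value,
including 0 and 0xFF.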
Steganographic Module Usage
To embed text into an mp3 file, specify the mp3 file and the lyrics file on the
command line:
java -jar mp3stego.jar mp3_file.mp3 text_file.txt
where mp3_file.mp3 is the path to the host mp3 file and text_file.txt is a plain text ASCII
file containing the lyrics. This will create a compressed LZW file with the same name as
the text file in the same directory. The user does not need to do anything with this file,
but should be aware that it will be created.
To extract the lyrics, specify just the mp3 file name on the command line:
java -jar mp3stego.jar mp3_file.mp3
This will create both a compressed LZW file and a plain text ASCII file in the same
directory as the mp3 file.
Steganographic Module – Implementation Issues
The main problem with the Padding Byte Stuffing method is the limited amount of
information available about how mp3 frames are padded. This project assumed that the
padding byte is the very last byte of the audio block of a frame, but this assumption is
not necessarily correct.
Unfortunately, the algorithm developed for this implementation failed to correctly identify
padding bytes, either because this assumption was incorrect or because of programming
errors. The text data was stuffed into what appeared to be the last byte of the audio
frame, but these bytes were empty (i.e. 0 or 0xFF) only half of the time. Thus the
written data usually introduced a significant amount of noise into the recording.
Because the mp3 file is always edited “in place” and the risk of data corruption is high,
it is strongly recommended to run this algorithm only on spare copies of mp3 files.
Because of time constraints, it was not possible to identify a single source of the data
corruption. It is likely caused by the limited knowledge about the nature of padding
bytes; Padding Byte Stuffing might not be a valid steganographic method.
It is also possible that the code contains a trivial error that is yet to be discovered.
IMPLEMENTING GRAPHICAL USER INTERFACE FRONT END
Having a functional steganographic back end was only part of the project. The next part
was implementing a graphical user interface to play mp3 files and display the lyrics on
screen. Once again, this is not a trivial task. Since the language of choice for this
project was Java, it is worth reviewing the support for playing mp3 files in this
language.
JAVA and MP3:
Java provides a low-level API, the Java Sound API, for effecting and controlling the input
and output of sound media, including both audio and Musical Instrument Digital Interface
(MIDI) data. More advanced features can be found in optional libraries outside the core
platform. One example of such a library is the Java Media Framework (JMF).
Java Sound API
The Java Sound API provides the lowest level of sound support on the Java platform. It
provides application programs with a great amount of control over sound operations, and
it is extensible. For example, the Java Sound API supplies mechanisms for installing,
accessing, and manipulating system resources such as audio mixers, MIDI synthesizers,
other audio or MIDI devices, file readers and writers, and sound format converters. The
Java Sound API does not include sophisticated sound editors or graphical tools, but it
provides capabilities upon which such programs can be built. It emphasizes low-level
control beyond that commonly expected by the end user.
Java Media Framework (JMF)
There are other Java platform APIs that have sound-related elements. The Java Media
Framework (JMF) is a higher-level API that is currently available as a Standard
Extension to the Java platform. JMF specifies a unified architecture, messaging protocol,
and programming interface for capturing and playing back time-based media. JMF
provides a simpler solution for basic media-player application programs, and it enables
synchronization between different media types, such as audio and video. On the other
hand, programs that focus on sound can benefit from the Java Sound API, especially if
they require more advanced features, such as the ability to carefully control buffered
audio playback or directly manipulate a MIDI synthesizer. Other Java APIs with sound
aspects include Java 3D and APIs for telephony and speech. An implementation of any of
these APIs might use an implementation of the Java Sound API internally, but is not
required to do so. [15]
JAVA BASED MP3 PLAYER
Due to the limited support in the standard API, a Java-based mp3 player has to employ
some external libraries. JMF is a mature project that is used in many existing Java
applications, so it seemed a logical choice.
MP3 Player Implementation:
JMF is a very popular framework, and there is an abundance of existing JMF-based
players. Writing another player from scratch would be an interesting exercise, but it
is outside the scope of this project (which is focused on steganography). Therefore,
instead of reinventing the wheel, the choice was made to use an open source Java player
called “Java MP3 player”. [http://www.codetoad.com/java_mp3_player.asp]
This is a purely Swing-based MP3 player.
Goals for this player:
1. Read the lyrics text from a file.
2. Display the lyrics and change them in time with the music.
MP3 Player Implementation details:
The javax.media.Player interface has a method getMediaTime() which returns a Time object.
This method is used to track the time elapsed from the start of the song to any given
moment during playback.
MP3 Player Screenshots
Figure 11 shows the main window of the mp3 player. It contains all the important
controls (i.e. play, stop, rewind). The long empty rectangle is a label which will display
the lyrics.
Figure 11
Figure 12 shows a screen allowing the user to create a play list. This generates an m3u
file which then can be played as shown in Figure 13.
Figure 12
Figure 13
Figure 14
Figure 14 shows the actual player in action. The lyrics are displayed in the viewing area,
along with the progress bar.
The player uses the steganographic module to extract the lyrics from an mp3 file prior
to playing the song.
SYNCHRONIZING MUSIC AND TEXT
An integral part of the project was to develop a way to synchronize the music and the text
displayed on the screen to create a karaoke-like effect. To achieve this, the embedded
lyrics had to include some sort of metadata specifying when each line should appear on
the screen.
Lyrics Meta Data Format
The following standard for recording lyrics was adopted:
• Each line starts with an integer specifying the time offset from the beginning of
the song (in milliseconds)
• The time offset is followed by a pipe character (“|”)
• The pipe character is followed by the text to be displayed on the screen
• The line is ended with a single new line character (“\n”)
• The lyrics file is terminated with a single hash mark (“#”) on a new line
Figure 15 shows how a sample lyrics file would be formatted.
19100|May the good Lord be with you
21250|Down every road you roam
28300|And may sunshine and happiness
31150|surround you when you're far from home
…
207790|Forever Young
217240|For, forever young
227050|Forever Young
#
Figure 15
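A file in this format can be parsed, and the line for a given playback position looked up,
roughly as follows. The TimedLyrics class is a hypothetical sketch and not the player's
actual code; it assumes the elapsed time is already available in milliseconds and silently
skips lines that contain no pipe character.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the "offset|text" lyrics format terminated by "#". */
final class TimedLyrics {
    /** One lyrics line together with its start offset in milliseconds. */
    static final class Line {
        final long offsetMillis;
        final String text;

        Line(long offsetMillis, String text) {
            this.offsetMillis = offsetMillis;
            this.text = text;
        }
    }

    private TimedLyrics() {}

    /** Parses the file contents, stopping at the terminating hash mark. */
    static List<Line> parse(String contents) {
        List<Line> lines = new ArrayList<>();
        for (String raw : contents.split("\n")) {
            if (raw.trim().equals("#")) {
                break; // end-of-lyrics marker
            }
            int pipe = raw.indexOf('|');
            if (pipe < 0) {
                continue; // not an "offset|text" line, skip it
            }
            long offset = Long.parseLong(raw.substring(0, pipe).trim());
            lines.add(new Line(offset, raw.substring(pipe + 1).trim()));
        }
        return lines;
    }

    /**
     * Returns the text that should be on screen at the given elapsed time:
     * the last line whose offset has been reached, or "" before the first line.
     * Assumes the lines are sorted by offset, as in Figure 15.
     */
    static String lineAt(List<Line> lines, long elapsedMillis) {
        String current = "";
        for (Line line : lines) {
            if (line.offsetMillis > elapsedMillis) {
                break;
            }
            current = line.text;
        }
        return current;
    }
}
```

A player would call lineAt periodically with the current playback position and update the
lyrics label whenever the returned text changes.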
Synchronization Method
To obtain the time offsets, a manual method was used: the time readings for each line
were obtained by timing the song with a stopwatch and recording the results. This method
is far from perfect, but due to time constraints it was the only markup method available
to this research team.
Because of the limits of human reflexes and response time, this method introduces a
small delay. The lyrics are displayed a fraction of a second after the vocalist starts
singing, because this is how long it takes for the human brain to process the information
and record the time. Thus the time offsets often have to be tweaked by hand to achieve
full synchronization.
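One simple way to apply such a correction is to shift every offset in the lyrics file by a
fixed amount. The sketch below is an assumption about how this could be automated (in this
project the offsets were adjusted by hand); the OffsetShifter name is hypothetical, and
offsets that would become negative are clamped to zero.

```java
/** Hypothetical helper that shifts all time offsets in a lyrics file by a fixed delta. */
final class OffsetShifter {
    private OffsetShifter() {}

    /**
     * Adds deltaMillis to the offset of every "offset|text" line.
     * Lines without a pipe character (such as the final "#") are copied unchanged.
     */
    static String shift(String lyrics, long deltaMillis) {
        StringBuilder out = new StringBuilder();
        for (String line : lyrics.split("\n")) {
            int pipe = line.indexOf('|');
            if (pipe < 0) {
                out.append(line).append('\n');
                continue;
            }
            long offset = Long.parseLong(line.substring(0, pipe).trim());
            long shifted = Math.max(0, offset + deltaMillis);
            // Keep the pipe and the text exactly as they were.
            out.append(shifted).append(line.substring(pipe)).append('\n');
        }
        return out.toString();
    }
}
```

For example, shift(lyrics, -300) makes every line appear 300 milliseconds earlier.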
FUTURE WORK
Because of the problems with finding padding bytes, it would be very interesting to
implement the steganographic module with another method. The Unused Header Bit
approach seems to be the most appropriate one. It could easily be modified to work
within the existing framework of the stego module.
It would also be worthwhile to slightly redesign the class structure to achieve better
encapsulation. The StegIO class should be made abstract, and the actual reading and
writing tasks should be moved into its subclasses. This is a purely structural change, but
it would greatly improve the readability and maintainability of the code base.
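The proposed redesign might look like the sketch below. All names are hypothetical:
AbstractStegIO keeps the length-prefix logic in one place, while subclasses (for example
one for Padding Byte Stuffing and one for the Unused Header Bit approach) would only decide
where a hidden byte is stored. An in-memory subclass stands in for a real mp3-backed one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical abstract base: owns the message framing, not the byte placement. */
abstract class AbstractStegIO {
    /** Stores one hidden byte (0..255) in the next available slot of the host file. */
    protected abstract void writeHiddenByte(int value);

    /** Retrieves the next hidden byte (0..255) from the host file. */
    protected abstract int readHiddenByte();

    /** Writes a 4-byte big-endian length followed by the message bytes. */
    final void embed(byte[] message) {
        int length = message.length;
        for (int shift = 24; shift >= 0; shift -= 8) {
            writeHiddenByte((length >>> shift) & 0xFF);
        }
        for (byte b : message) {
            writeHiddenByte(b & 0xFF);
        }
    }

    /** Reads the 4-byte length, then exactly that many message bytes. */
    final byte[] extract() {
        int length = 0;
        for (int i = 0; i < 4; i++) {
            length = (length << 8) | (readHiddenByte() & 0xFF);
        }
        byte[] message = new byte[length];
        for (int i = 0; i < length; i++) {
            message[i] = (byte) readHiddenByte();
        }
        return message;
    }
}

/** Stand-in subclass that keeps the hidden bytes in memory instead of an mp3 file. */
final class InMemoryStegIO extends AbstractStegIO {
    private final Deque<Integer> slots = new ArrayDeque<>();

    @Override
    protected void writeHiddenByte(int value) {
        slots.addLast(value);
    }

    @Override
    protected int readHiddenByte() {
        return slots.removeFirst();
    }
}
```

With this split, switching embedding methods means writing a new subclass; the framing code
and its callers stay untouched.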
The manual synchronization method should also be improved or replaced with an
automated one. It would be a very interesting project to see whether time offset
information could be generated by analyzing an mp3 file together with its lyrics file.
REFERENCES
1. Hacker, Scot, “MP3, The Definitive Guide”, 1st Edition, March 2000, O'Reilly
Publishing.
2. Noto, Mark, MP3Stego: Hiding Text in MP3 Files, September 2001, SANS
Institute.
3. Koichi Takagi, Shigeyuki Sakazawa, Yasuhiro Takishima, “Light Weight MP3
Watermarking Method for Mobile Terminals”, KDDI R&D Labs, MM'05,
November 6-11, 2005, Singapore, ACM
4. “Huffman Code”, Wikipedia, http://en.wikipedia.org/wiki/Huffman_code
5. M. Nilsson, “ID3 tag version 2.4.0 - Main Structure”, November 2000,
http://www.id3.org/id3v2.4.0-structure.txt
6. Predrag Supurovic, “MPEG Audio Frame Header”, 1998 DataVoyage,
http://www.dv.co.yu/mpgscript/mpeghdr.htm#MPEGTAG
7. “The Private Life of MP3 Frames”, http://www.id3.org/mp3frame.html
8. M. Nilsson, “ID3 tag version 2.3.0”, February 1999,
http://www.id3.org/id3v2.3.0.html
9. Strnad Peter, Gingold Peter, “Lyrics3 Tag v2.00”, Jun 1998,
http://www.id3.org/lyrics3200.html
10. Bender W., Gruhl D., Morimoto N., Lu A., “Techniques for data hiding”, 1996,
IBM, http://www.research.ibm.com/journal/sj/353/sectiona/bender.txt
11. Petitcolas Fabien A. P., “mp3stego”, 1997–2005,
http://www.petitcolas.net/fabien/steganography/mp3stego/
12. Koso A., Turi A., and Obimbo C., “Embedding Digital Signatures in MP3s”, from
proceedings 477 Internet and Multimedia Systems, and Applications, 2005
13. Cheng Cheok Yan, “Introduction On Text Compression Using Lempel, Ziv,
Welch (LZW) method”, http://www.geocities.com/yccheok/lzw/lzw.html
14. “LZW”, Wikipedia, http://en.wikipedia.org/wiki/LZW
15. “Information-and-Entropy”, MIT, Spring 2003,
http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-ComputerScience/6-050JInformation-and-EntropySpring2003/ABF6E960-C29C-48BB-95AE-AF9F79D9E20B/0/chapter3new.pdf
FIGURES
1. Fraunhofer-Gesellschaft,
http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html
2. Image © 2005, Lukasz Grzegorz Maciak.
3. “The Private Life of MP3 Frames”, http://www.id3.org/mp3frame.html
4. Hacker, Scot, “MP3, The Definitive Guide”, 1st Edition, March 2000, O'Reilly
Publishing.
5. Strnad Peter, Gingold Peter, “Lyrics3 Tag v2.00”, Jun 1998,
http://www.id3.org/lyrics3200.html
6. “Information-and-Entropy”, MIT, Spring 2003,
http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-ComputerScience/6-050JInformation-and-EntropySpring2003/ABF6E960-C29C-48BB-95AE-AF9F79D9E20B/0/chapter3new.pdf
7. Image © 2005, Lukasz Grzegorz Maciak
8. Image © 2005, Lukasz Grzegorz Maciak
9. Image © 2005, Lukasz Grzegorz Maciak
10. Code Snippet © 2005, Lukasz Grzegorz Maciak
11. Code Snippet © 2005, Lukasz Grzegorz Maciak
12. Image © 2005, Micheal Alexis Ponniah
13. Image © 2005, Micheal Alexis Ponniah
14. Image © 2005, Micheal Alexis Ponniah
EQUATIONS
1. Hacker, Scot, “MP3, The Definitive Guide”, 1st Edition, March 2000, O'Reilly
Publishing.