Master’s Project Report
Secure communication of multimedia through encryption using matrix algorithm
Sandeep Chandrashekaregowda
With the advancement of the ages, people have increasingly found the need to communicate across
distances. Initially this was accomplished through postal mail, which was not realistic enough. People
wanted to communicate vital moments of their lives and their thoughts through more realistic means,
namely multimedia (audio and video), which makes it possible to share interesting thoughts and
interesting audio/video files among people. Sharing such files often requires
communicating through networks of computers, which are not always secure enough. It is often a
requirement that the file being shared is visible or usable only by the intended recipient.
Sometimes it may also be essential to mislead an intruder into believing that the file is different
from what it really is. And for some commercial purposes it may also be required that only parts of the
communicated audio/video files are playable. This gives rise to the need to devise a methodology to
securely communicate these multimedia files and hence protect the intellectual property of
multimedia from attacks arising out of a hostile network environment.
1.1 Aim and Objective of the project:
To devise a methodology by which video and audio files are secured in a time- and space-efficient
manner.
To devise an encryption methodology that utilizes various available encryption techniques and
helps secure multimedia data files in such a manner that securing only the information in the frame
data has the effect of securing the file as a whole. This securing process is to be carried out
so as to reduce the amount of time spent securing the file.
1.2 Need of Secure communication
With the advent and consequent vast growth of the Internet, intellectual property has become
vulnerable to a number of threats that range from information retrieval to destruction of the
intellectual property. Hence there is an extensive need to secure such intellectual
property. Intellectual property in the form of multimedia data files has been under constant threat
over the years.
Files (including multimedia) often need to be communicated through
possibly insecure channels, where an imposter or an intruder may cause extensive damage to such
intellectual property. It has therefore become the need of the hour to devise methods that ensure
secure communication of such files.
1.3 Applications and Benefits:
Securely communicate multimedia data
Multimedia on Demand.
Protection of multimedia from threats arising in a hostile network environment.
Better encryption than standard textual encryption methods, as it makes use of the specialized
structure of multimedia, thus providing a time- and space-effective solution for secure communication.
Pay and play multimedia on the internet.
2.1 SECMPEG by Meyer and Gadegast, 1995
In 1995 Meyer and Gadegast introduced the encryption method called Secure MPEG, or SECMPEG
for short, designed for the MPEG-1 video standard. SECMPEG contains four different
levels of security. At the first level, SECMPEG encrypts the headers from the sequence layer to the
slice layer, while the motion vectors and DCT blocks are unencrypted. At the second level, the most
relevant parts of the I-blocks are additionally encrypted (upper left corner of the block). At the third
level, SECMPEG encrypts all I-frames and all I-blocks. Finally, at the fourth level, SECMPEG
encrypts the whole MPEG-1 sequence. The authors chose the Data Encryption Standard (DES)
symmetric key cryptosystem, which was the natural choice, given that this cryptosystem had been
around since 1976 and was the official symmetric encryption algorithm standardized by the National
Institute of Standards and Technology (NIST) and adopted by the US Government. Since DES is a
symmetric key cryptosystem, it could only be used to achieve confidentiality. Meyer and Gadegast
targeted solving the problem of data integrity as well. For that reason, the Cyclic Redundancy Check
(CRC) was incorporated as a low-level solution to integrity. The real data integrity
mechanisms that included public key cryptography and cryptographically good hash functions such
as MD4, MD5, or SHA were left for further research.
The encryption in SECMPEG (levels 1, 2, and 3) has some weaknesses. It is known that even though a
single P- or B-frame on its own carries almost no information about the corresponding I-frame, a series
of P- or B-frames can tell a lot if their base I-frames are correlated. Since SECMPEG introduces
changes to the MPEG-1 format, a special encoder and decoder are needed to handle SECMPEG
streams. Nevertheless, the SECMPEG paper and implementation by Meyer and Gadegast was one of
the first important research initiatives for selective encryption of multimedia streams.
2.2 Video Encryption Algorithm by Qiao and Nahrstedt, 1997
The Video Encryption Algorithm (VEA) by Qiao and Nahrstedt is constructed with the goal of exploiting
the statistical properties of the MPEG video standard. The algorithm consists of the following four
steps:
Step 1: Let the 2n-byte sequence, denoted by a_1 a_2 ... a_2n, represent the chunk of an I-frame.
Step 2: Create two lists, one with the odd-indexed bytes a_1 a_3 ... a_(2n-1), and the other with the
even-indexed bytes a_2 a_4 ... a_2n.
Step 3: XOR the two lists into an n-byte sequence denoted by c_1 c_2 ... c_n.
Step 4: Apply the chosen symmetric cryptosystem E (for example DES or AES) with the secret
key KeyE on either the odd list or the even list, and thus create the ciphertext sequence
c_1 c_2 ... c_n E_KeyE(a_1 a_3 ... a_(2n-1)) or c_1 c_2 ... c_n E_KeyE(a_2 a_4 ... a_2n), respectively.
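The four steps above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the even list is the one chosen for encryption, and `toy_cipher` is a stand-in keystream XOR built from SHA-256, not DES or AES, used here only so the example runs with the standard library.

```python
import hashlib

def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for the cryptosystem E (NOT DES or AES).

    XORs the data with a keystream derived from SHA-256, so applying it
    twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def vea_encrypt(chunk: bytes, key: bytes) -> bytes:
    """Steps 1-4: split into odd/even lists, XOR them, encrypt the even list."""
    if len(chunk) % 2:
        raise ValueError("chunk must contain 2n bytes")
    odd = chunk[0::2]    # a_1, a_3, ..., a_(2n-1)
    even = chunk[1::2]   # a_2, a_4, ..., a_2n
    xored = bytes(x ^ y for x, y in zip(odd, even))  # c_1 ... c_n
    return xored + toy_cipher(even, key)

def vea_decrypt(cipher: bytes, key: bytes) -> bytes:
    """Recover the even list with the key, then the odd list by XOR."""
    n = len(cipher) // 2
    xored, encrypted_even = cipher[:n], cipher[n:]
    even = toy_cipher(encrypted_even, key)
    odd = bytes(c ^ e for c, e in zip(xored, even))
    chunk = bytearray(2 * n)
    chunk[0::2] = odd
    chunk[1::2] = even
    return bytes(chunk)
```

Only half of the chunk passes through the cipher E; the other half is protected by the XOR with the list that was encrypted, which is where the computational saving of this scheme comes from.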
2.3 Video Encryption Methods by Alattar, Al-Regib and Al-Semari, 1999
In 1999, Alattar, Al-Regib and Al-Semari presented three methods for selective video
encryption based on the DES cryptosystem. These methods, called simply Method I, Method II and
Method III, were computationally improved versions of the previous work from two of the
coauthors, which is referred to as Method 0. The first algorithm (Method 0), proposed by Alattar
and Al-Regib, essentially encrypts all macro blocks from I-frames and the headers of all
prediction macro blocks using the DES cryptosystem. This method performs relatively poorly
because encryption is carried out on 40%-79% of the MPEG video stream.
In Method I, the data of every nth macro block from the I-frame of the MPEG video stream is
encrypted using the DES cryptosystem, while the information from all other I-frame macro
blocks is left unencrypted. The value of n was not specified, and it can be chosen depending on
the application needs. If the value of n is 2, then the encryption is performed on approximately
half of all I-frame macro blocks, but the security level is higher. On the other hand, if the value
of n is higher, the computational savings are bigger, yet the security level is lower. An important
observation is that even though a certain number of I-macro blocks is left unencrypted, they are
not expected to reveal any information about the encrypted ones.
To improve the security of Method I, Alattar, Al-Regib and Al-Semari suggested Method II,
which additionally encrypts the headers of all predicted macro blocks using DES. Since DES is a
block cipher that operates on 64-bit blocks, a 64-bit segment starting from the header of a
predicted macro block is processed in the beginning. This segment may include exactly the
whole header (which is the rare case when header size is equal to 64 bits), a part of the header
(when header size is larger than 64 bits), or the whole header along with the part of the macro
block data (when the header size is smaller than 64 bits). In the case when the encrypted segment
contains a part of the header, an adversary would have serious problems with synchronization,
which adds to the security regarding motion vectors. The security is further increased if the
encrypted segment also contains a part of the macro block data. The computation performed
using Method II is clearly faster than that of the Method 0, but slower than that of Method I.
Finally, Alattar, Al-Regib and Al-Semari proposed Method III to reduce the amount of
computation from Method II. Namely, instead of encrypting all predicted macro blocks, the
encryption in Method III is performed on every nth predicted macro block, along with encrypting
every nth I-macro block.
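The difference between Methods I, II and III lies in which macro blocks are selected for encryption. The following Python sketch illustrates that selection only. It assumes a simplified model that is not in the original papers: macro blocks are given as a flat list of kinds ("I" for I-macro blocks, anything else for predicted ones), counting of "every nth" starts at the first block of each kind, and the function name is hypothetical.

```python
def macroblocks_to_encrypt(kinds, n, method):
    """Return the indices of the macro blocks selected for encryption.

    kinds  -- list such as ["I", "P", ...]; "I" marks an I-macro block,
              any other value marks a predicted macro block
    n      -- every nth block of a kind is selected (n >= 1)
    method -- "I", "II" or "III"
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    i_blocks = [k for k, kind in enumerate(kinds) if kind == "I"]
    predicted = [k for k, kind in enumerate(kinds) if kind != "I"]

    chosen = set(i_blocks[::n])          # all methods: every nth I-macro block
    if method == "II":
        chosen.update(predicted)         # plus the header segment of ALL predicted blocks
    elif method == "III":
        chosen.update(predicted[::n])    # plus the header segment of every nth predicted block
    elif method != "I":
        raise ValueError("method must be 'I', 'II' or 'III'")
    return sorted(chosen)
```

For predicted macro blocks, only the 64-bit segment starting at the header would actually be passed to DES; the sketch returns indices and leaves the encryption itself out.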
2.4 Partial Encryption Algorithms for Videos by Cheng and Li, 2000
The partial encryption schemes for still images introduced by Cheng and Li were further extended
to video. The approaches proposed by Cheng and Li are not suitable for JPEG image
compression, and thus naturally also not suitable for the MPEG video compression standard. Instead,
the partial encryption algorithms are designed for video compression methods that use either
quadtree compression or wavelet compression based on zerotrees for the video sequence
intraframes, motion compensation, and residual error coding. For example, the partial encryption is
applicable to the videos that are based on the Set Partitioning In Hierarchical Trees (SPIHT) image
compression algorithm, which is an application of zerotree wavelet compression. Cheng and Li's
partial encryption algorithms are designed to disguise the intraframes (I-frames), the motion vectors,
and the residual error code of the given video sequences. In both quadtree compression and wavelet
compression based videos, all I-frames are encrypted using the previously discussed methods for
partial encryption of still images by Cheng and Li. In addition, it is also important to encrypt the
motion vectors. If the motion vector information is unencrypted, the adversary may be able to use an
image frame to obtain approximations to the successive frames. Almost all motion estimation
algorithms divide the frame into blocks and try to predict their movement (the position in the next
frame) by constructing the estimated motion vector for each block. The blocks that belong to the
same large object often have identical motion vectors and it is efficient to encode these vectors
together. The authors restrict themselves to those video encryption algorithms that use a quadtree for merging
these blocks. Then, quadtree partial encryption is used to encrypt the motion vectors. Finally, for
security purposes it is important to encrypt the residual error as well. An unencrypted residual error may
reveal the outline of a moving object. The residual error is often treated as an image frame and then
compressed using some standard image compression algorithm. Again, the authors restrict themselves to video
compression algorithms that use either a quadtree or a wavelet based image compression algorithm to
compress the residual error frames. Thus, partial encryption schemes for both quadtree and wavelet
compression can be applied to the residual error encryption.
3. Algorithm
This project encrypts only selected frames of the multimedia file while keeping the file playable.
Each selected frame is encrypted separately.
[Figure 1: flowchart. Encryption: analyze the file to find out the frames to be encrypted, encrypt
those frames, and pass the remaining frames through unchanged. Decryption: analyze which frames
are to be decrypted, decrypt those frames, and obtain the decrypted frames.]
Fig 1: A simplified implementation of the algorithm
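The selective flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: frames are modelled as (kind, payload) pairs, the selection rule "encrypt I-frames only" is an assumption made for the example, and `xor_bytes` is a hypothetical stand-in for the real frame cipher.

```python
def xor_bytes(payload: bytes, key: bytes) -> bytes:
    """Stand-in frame cipher: repeating-key XOR (its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))

def select_frames(frames):
    """Analysis step: decide which frames are to be processed.

    Assumption for this sketch: only frames of kind "I" are selected.
    """
    return {i for i, (kind, _) in enumerate(frames) if kind == "I"}

def transform(frames, selected, key):
    """Encrypt (or decrypt) each selected frame separately; keep the rest."""
    result = []
    for i, (kind, payload) in enumerate(frames):
        if i in selected:
            payload = xor_bytes(payload, key)
        result.append((kind, payload))
    return result
```

Because the stand-in cipher is its own inverse, calling `transform` a second time with the same selection and key plays the role of the decryption branch of the figure.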
3.1 Matrix Encryption Algorithm
Figure 2 – Matrix Encryption Algorithm
The Algorithm is:
Step 1. Create matrix
Step 2. XOR X00 (00, 01, 10, 11) with X01 (02, 03, 12, 13) respectively, which updates only 1/4th of
the matrix.
Step 3. Rotate X00->X01->X11->X10->X00.
Step 4. Add key.
Step 5. Repeat steps 2, 3 and 4 three more times (so that all 4 parts of the matrix are updated).
Since the XOR step operates on the matrix itself (the same operation is applied in all 4 rounds), there is
no need to keep an S-box as in the AES algorithm.
The key addition step is just like the AES algorithm: it uses a 16-byte key, which is XORed with the result
of the preceding steps. (AES also supports a 128-bit key, which means 16 bytes.)
The rotation step shuffles the bytes in the matrix. Unlike AES, which shifts values within a row,
this algorithm rotates the value positions, so the values change row-wise as well as column-wise.
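One possible reading of the steps above is sketched below in Python. It rests on several assumptions that the text does not spell out: the matrix is 4x4 bytes stored row by row, the four parts X00, X01, X10 and X11 are its 2x2 quadrants, the rotation moves whole quadrants around the cycle X00, X01, X11, X10 and back to X00, and "add key" is a byte-wise XOR with the 16-byte key. All function names are hypothetical.

```python
# Indices of each 2x2 quadrant in a 4x4 matrix stored row by row.
QUADS = {
    "X00": (0, 1, 4, 5),
    "X01": (2, 3, 6, 7),
    "X10": (8, 9, 12, 13),
    "X11": (10, 11, 14, 15),
}
CYCLE = ("X00", "X01", "X11", "X10")  # assumed rotation order

def _xor_quadrant(state):
    """Step 2: XOR X00 with X01; only X00 (a quarter of the matrix) changes."""
    for a, b in zip(QUADS["X00"], QUADS["X01"]):
        state[a] ^= state[b]

def _rotate(state, step):
    """Step 3: move every quadrant `step` places along CYCLE (-1 undoes +1)."""
    old = list(state)
    for k, name in enumerate(CYCLE):
        target = CYCLE[(k + step) % 4]
        for src, dst in zip(QUADS[name], QUADS[target]):
            state[dst] = old[src]

def _add_key(state, key):
    """Step 4: XOR the whole matrix with the 16-byte key."""
    for i in range(16):
        state[i] ^= key[i]

def _check(block, key):
    if len(block) != 16 or len(key) != 16:
        raise ValueError("block and key must both be 16 bytes")

def encrypt_block(block: bytes, key: bytes, rounds: int = 4) -> bytes:
    _check(block, key)
    state = bytearray(block)
    for _ in range(rounds):
        _xor_quadrant(state)
        _rotate(state, 1)
        _add_key(state, key)
    return bytes(state)

def decrypt_block(block: bytes, key: bytes, rounds: int = 4) -> bytes:
    _check(block, key)
    state = bytearray(block)
    for _ in range(rounds):          # undo each round in reverse step order
        _add_key(state, key)
        _rotate(state, -1)
        _xor_quadrant(state)
    return bytes(state)
```

With four rounds, each quadrant passes through the X00 position exactly once, which matches the remark in Step 5 that all four parts of the matrix are updated.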
Multimedia is media and content that uses a combination of different content forms. The term can be
used as a noun (a medium with multiple content forms) or as an adjective describing a medium as
having multiple content forms. The term is used in contrast to media which only use traditional forms
of printed or hand-produced material. Multimedia includes a combination of text, audio, still images,
animation, video, and interactivity content forms. Multimedia has become an inevitable part of any
presentation. It has found a variety of applications, from entertainment to education. The
evolution of the Internet has also increased the demand for multimedia content. Multimedia is the media
that uses multiple forms of information content and information processing (e.g. text, audio,
graphics, animation, video, interactivity) to inform or entertain the user. Multimedia also refers to the
use of electronic media to store and experience multimedia content. Multimedia is similar to
traditional mixed media in fine art, but with a broader scope. The term "rich media" is synonymous
with interactive multimedia.
Multimedia may be broadly divided into linear and non-linear categories. Linear active content
progresses without any navigation control for the viewer, such as a cinema presentation. Non-linear
content offers user interactivity to control progress, as used with a computer game or in
self-paced computer-based training. Non-linear content is also known as hypermedia content.
Multimedia presentations may be viewed in person on stage, projected, transmitted, or played locally
with a media player. A broadcast may be a live or recorded multimedia presentation. Broadcasts and
recordings can be either analog or digital electronic media technology. Digital online multimedia
may be downloaded or streamed. Streaming multimedia may be live or on-demand.
Multimedia games and simulations may be used in a physical environment with special effects, with
multiple users in an online network, or locally with an offline computer, game system, or simulator.
Multimedia Building Blocks
Any multimedia application consists of any or all of the following components:
1. Text: Text and symbols are very important for communication in any medium. With the recent
explosion of the Internet and World Wide Web, text has become more important than ever.
The Web is built on HTML (Hypertext Markup Language), originally designed to display simple text
documents on computer screens, with occasional graphic images.
2. Audio: Sound is perhaps the most important element of multimedia. It can provide the listening pleasure
of music, the startling accent of special effects or the ambience of a mood-setting background.
3. Images: Images, whether represented in analog or digital form, play a vital role in multimedia. An image is
expressed in the form of a still picture, a painting or a photograph taken through a digital camera.
4. Video: Digital video has supplanted analog video as the method of choice for making video for
multimedia use. Video in multimedia is used to portray real-time moving pictures in a
multimedia project.
4.1 Audio:
Sound is perhaps the most important element of multimedia. It is meaningful "speech" in any
language, from a whisper to a scream. It can provide the listening pleasure of music, the startling
accent of special effects or the ambience of a mood-setting background. Sound is the term
used for the analog form; the digitized form of sound is called audio.
An audio file format is a file format for storing audio data on a computer system. It can be a raw
bit stream, but it is usually a container format or an audio data format with defined storage layer.
The general approach towards storing digital audio is to sample the audio voltage which, on
playback, would correspond to a certain level of signal in an individual channel with a certain
resolution—the number of bits per sample—in regular intervals (forming the sample rate). This
data can then be stored uncompressed, or compressed to reduce the file size.
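The relationship between sample rate, resolution and channel count determines the size of uncompressed audio. A small Python sketch of that arithmetic follows; the function name is hypothetical, and the figures used in the note after the code (44,100 Hz, 16 bits, 2 channels) are illustrative values, not taken from this report.

```python
def uncompressed_audio_bytes(sample_rate_hz: int, bits_per_sample: int,
                             channels: int, seconds: int) -> int:
    """Size of raw sampled audio: one sample per channel per sampling interval."""
    total_bits = sample_rate_hz * bits_per_sample * channels * seconds
    return total_bits // 8
```

For example, one minute of two-channel audio sampled at 44,100 Hz with 16 bits per sample occupies 44,100 x 16 x 2 x 60 / 8 = 10,584,000 bytes before any compression is applied.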
It is important to distinguish between a file format and a codec. A codec performs the encoding
and decoding of the raw audio data while the data itself is stored in a file with a specific audio
file format. Most of the publicly documented audio file formats can be created with one of two or
more encoders or codecs. Although most audio file formats support only one type of audio data
(created with an audio coder), a multimedia container format (such as MKV or AVI) may support
multiple types of audio and video data.
4.2 Video
Video can be basically understood as a process of displaying still images at a rapid rate giving a
notion of a moving image, which is coupled with perfectly synchronized audio stream. Each such
still image is referred to as a frame.
Modern video file formats interleave audio and video to allow playback of even a partially loaded
video stream over the network. They also employ video compression methodologies to
conserve space. This compression is made possible by using relative references: if
two consecutive frames have almost the same content except for partial changes, it is preferable
to record only the changes and use the reference frame to generate the current frame.
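The idea of recording only the changes relative to a reference frame can be shown with a deliberately simplified Python sketch. It treats a frame as a flat list of pixel values and stores the positions that differ; real codecs use motion compensation and transform coding instead, so this is an illustration of the principle only, with hypothetical function names.

```python
def encode_delta(reference, current):
    """Record only the (position, new value) pairs that differ from the reference."""
    if len(reference) != len(current):
        raise ValueError("frames must have the same size")
    return [(i, c) for i, (r, c) in enumerate(zip(reference, current)) if r != c]

def apply_delta(reference, delta):
    """Rebuild the current frame from the reference frame and the recorded changes."""
    frame = list(reference)
    for position, value in delta:
        frame[position] = value
    return frame
```

When two consecutive frames are nearly identical, the list of changes is much shorter than the frame itself, which is the saving that predicted frames exploit.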
Thus we normally find the frames distinguished as follows.
I-frame (intra-coded frame) is an intra-coded picture, in effect a fully specified picture, like a
conventional static image file. I-frames are pictures coded without reference to any pictures
except themselves. They may be generated by an encoder to create a random access point (to
allow a decoder to start decoding properly from scratch at that picture location). They may also
be generated when differentiating image details prohibit generation of effective P or B frames.
I-frames typically require more bits to encode than other picture types. Often, I-frames are used for
random access and are used as references for the decoding of other pictures. Intra refresh periods
of a half-second are common in such applications as digital television broadcast and DVD
storage. Longer refresh periods may be used in some environments.
P-frame (predicted frame) holds the changes in the image from the previous frame (for example,
when a car moves across a stationary background, only the car's movement needs to be recorded).
P-frames require the prior decoding of some other picture(s) in order to be decoded. They may
contain both image data and motion vector displacements and combinations of the two. They can
reference previous pictures in decoding order.
In older standard designs (such as MPEG-2), P-frames use only one previously decoded picture as a
reference during decoding, and require that picture to also precede the P picture in display order. In
H.264, P-frames can use multiple previously decoded pictures as references during decoding, and can have
any arbitrary display-order relationship relative to the picture(s) used for their prediction. Typically,
P-frames require fewer bits for encoding than I-frames do.
B-frame (bi-directional predicted frame) helps specify the content by using differences between
the current frame and both the preceding and following frames.
B-frames require the prior decoding of some other picture(s) in order to be decoded. They may
contain both image data and motion vector displacements and combinations of the two. They
include some prediction modes that form a prediction of a motion region by averaging the
predictions obtained using two different previously decoded reference regions.
In older standard designs (such as MPEG-2), B pictures are never used as references for the
prediction of other pictures. As a result, a lower quality encoding (resulting in the use of fewer
bits than would otherwise be the case) can be used for such B pictures because the loss of detail
will not harm the prediction quality for subsequent pictures. In H.264, they may or may not be
used as references for the decoding of other pictures.
In older standard designs (such as MPEG-2), B-frames use exactly two previously decoded pictures
as references during decoding, and require one of those pictures to precede the B picture in
display order and the other one to follow it. In H.264, B-frames can use one, two, or more than two
previously decoded pictures as references during decoding, and can have any arbitrary
display-order relationship relative to the picture(s) used for their prediction. Typically, B-frames require
fewer bits for encoding than either I or P frames do. The following figure shows the relationship
among the various frame types mentioned above.
4.3 Images
An image (from Latin imago) is an artifact, for example a two-dimensional picture that has a
similar appearance to some subject, usually a physical object or a person. Image file formats are
standardized means of organizing and storing digital images. Image files are composed of either
pixel or vector (geometric) data that are rasterized to pixels when displayed (with few
exceptions) in a vector graphic display. The pixels that constitute an image are ordered as a grid
(columns and rows); each pixel consists of numbers representing magnitudes of brightness and color.
5.1 Need of secure communication
The requirements of information security within an organization have undergone two major changes
in the last several decades. Before the widespread use of data processing equipment, the security of
information felt to be valuable to an organization was provided primarily by physical and
administrative means.
With the advent of the computer, the need for automated tools for protecting files and other
information stored on the computer became evident. This is especially the case for a shared system.
5.2 Secure communication
When two entities are communicating with each other, and they do not want a third party to listen to
their communication, then they want to pass on their message in such a way that nobody else can
understand their message. This is known as communicating in a secure manner or secure
communication. Secure communication includes means by which people can share information with
varying degrees of certainty that third parties cannot know what was said. Other than communication
spoken face to face with no possibility of being overheard, it is probably safe to say that no communication is
guaranteed secure in this sense, although practical limitations such as legislation, resources, technical
issues (interception and encryption), and the sheer volume of communication are limiting factors to
5.3 How security is provided?
There are two major encryption methodologies used to provide secure communication:
Method I: Symmetric Encryption
Encryption methodologies that require the same secret key to encipher and decipher the message
use what is called private key encryption or symmetric key encryption. Symmetric encryption
methods use mathematical operations that can be programmed into extremely fast computing
algorithms so that the encryption and decryption processes are executed quickly by even small
computers. The challenge that this type of encryption methodology faces is that if either copy of the
key falls into the wrong hands, messages can be decrypted by others and the sender and intended
receiver may not know the message was intercepted. The primary challenge of symmetric key
encryption is getting the key to the receiver, a process that must be conducted out of band to avoid
interception.
There are a number of popular symmetric encryption cryptosystems. One of the most widely known is
the Data Encryption Standard (DES). DES uses a 64-bit block size and a 56-bit key. But over the years
DES has been proven to be easily compromised using a dedicated attack supported by proper
Method II: Asymmetric Encryption
Another category of encryption technique is asymmetric encryption. Whereas a symmetric
encryption system uses a single key to both encrypt and decrypt a message, asymmetric encryption
uses two different but related keys, and either key can be used to encrypt or decrypt the message. If,
however, key A is used to encrypt the message, only key B can decrypt it, and if key B is used to
encrypt the message, only key A can decrypt it. Asymmetric encryption can be used to provide
elegant solutions to problems of security and verification. This technique has its highest value when
one key is used as a private key, which means that it is kept secret (much like the key in symmetric
encryption), known only to the owner of the key pair, and the other key serves as a public key, which
means that it is stored in a public location where anyone can use it. This is why the more common
name for asymmetric encryption is public-key encryption.
One of the most popular asymmetric encryption techniques is the RSA algorithm. There are many
such algorithms that are widely in use in the present day.
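The key A / key B relationship described above can be demonstrated with textbook RSA on deliberately tiny numbers. The sketch below is for illustration only: the primes 61 and 53 and the exponent 17 are the usual classroom values rather than anything from this report, there is no padding, and keys of this size offer no security. The function names are hypothetical, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def make_toy_keypair(p: int = 61, q: int = 53, e: int = 17):
    """Build a tiny RSA key pair: key A = (e, n), key B = (d, n)."""
    n = p * q                      # modulus shared by both keys
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def apply_key(message: int, key) -> int:
    """Encrypt or decrypt: the same modular exponentiation serves both roles."""
    exponent, n = key
    if not 0 <= message < n:
        raise ValueError("message must be smaller than the modulus")
    return pow(message, exponent, n)
```

Applying key A and then key B returns the original number, and so does applying key B first and key A second, which is the property that makes one key usable as a public key and the other as a private key.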
MP4 follows the ISO base format. The ISO Base Media File Format is designed to contain timed
media information for a presentation in a flexible, extensible format that facilitates interchange,
management, editing, and presentation of the media. This presentation may be 'local' to the
system containing the presentation, or may be via a network or other stream delivery mechanism.
The file structure is object-oriented; a file can be decomposed into constituent objects very
simply, and the structure of the objects inferred directly from their type. The file format is
independent of any particular network protocol while enabling efficient support for them in
general. The ISO Base Media File Format, specified in ISO/IEC 14496-12, is a base format for media
file formats. One such format is MP4 (specified in ISO/IEC 14496-14), which complies with
the base format.
The responsibility of maintaining the ISO Base Media File Format rests on WG1 and WG11.
6.1 Terms and definitions
The following terms and definitions are useful in understanding the file format
Box: Object-oriented building block defined by a unique type identifier and length. It is also
popularly known as 'atom' in some specifications, including the first definition of MP4.
Chunk: Contiguous set of samples for one track
Container box: Box whose sole purpose is to contain and group a set of related boxes.
Hint track: Special track which does not contain media data, but instead contains instructions
for packaging one or more tracks into a streaming channel
Hinter: tool that is run on a file containing only media, to add one or more hint tracks to the file
and so facilitate streaming
Movie box: Container box whose sub-boxes define the metadata for a presentation ('moov')
Media data box: Box which can hold the actual media data for a presentation ('mdat')
Sample: All the data associated with a single timestamp. No two samples within a track can
share the same time-stamp. In non-hint tracks, a sample is, for example, an individual frame of
video, a series of video frames in decoding order, or a compressed section of audio in decoding
order; in hint tracks, a sample defines the formation of one or more streaming packets.
Sample description: Structure which defines and describes the format of some number of
samples in a track
6.2 Object-structured File Organization
Files are formed as a series of objects, called boxes. All data is contained in boxes; there is no other data
within the file. This includes any initial signature required by the specific file format. All files conformant
to the ISO base format are required to contain the File Type Box.
Object Structure
An object in this terminology is a box. Boxes start with a header which gives both size and type. The
header permits compact or extended size (32 or 64 bits) and compact or extended types (32 bits or full
Universal Unique Identifiers, i.e. UUIDs). The standard boxes all use compact types (32-bit) and most
boxes will use the compact (32-bit) size. Typically only the Media Data Box(es) need the 64-bit size. The
size is the entire size of the box, including the size and type header, fields, and all contained boxes. This
facilitates general parsing of the file. The definitions of boxes are given in the syntax description language
(SDL). The fields in the objects are stored with the most significant byte first, commonly known as
network byte order or big-endian format. When fields smaller than a byte are defined, or fields span a
byte boundary, the bits are assigned from the most significant bits in each byte to the least significant. For
example, a field of two bits followed by a field of six bits has the two bits in the high order bits of the
byte.
aligned(8) class Box (unsigned int(32) boxtype,
        optional unsigned int(8)[16] extended_type) {
    unsigned int(32) size;
    unsigned int(32) type = boxtype;
    if (size==1) {
        unsigned int(64) largesize;
    } else if (size==0) {
        // box extends to end of file
    }
    if (boxtype=='uuid') {
        unsigned int(8)[16] usertype = extended_type;
    }
}
The semantics of these two fields are:
Size: is an integer that specifies the number of bytes in this box, including all its fields and
contained boxes; if size is 1 then the actual size is in the field largesize; if size is 0, then this box
is the last one in the file, and its contents extend to the end of the file (normally only used for a
Media Data Box).
Type: identifies the box type; standard boxes use a compact type, which is normally four
printable characters (commonly known as the 4cc or 4 character code), to permit ease of
identification, and is shown so in the boxes below. User extensions use an extended type; in this
case, the type field is set to 'uuid'.
Many objects also contain a version number and flags field:
aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f)
        extends Box(boxtype) {
    unsigned int(8) version = v;
    bit(24) flags = f;
}
The semantics of these two fields are:
Version: is an integer that specifies the version of this format of the box.
Flags: is a map of flags.
Boxes with an unrecognized type should be ignored and are not to be considered in the normal
processing of the video.
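The box header rules described in this section (32-bit size, four-character type, the special size values 1 and 0, and the extended 'uuid' type) can be turned into a small parser. The Python sketch below uses only the standard `struct` module; the function names and the dictionary layout are assumptions made for the example, and it walks only the top-level boxes of a byte string already held in memory.

```python
import struct

def parse_box_header(data: bytes, offset: int = 0) -> dict:
    """Read one box header starting at `offset` (all fields are big-endian)."""
    size, box_type = struct.unpack_from(">I4s", data, offset)
    header_len = 8
    if size == 1:
        # Extended size: the real size is in the 64-bit largesize field.
        (size,) = struct.unpack_from(">Q", data, offset + header_len)
        header_len += 8
    elif size == 0:
        # Last box in the file: it extends to the end of the data.
        size = len(data) - offset
    usertype = None
    if box_type == b"uuid":
        # Extended type: a 16-byte user type follows the header fields.
        usertype = data[offset + header_len:offset + header_len + 16]
        header_len += 16
    return {
        "type": box_type.decode("latin-1"),
        "size": size,
        "header_len": header_len,
        "usertype": usertype,
    }

def iter_top_level_boxes(data: bytes):
    """Yield the header of every top-level box, skipping over box contents."""
    offset = 0
    while offset + 8 <= len(data):
        box = parse_box_header(data, offset)
        if box["size"] < box["header_len"]:
            raise ValueError("corrupt box size at offset %d" % offset)
        yield box
        offset += box["size"]
```

Because the size field covers the header and all contained boxes, a reader can skip any box it does not recognize simply by advancing `offset` by `size`, which is how unrecognized box types are ignored.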
Box Order
In order to improve interoperability and utility of the files, the ISO base format specification requires
that the following rules and guidelines for the order of boxes be followed:
1. The file type box 'ftyp' shall occur before any variable-length box (e.g. movie, free space, media data).
Only a fixed-size box such as a file signature, if required, may precede it.
2. It is strongly recommended that all header boxes be placed first in their container: these boxes are the
Movie Header, Track Header, Media Header, and the specific media headers inside the Media
Information Box (e.g. the Video Media Header).
3. Any Movie Fragment Boxes shall be in sequence order
4. It is recommended that the boxes within the Sample Table Box be in the following order: Sample
Description, Time to Sample, Sample to Chunk, Sample Size, Chunk Offset.
5. It is strongly recommended that the Track Reference Box and Edit List (if any) should precede the
Media Box, and the Handler Reference Box should precede the Media Information Box, and the Data
Information Box should precede the Sample Table Box.
6. It is recommended that User Data Boxes be placed last in their container, which is either the Movie
Box or Track Box.
7. It is recommended that the Movie Fragment Random Access Box, if present, be last in the file.
8. It is recommended that the progressive download information box be placed as early as possible in
files, for maximum utility.
The table shows those boxes that may occur at the top-level in the left-most column; indentation is used
to show possible containment. Thus, for example, a Track Header Box (tkhd) is found in a Track Box
(trak), which is found in a Movie Box (moov). Not all boxes are required to be present in all the files; the
mandatory boxes are marked with an asterisk (*), these mandatory boxes provide the minimal information
necessary for the normal processing and rendering of the file.
User data objects shall be placed only in Movie or Track Boxes, and objects using an extended type may
be placed in a wide variety of containers, not just the top level.