INTELLIGENT ENCODER AND DECODER MODEL
FOR ROBUST AND SECURE AUDIO
WATERMARKING
A Thesis submitted to
SHIVAJI UNIVERSITY, KOLHAPUR
For the Degree of
DOCTOR OF PHILOSOPHY
In
ELECTRONICS ENGINEERING
(Digital Signal Processing)
Under the Faculty of Engineering
By
MRS. MEENAKSHI RAVINDRA PATIL
Under the Guidance of
DR. MRS. S. D. APTE
Department of Electronics Engineering.
Walchand College of Engineering, Sangli
YEAR 2009
DECLARATION BY GUIDE
This is to certify that the thesis entitled INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING, which is being submitted herewith for the award of the DEGREE OF DOCTOR OF PHILOSOPHY in Electronics Engineering under the Faculty of Engineering of Shivaji University, Kolhapur, is the result of the original research work completed by MRS. MEENAKSHI RAVINDRA PATIL under my supervision and guidance, and to the best of my knowledge and belief the work embodied in this thesis has not formed earlier the basis for the award of any degree or similar title of this or any other university or examining body.
Place: Kolhapur
Date:
Research Guide
(Dr. Mrs. S. D. Apte)
DECLARATION BY STUDENT
I hereby declare that the thesis entitled INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING, completed and written by me, has not previously formed the basis for the award of any Degree or Diploma or other similar title of this or any other university or examining body.
Place: Kolhapur
Date:
Research Student
(Mrs. Meenakshi Ravindra Patil)
CERTIFICATE
This is to certify that the thesis entitled INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING, which is being submitted herewith for the award of the Degree of Doctor of Philosophy in Electronics Engineering under the Faculty of Engineering of Shivaji University, Kolhapur, is the result of the original research work completed by MRS. MEENAKSHI RAVINDRA PATIL, and to the best of our knowledge and belief the work embodied in this thesis has not formed earlier the basis for the award of any degree or similar title of this or any other university or examining body.
Place: Kolhapur
Date:
Research Guide
(Dr. Mrs. S. D. Apte)
Examiners:
1)
2)
ACKNOWLEDGMENT
I consider myself fortunate to have got an opportunity to work under the
valuable guidance of Dr. Mrs. S. D. Apte. I wish to take this opportunity to convey my
deep gratitude to her for her valuable advice, constant encouragement, keen interest,
painstaking corrections, constructive criticism, and scholarly guidance right from the
suggestion of the topic up to the completion of manuscript.
It is hard to find words to express my gratitude to my husband Mr. Ravindra P. Patil for everything he has done for me. I thank him for his continuous encouragement and valuable discussions during the entire research, right from the beginning till the completion of this thesis. Without his positive, kind and silent support this work would neither have been possible nor could it have been completed.
I have no words to express my gratitude to His Holiness Swastishree Charukirti Bhattarak Mahsamiji, Chairman, SSDJJP Sangha, Shravanbelgola, Dr. J. J. Magdum, honorable Chairman, Dr. J. J. Magdum Trust, Jaysingpur, and Shri. Vijayraj Magdum, honorable Executive Director, Dr. J. J. Magdum College of Engineering, Jaysingpur, for their kind support.
I am grateful to Prof. R. P. Pungin, Former Principal, Bahubali College of Engineering, Shravanbelgola, for his guidance and encouragement, especially during the early stages of this research. I am also grateful to Prof. Mrs. R. S. Patil, Former H.O.D., Electronics Department, Walchand College of Engineering, Sangli, for her guidance and encouragement. It was only due to her support that I could complete my post-graduation and later register for the PhD.
I extend my sincere thanks to Prof. G. M. Ravannavar, Vice Principal, Bahubali College of Engineering, Shravanbelgola, Dr. D. S. Bormane, Principal, RSCOE Pune, Dr. P. R. Mutagi, Principal, Dr. J. J. Magdum College of Engineering, Jaysingpur, and Prof. A. K. Gupta, Vice Principal, Dr. J. J. Magdum College of Engineering, Jaysingpur, for providing the facilities to carry out the research at their institutes while I was working there during the research.
My special thanks to Dr. Mrs. A. A. Agashe and Dr. A. S. Yadav for their valuable suggestions and help during the thesis writing.
I am thankful to my colleagues and staff members from Bahubali College of Engineering, Shravanbelgola, for their support and good wishes during the early stages of my research. I am also thankful to my colleagues and staff members from the Electronics, Electronics and Communication, and Electrical departments of Dr. J. J. Magdum College of Engineering, Jaysingpur, for their good wishes.
Last but not the least, I acknowledge with affection my indebtedness to my kids Yukta and Abhishek, my parents, parents-in-law, brother, sister, sisters-in-law, brothers-in-law and all my family members who have provided me prolonged encouragement and all possible comfort besides showing active interest in the work.
Mrs. Meenakshi Ravindra Patil
List of Contributions
This thesis is based on eleven original papers (Appendices I–XI), which are referred to in the text by Roman numerals. All analysis and simulation results presented in the publications or this thesis have been produced by the author. Professor Mrs. S. D. Apte gave guidance and the needed expertise in signal processing methods. She had an important role in the development of the initial ideas and the shaping of the final outline of the publications.
Conference Publications
I) "DWT based Image Watermarking", international conference ICWCN 2003, organized by S.S.N.C.O.E., Kalavakkam, Chennai, 27-28 June 2003.
II) "DWT based Audio Watermarking", international conference ICWCN 2003, organized by S.S.N.C.O.E., Kalavakkam, Chennai, 27-28 June 2003.
III) "Audio Watermarking for covert communication", 12th annual Symposium on mobile computing and applications, organized by IEEE Bangalore section, November 2004.
IV) "Insight on Audio Watermarking", national conference NC2006, organized by Padre Conceicao College of Engineering, Verna, Goa, 14-15 September 2006.
V) "Performance analysis of Audio Watermarking in Wavelet Domain", international conference RACE, organized by Engineering College Bikaner, 24-25 March 2007.
VI) "SNR based audio watermarking in Wavelet domain", international conference ICCR-08, April 2008, Mysore.
International Journals/Digital library publications
VII) "SNR based audio watermarking Scheme for blind detection", international conference ICETET-08, July 2008, Nagpur, published by IEEE Computer Digital Library.
VIII) "SNR based Spread Spectrum Watermarking using DWT and Lifting Wavelet Transform", International Journal IJCRYPT, October 2008.
IX) "Adaptive Spread Spectrum Audio Watermarking for Indian Musical Signals by Low Frequency Modification", in proc. IEEE international conference ASID 2009, 22-25 August 2009, Hong Kong.
X) "Adaptive Spread spectrum audio watermarking", ICFAI Journal on Information Technology, September 2009.
XI) "Adaptive Audio Watermarking for Indian Musical Signals by GOS Modification", selected in IEEE international conference TENCON 2009, 23-26 November 2009, Singapore.
Publications VIII and X are international journal publications. Publications VII, IX and XI are published in the IEEE digital library and are available online at www.ieeexplore.org.
ABSTRACT
Recent advancements in digital technology for broadband communication and multimedia data distribution in digital format have opened many challenges and opportunities for researchers. Simple-to-use software and decreasing prices of digital devices have made it possible for consumers from all around the world to create and exchange multimedia data. Broadband Internet connections and error-free transmission of data allow people to distribute large multimedia files and make identical digital copies of them. Perfect reproduction in the digital domain has made the protection of intellectual ownership and the prevention of unauthorized tampering of multimedia data an important technological and research issue.
Digital watermarking has been proposed as a new, alternative method to enforce intellectual property rights and protect digital media from tampering. Digital watermarking is defined as a technique that embeds data directly into a host signal and extracts it from that signal. The main challenge in digital audio watermarking is that, if the perceptual transparency parameter is fixed, the design of a watermark system cannot obtain high robustness and a high watermark data rate at the same time.
In this thesis, we address the research problem of audio watermarking. First, what is the imperceptibility of the watermarked data and how can it be measured? Second, how can the detection performance of a watermarking system be improved for blind detection? Third, is the system robust to different signal processing attacks, and is it possible to increase the robustness through attack characterization? An approach that combines theoretical consideration and experimental validation, including digital signal processing, is used in developing algorithms for audio watermarking.
The results of this study are the development and implementation of audio watermarking algorithms. The algorithms' performance is validated in the presence of the standard watermarking attacks. The thesis also includes a thorough review of the literature on digital audio watermarking.
List of Abbreviations and Symbols

1-D      One dimension
A/D      Analog to digital
AOAA     Average of absolute amplitudes
AWGN     Additive white Gaussian noise
BEP      Bit error probability
BER      Bit error rate
CD       Compact Disk
CMF      Conjugate Mirror Filters
CPU      Central processing unit
CWT      Continuous wavelet transform
D/A      Digital to analog
dB       Decibels
db2      Daubechies wavelet
DCT      Discrete Cosine transform
DSSS     Direct sequence spread spectrum
DWT      Discrete wavelet transform
FFT      Fast Fourier transform
GOS      Group of samples
HAS      Human auditory system
HRT      Hough-Radon transform
HVS      Human visual system
IFPI     International Federation of the Phonographic Industry
IID      Independent, identically distributed
IMF      Instantaneous mean frequency
ISS      Improved Spread Spectrum
JND      Just noticeable difference
LP       Low pass
LSB      Least significant bit
LWT      Lifting wavelet transform
MASK     Modified Signal Keying
Mp3      MPEG-1 Compression, Layer III
MPEG     Moving Picture Experts Group
MSE      Mean-squared error
NMR      Noise to mask ratio
PDF      Probability density function
PN       Pseudo Noise
PRN      Pseudo random Number
SDMI     Secure Digital Music Initiative
SMR      Signal to mask ratio
SNR      Signal to Noise ratio
SPL      Sound pressure level
SS       Spread Spectrum
ST-DM    Spread-transform dither modulation
SVM      Support vector machines
SVR      Support vector regression
TF       Time frequency
THD1     Threshold 1
TSM      Time scale modification
WER      Word error rate
WMSE     Weighted mean-squared error
List of symbols

m           Input message
w″          Recovered watermark
D_k^L(i)    Lth level detail coefficient
D′_k^L(i)   Lth level detail coefficient modified by the pseudorandom signal r(i)
A_k^L(i)    DCT coefficient of the ca3 coefficient of the Lth level DWT of x_k(i)
x_k(i)      kth segment of the host audio
A′_k^L(i)   Lth level DCT coefficient modified by the pseudorandom signal r(i)
E_in        Absolute average of the nth section
E_max       Maximum AOAA
E_min       Minimum AOAA
E_mid       Middle value of AOAA
w_r         Recovered watermark
P_b         Bit error probability
P_r         Error probability that b_n ≠ b̂_n
ŵ           Recovered watermark
ω           Highest frequency
α           Scaling parameter
δ           Discrimination parameter used to modify the mean
σ_n         Variance of the noise introduced
σ_r         Variance of r
σ_u         Variance of signal u
σ_x         Variance of the host audio
a(n)        Coarse approximation
A, B        Difference means
A′, B′      Means of the received signal
b           Bits to be transmitted by the watermarking process
ca3         Third level coarse approximation coefficients of x_k(i)
C_o         Host audio signal
C_w         Watermarked signal
C_wn        Received watermarked signal after noise addition
d[n]        Detail information
d3          Third level detailed wavelet coefficients
f(x)        Signal in which the watermark is to be embedded
G0          Low-pass filter
G1, H1      Synthesis filter / Hypothesis 1
H0          High-pass filter / Hypothesis 0
k           Secret key; segment number
K           Imperceptibility parameter
L           Level of decomposition of the signal using wavelets
L1, L2      Section lengths
M           Total number of watermark bits
M1 × M2     Size of the watermark (number of row pixels × number of column pixels)
m_r         Mean of r
N           Length of the segment; length of the PN sequence
p           Error probability
r           Sufficient statistic required to detect the mark
r(i)        PN sequence used to embed the watermark in x using the SS method
s           Signal after embedding the watermark bit
u           Zero-mean PN sequence to be added to signal x
w           Original watermark
w′          Scaled watermark
W_a         Encoded watermark
w_k         Watermark key
x           Original host audio signal
y           Watermarked signal; received vector
Y(i)        Watermarked signal
List of figures and tables

Figures

1.1  Block diagram of encoder
3.1  A watermarking system & an equivalent communications model
4.1  Magic triangle, three contradictory requirements of watermarking
4.2  Frequency masking in the human auditory system
4.3  Signal-to-mask ratio & Signal to Noise ratio values
4.4  Temporal masking in the human auditory system
4.5  Mallat-tree decomposition
4.6  Reconstruction of original signal from wavelet coefficients
4.7  Watermark embedding
4.8  Watermark extraction
4.9  Original host audio
4.10 Watermarked signal
4.11 Original watermark
4.12 Recovered watermark
5.1  General model of SS-based watermarking
5.2  Error probability as a function of the SNR
5.3  Watermark embedding process
5.4  Watermark extraction
5.5  a) Original host audio b) watermarked audio
5.6  a) Original watermark b) recovered watermark
5.7  Results for SNR based scheme for non blind detection
5.8  Watermark embedding process for proposed adaptive SNR based blind technique in DWT/LWT domain
5.9  Watermark extraction for adaptive SNR based blind technique in DWT/LWT domain
5.10 Original watermark
5.11 Results for SNR based scheme for blind detection in DWT domain
5.12 Results for SNR based scheme for blind detection in LWT domain
5.13 Watermark embedding process for proposed adaptive SNR based blind technique in DWT-DCT domain
5.14 Watermark extraction for adaptive SNR based blind technique in DWT-DCT domain
5.15 Results for SNR based scheme for blind detection in DWT-DCT domain
5.16 PDF of SNR for various keys
5.17 Improved encoder and decoder for blind watermarking using cyclic coding
6.1  Block schematic of GOS based watermark embedding in DWT domain
6.2  Block schematic of GOS based watermark extraction in DWT domain
6.3  a) SNR vs K without attack b) BER vs K
6.4  Results of robustness test, recovered watermark, DWT domain
6.5  Block schematic of GOS based watermark embedding in DCT domain
6.6  Block schematic of GOS based watermark extraction in DCT domain
6.7  Results of robustness test, recovered watermark, DCT domain
6.8  Block schematic of GOS based watermark embedding in DWT-DCT domain
6.9  Block schematic of GOS based watermark embedding in DWT-DCT domain
6.10 Results of robustness test, recovered watermark signal
6.11 Improved encoder decoder for GOS based blind watermarking using cyclic codes
7.1  A watermarking system & an equivalent communication model
7.2  Intelligent encoder model for proposed adaptive SNR based blind technique in DWT-DCT domain
7.3  Decoder model for adaptive SNR based blind technique in DWT-DCT domain
7.4  Block schematic of GOS based intelligent encoder of watermark in DWT-DCT domain
7.5  Block schematic of GOS intelligent decoder of watermark in DWT-DCT domain

Tables

1.1  Four categories of information hiding techniques
4.1  Subjective listening test for MP3 song
4.2  SNR of watermarked signal & BER of extracted watermark signal
4.3  Robustness test results for MP3 song
5.1  Experimental results against signal processing attacks for non blind technique
5.2  Results after signal processing attacks: a) SNR between original signal and watermarked signal after attack for three schemes; b) correlation coefficient between original watermark and recovered watermark for three schemes
5.3  Relation between segment length, parameter α(k), observed SNR and correlation coefficient
5.4  SNR between original audio signal and watermarked signal, BER of recovered watermark
5.5  Comparison chart of spread audio watermarking techniques implemented in this chapter
5.6  Results obtained for (6,4) cyclic codes
5.7  Results obtained for (7,4) cyclic codes
6.1  SNR between original audio signal and BER of recovered watermark for various musical signals in DWT domain
6.2  SNR between original audio signal and BER of recovered watermark for various musical signals in DCT domain
6.3  Relation between segment length variation and number of coefficients modified with SNR and BER
6.4  SNR between original signal and watermarked signal, BER of recovered watermark, results in DWT-DCT domain
6.5  Comparison between the proposed GOS based scheme and the proposed SS based scheme
6.6  Results obtained for (7,4) cyclic codes
6.7  Comparison chart of various watermarking algorithms
6.8  Comparison chart for various blind audio watermarking techniques
7.1  Robustness results after attack characterization (SS method)
7.2  Robustness test results after attack characterization (GOS method)
Contents

Declaration by Guide
Declaration by Student
Certificate
Acknowledgement
List of contributions
Abstract
List of symbols and abbreviations
List of figures and tables
Contents

1. Introduction
   1.1 Basic functionalities of watermarking schemes
       1.1.1 Perceptibility
       1.1.2 Level of reliability
       1.1.3 Capacity
       1.1.4 Speed
       1.1.5 Statistical undetectability
       1.1.6 Asymmetry
   1.2 Evaluation of schemes
       1.2.1 Perceptibility
       1.2.2 Robustness
       1.2.3 Capacity
       1.2.4 Speed
       1.2.5 Statistical undetectability
   1.3 Organization of thesis
2. Literature Survey
   2.1 Spread spectrum audio watermarking
   2.2 Methods using patchwork algorithm
   2.3 Methods implemented in time domain
   2.4 Methods implemented in transform domain
   2.5 Other recently developed algorithms
   2.6 Audio watermarking techniques against time scale modifications
   2.7 Papers studied on performance analysis and evaluation of watermarking systems
   2.8 Watermark attacks
   2.9 Research problems identified
   2.10 Concluding remarks
3. Research problem
   3.1 Summary of chapter
4. High capacity covert communication for audio
   4.1 Overview of the properties of HAS
   4.2 Discrete wavelet transform
       4.2.1 Conditions for perfect reconstruction
       4.2.2 Classification of wavelets
             4.2.2.1 Features of orthogonal wavelet filter banks
             4.2.2.2 Features of biorthogonal wavelet filter banks
   4.3 Audio watermarking for covert communication
   4.4 Results of high capacity covert communication technique
       4.4.1 Subjective listening test
       4.4.2 Robustness test
   4.5 Summary of chapter
5. Spread spectrum audio watermarking algorithms
   5.1 Conventional spread spectrum method of watermarking
   5.2 Adaptive SNR based non blind watermarking technique in wavelet domain
   5.3 Proposed adaptive SNR based blind watermarking using DWT/lifting wavelet transform
       5.3.1 Watermark embedding
       5.3.2 Watermark extraction
       5.3.3 Experimental results
       5.3.4 Selection criteria for value of SNR in computing α(k) and selection criteria for segment length N
   5.4 Proposed adaptive SNR based spread spectrum scheme in DWT-DCT
       5.4.1 Watermark embedding
       5.4.2 Watermark extraction
       5.4.3 Experimental results
   5.5 Proposed SNR based blind technique using cyclic coding
   5.6 Summary of chapter
6. Adaptive watermarking by GOS modification
   6.1 Introduction to audio watermarking technique based on GOS modification in time domain
       6.1.1 Rules of watermark embedding
       6.1.2 Watermark extraction
   6.2 Proposed adaptive watermarking using GOS modifications in transform domain
       6.2.1 Proposed blind watermarking using GOS modification in DWT domain
       6.2.2 Proposed blind watermarking using GOS modification in DCT domain
       6.2.3 Proposed blind watermarking using GOS modification in DWT-DCT domain
   6.3 Proposed GOS based blind technique using cyclic coding
   6.4 Comparison of proposed method with well known watermarking algorithms
   6.5 Summary of chapter
7. Intelligent encoder and decoder modeling
   7.1 Basic model of watermarking
   7.2 Method-1: Proposed intelligent encoder and decoder model for robust and secure audio watermarking based on spread spectrum
   7.3 Method-2: Proposed intelligent encoder and decoder model for robust and secure audio watermarking based on GOS modification
   7.4 Summary of chapter
8. Discussion and Conclusion
   8.1 Discussion and conclusion
   8.2 Main contribution of the present research
   8.3 Future scope
9. References
Appendix (Details of published papers)
Chapter 1
Introduction
Introduction to Watermarking
In 1954, Emil Hembrooke of the Muzak Corporation filed a patent entitled "Identification of sound and like signals", which described a method for imperceptibly embedding an identification code into music for the purpose of proving ownership. The patent states: "The present invention makes possible the positive identification of the origin of a musical presentation and thereby constitutes an effective means of preventing piracy." Electronic watermarking had been invented.
Presently there is increased usage of the Internet and of multimedia information. The availability of different multimedia editing software and the ease with which multimedia signals can be manipulated have opened many challenges and opportunities for researchers.
The possibility of unlimited copying without loss of fidelity causes considerable financial loss for copyright holders. The ease of content modification and perfect reproduction in the digital domain have made the protection of intellectual ownership and the prevention of unauthorized tampering of multimedia data an important technological and research issue.
A wide use of multimedia data combined with a fast delivery of multimedia to
users having different devices with a fixed quality of service is becoming a
challenging and important topic. Traditional methods for copyright protection of
multimedia data are not sufficient. Hardware-based copy protection systems have
already been easily circumvented for analogue media. Hacking of digital media
systems is even easier due to the availability of general multimedia processing
platforms, e.g. a personal computer. Simple protection mechanisms that were based
on the information embedded into header bits of the digital file are not useful because
header information can easily be removed by a simple change of data format, which
does not affect the fidelity of media.
Encryption of digital multimedia prevents access to the multimedia content by an individual without a proper decryption key. Therefore, content providers get paid for the delivery of perceivable multimedia, and each client that has paid the royalties must be able to decrypt a received file properly. Once the multimedia has been decrypted, it can be repeatedly copied and distributed without any obstacles. Modern software and broadband Internet provide the tools to do this quickly, without much effort or deep technical knowledge. It is clear that existing security protocols for electronic commerce secure only the communication channel between the content provider and the user and are useless if the commodity in the transaction is digitally represented.
From a business perspective, the question is whether watermarking can
provide economic solutions to real problems. Current business interest is focused on a
number of applications that broadly fall into the categories of security and device
control. From a security perspective, there has been criticism that many proposed
watermark security solutions are “weak”, i.e. it is relatively straightforward to
circumvent the security system. While this is true, there are many business
applications where “weak” security is preferable to no security. Therefore it is
expected that businesses will deploy a number of security applications based on
watermarking [1]. In addition, many device control applications have no security
requirement, since there is no motivation to remove the watermark. Device control,
particularly as it pertains to the linking of traditional media to the Web, is receiving increased attention from businesses, and this interest is expected to grow.
From an academic perspective, the question is whether watermarking
introduces new and interesting problems for basic and applied research.
Watermarking is an interdisciplinary study that draws experts from communications,
cryptography and audio and image processing. Interesting new problems have been
posed in each of these disciplines based on the unique requirements of watermarking
applications. Commercial implementations of watermarking must meet difficult and
often conflicting economic and engineering constraints.
Digital watermarking has been proposed as a new, alternative method to
enforce the intellectual property rights and protect digital media from tampering. It
involves a process of embedding the digital signature into a host signal in order to
"mark" the ownership. The digital signature is called the digital watermark. The
digital watermark contains data that can be used in various applications, including
digital rights management, broadcast monitoring and tamper proofing. The existence
of the watermark is indicated when watermarked media is passed through an
appropriate watermark detector.
Fig. 1.1 Block diagram of encoder: the host audio, the watermark and a secret key are fed to the watermark embedder, which produces the watermarked audio; the watermark detector takes the watermarked audio and the secret key and outputs the detected watermark.
Figure 1.1 gives an overview of the general watermarking system. A
watermark, which usually consists of a binary data sequence, is inserted into the host
signal in the watermark embedder. Thus, a watermark embedder has two inputs; one
is the watermark message accompanied by a secret key and the other is the host signal
(e.g. image, video clip, audio sequence etc.). The output of the watermark embedder
is the watermarked signal, which cannot be perceptually discriminated from the host
signal.
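To make the data flow of Figure 1.1 concrete, the following Python sketch shows a toy embedder and a blind, key-based detector. It is only an illustration of the block diagram, not one of the algorithms proposed in this thesis; the additive pseudo-random carrier, the strength parameter alpha and the one-bit-per-segment layout are assumptions chosen for brevity.

```python
import numpy as np

def embed(host, message_bits, secret_key, alpha=0.005):
    """Watermark embedder: inputs are the host audio, the message bits and a secret key."""
    rng = np.random.default_rng(secret_key)        # the key seeds the pseudo-random carrier
    seg = len(host) // len(message_bits)           # one message bit per audio segment
    marked = np.asarray(host, dtype=float).copy()
    for i, bit in enumerate(message_bits):
        carrier = rng.choice([-1.0, 1.0], size=seg)
        marked[i * seg:(i + 1) * seg] += alpha * (1.0 if bit else -1.0) * carrier
    return marked

def detect(signal, n_bits, secret_key):
    """Blind detector: regenerates the keyed carrier and decides each bit by correlation."""
    rng = np.random.default_rng(secret_key)
    seg = len(signal) // n_bits
    bits = []
    for i in range(n_bits):
        carrier = rng.choice([-1.0, 1.0], size=seg)
        bits.append(1 if np.dot(signal[i * seg:(i + 1) * seg], carrier) > 0 else 0)
    return bits
```

A call such as detect(embed(host, [1, 0, 1, 1], 42), 4, 42) returns the embedded bits as long as the host interference within each segment stays below the watermark energy.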
The watermarked signal is then usually recorded or broadcasted and later
presented to the watermark detector. The detector determines whether the watermark
is present in the tested multimedia signal, and if so, will decode the message. The
research area of watermarking is closely related to the fields of information hiding
and steganography. The three fields have a considerable overlap and many common
technical solutions. However, there are some fundamental philosophical differences
that influence the requirements and therefore the design of a particular technical
solution. Information hiding (or data hiding) is a more general area, encompassing a
wider range of problems than the watermarking. The term hiding refers to the process
of making the information imperceptible or keeping the existence of the information
secret. Steganography is a word derived from the ancient Greek words steganos,
which means covered and graphia, which in turn means writing. It is an art of
concealed communication. Therefore, we can define watermarking systems as
systems in which the hidden message is related to the host signal, and non-watermarking systems in which the message is unrelated to the host signal. On the other hand, systems for embedding messages into host signals can be divided into steganographic systems, in which the existence of the message is kept secret, and non-steganographic systems, in which the presence of the embedded message does not
have to be secret. Division of the information hiding systems into four categories is
given in Table 1.1.
Table 1.1 Four categories of information hiding techniques

                   Host Signal Dependent Message     Host Signal Independent Message
Message Hidden     Steganographic Watermarking       Covert Communication
Message Known      Non-steganographic Watermarking   Covert Embedded Communications
The primary focus of this thesis is the watermarking of digital audio (i.e.,
audio watermarking). The watermarking algorithms were primarily developed for
digital images and video sequences [1-11]; interest and research in audio
watermarking started slightly later. In the past few years, several algorithms[12-33]
for the embedding and extraction of watermarks in audio sequences have been
presented. All of the developed algorithms take advantage of the perceptual properties
of the human auditory system (HAS) in order to add a watermark into a host signal in
a perceptually transparent manner. Embedding additional information into audio sequences is a more tedious task than it is for images, owing to the greater sensitivity of the HAS compared with the human visual system. In addition, the amount of data that can be embedded transparently into an audio sequence is considerably lower than the amount of data that can be hidden in video sequences, since an audio signal has fewer dimensions than a two-dimensional video file. On the other hand, many attacks that are malicious
against image watermarking algorithms (e.g. geometrical distortions, spatial scaling,
etc.) cannot be implemented against audio watermarking schemes.
1.1. Basic Functionalities of Watermarking Scheme
The objectives of the scheme and its operational environment dictate several
immediate constraints (a set of minimal requirements) on the algorithm. In the case of
automated radio monitoring, for example, the watermark should clearly withstand
distortions introduced by the radio channel. Similarly, in the case of MPEG video
broadcast, the watermark detector must be fast (to allow real-time detection) and
simple in terms of the number of gates required for hardware implementation. One or more
of the following general functionalities can be used.
1.1.1 Perceptibility
The distortion introduced by watermark embedding should not degrade the perceived quality of the medium. In the case of digital watermarking in images there should not be any
visual difference between the original image and the watermarked image. For audio
watermarking schemes the embedded watermark should be inaudible.
1.1.2 Level of Reliability
There are two main aspects to reliability: robustness and false positives. False negatives occur when the content was previously marked but the mark could not be detected.
The threats centered on signal modification are robustness issues. Robustness can
range from no modification at all to destruction of the signal. (Complete destruction
may be too stringent a requirement. Actually, it is not clear what it means. Instead one
could agree on a particular quality measure and a maximum quality loss value.) This
requirement separates watermarking from other forms of data hiding (typically
steganography). Without robustness, the information could simply be stored as a
separate attribute. Robustness remains a very general functionality as it may have
different meanings depending on the purpose of the scheme. If the purpose is image
integrity (tamper evidence), the watermark extractor should have a different output
after small changes have been made to the image while the same changes should not
affect a copyright mark. In fact, one may distinguish at least the following main
categories of robustness:
1. The threats centered on modifying the signal to disable the watermark (typically a
copyright mark), willfully or not, remain the focus of many research articles that
propose new attacks. "Disabling a watermark" means making it useless or removing it.
2. The threats centered on tampering of the signal by unauthorized parties to change
the semantics of the signal are an integrity issue. Modification can range from the modification of court evidence to the modification of photos used in newspapers or
clinical images.
3. The threats centered on anonymously distributing illegal copies of marked work are
a traitor-tracing issue and are mainly addressed by cryptographic solutions.
4. Watermark cascading—that is, the ability to embed a watermark into an
audiovisual signal that has been already marked—requires a special kind of
robustness. The order in which the marks are embedded is important because different
types of marks may be embedded in the same signal. For example, one may embed a
public and a private watermark (to simulate asymmetric watermarking) or a strong
public watermark together with a tamper evidence watermark. As a consequence, the
evaluation procedure must take into account the second watermarking scheme when
testing the first one.
Finally, false positives occur whenever the detected watermark differs from the mark that was actually embedded. The detector could find a mark A in a signal where no mark was previously hidden, in a signal where a mark B was actually hidden with the same scheme, or in a signal where a mark B was hidden with another scheme.
1.1.3 Capacity
Knowing how much information can reliably be hidden in the signal is very
important to users, especially when the scheme gives them the ability to change this
amount. Knowing the watermarking access unit (or granularity) is also very
important; indeed, spreading the mark over a full sound track prevents audio
streaming, for instance. (A “watermark access unit” is the smallest part of a cover
signal in which a watermark can be reliably detected and the payload extracted.)
1.1.4 Speed
Some applications require real-time embedding and/or real-time detection. Ultimately, for such applications, the implemented scheme should be able to embed and detect the embedded watermark at high speed.
1.1.5 Statistical Undetectability
For some private watermarking systems—that is, a scheme requiring the
original signal—one may wish to have a perfectly hidden watermark. In this case it
should not be possible for an attacker to find any significant statistical differences
between an unmarked signal and a marked signal. As a consequence an attacker could
never know whether an attack succeeded or not. Note that this option is mandatory for
steganographic systems.
1.1.6 Asymmetry
Private-key watermarking algorithms require the same secret key both for
embedding and extraction. They may not be good enough if the secret key has to be
embedded in every watermark detector (which may be found in any consumer electronic device or multimedia player software); then malicious attackers may extract it and post it on the Internet, allowing anyone to remove the mark. In these cases the party that embeds
a mark may wish to allow another party to check its presence without revealing its
embedding key. This can be achieved using asymmetric techniques. Unfortunately,
robust asymmetric systems are currently unknown, and the current solution (which
does not fully solve the problem) is to embed two marks: a private one and a public
one.
Other functionality classes may be defined but the ones listed above seem to
include most requirements used in the recent literature. The first three functionalities
are strongly linked together, and the choice of any two of them imposes the third one.
In fact, when considering the three-parameter watermarking model (perceptibility,
capacity, and reliability), the most important parameter to keep is the imperceptibility.
(“Capacity” is the bit size of a payload that a watermark access unit can carry.) Then
two approaches can be considered: emphasize capacity over robustness or favor
robustness at the expense of low capacity. This clearly depends on the purpose of the
marking scheme, and this should be reflected in the way the system is evaluated.
1.2. Evaluation of Watermarking Schemes
A full scheme is defined as a collection of functionality services to which a
level of assurance is globally applied and for each of which a specific level of strength
is selected. So a proper evaluation has to ensure that all the selected requirements are
met to a certain level of assurance.
The number of levels of assurance cannot be justified precisely. On the one
hand, it should be clear that a large number of them make the evaluation very
complicated and unusable for particular purposes. On the other hand, too few levels
prevent scheme providers from finding an evaluation close enough to their needs. We
are also limited by the accuracy of the methods available for rating. Information
technology security evaluation has been using six or seven levels, for the reasons just mentioned above but also for historical reasons. This seems to be a reasonable
number for robustness evaluation. For perceptibility we preferred to use fewer levels
and, hence, follow more or less the market segmentation for electronic equipment.
Moreover, given the roughness of existing quality metrics it is hard to see how one
could reasonably increase the number of assurance levels.
The following sections discuss possible methods to evaluate the functionalities listed above.
1.2.1 Perceptibility
Perceptibility can be assessed to different levels of assurance. The problem
here is very similar to the evaluation of compression algorithms. The watermark could
be just slightly perceptible but not perceptible under domestic/consumer viewing/listening conditions. Another level is non-perceptibility in comparison with
the original under studio conditions. Finally, the best assurance is obtained when the
watermarked media are assessed by a panel of individuals who are asked to look or
listen carefully at the media under the above conditions.
As it is stated, however, this cannot be automated, and one may wish to use
less stringent levels. In fact, various levels of assurance can also be achieved by using
various quality measures based on human perceptual models. Since there are various
models and metrics available, an average of them could be used. Current metrics do
not really take into account geometric distortions, which remain a challenging attack
against many watermarking schemes.
1.2.2 Robustness
The robustness can be assessed by measuring the detection probability of the
mark and the bit error rate for a set of criteria that are relevant for the application that
is considered. The levels of robustness range from no robustness to provable
robustness. For level zero, no special robustness features have been added to the
scheme apart from the one needed to fulfill the basic constraints imposed by the
purpose and operational environment of the scheme. So if we go back to the radio-monitoring example, the minimal robustness feature should make sure that the mark
survives the distortions of the radio link in normal conditions. The low level
corresponds to some extra robustness features added but which can be circumvented
using simple and cheap tools publicly available. These features are provided to
prevent “honest” people from disabling the mark during normal use of the work. In
the case of watermarks used to identify owners of photographs, the end users should
be able to save and compress the photo, resize it, and crop it without removing the
mark. Moderate robustness is achieved when more expensive tools are required, as
well as some basic knowledge on watermarking. So if we use the previous example,
the end user would need tools such as Adobe Photoshop and apply more processing to
the image to disable the mark.
Moderately high means tools are available but special skills and knowledge
are required and attempts may be unsuccessful. Several attempts and operations may
be required and one may have to work on the approach. High robustness means all
known attempts have been unsuccessful. Some research by a team of specialists is
necessary. The cost of the attempt may be much higher than what it is worth and its
success is uncertain. Provable robustness means it should be computationally (or even
more stringent: theoretically) infeasible for a willful opponent to disable the mark.
This is similar to what we found for cryptography where some algorithms are based
on some difficult mathematical problem.
The first levels of robustness can be assessed automatically by applying a simple benchmark algorithm (a sketch of this loop is given below). For each medium in a determined set:
1. Embed a random payload with the greatest strength that does not introduce annoying effects. In other words, embed the mark such that the quality of the output for a given quality metric is greater than a given minimum.
2. Apply a set of given transformations to the marked medium.
3. For each distorted medium, try to extract the watermark and measure the certainty of extraction.
Simple methods may just use a success/failure approach, that is, consider the extraction successful if and only if the payload is fully recovered without error. The measure of robustness is the certainty of detection or the bit error rate after extraction.
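The following Python function is one possible, hypothetical implementation of that loop. The embedding, extraction and quality-metric routines and the attack list are passed in as callables, since they belong to the scheme under test and to the evaluation profile rather than to the benchmark itself.

```python
import numpy as np

def robustness_benchmark(media, embed, extract, quality, attacks,
                         payload_bits=64, min_quality=20.0, trials=10, seed=0):
    """Return the average bit error rate per attack over a set of media.

    embed(signal, payload, strength), extract(signal), quality(original, marked)
    and the attack callables are supplied by the evaluation profile; this function
    only implements the benchmark loop described in the text.
    """
    rng = np.random.default_rng(seed)
    results = {}                                        # attack name -> list of BERs
    for signal in media:
        for _ in range(trials):                         # repeat: the payload is random
            payload = rng.integers(0, 2, payload_bits)
            # 1. Embed with the greatest strength whose quality stays above the minimum.
            marked, strength = None, 1.0
            while strength > 1e-3:
                candidate = embed(signal, payload, strength)
                if quality(signal, candidate) >= min_quality:
                    marked = candidate
                    break
                strength *= 0.5
            if marked is None:
                continue
            # 2./3. Apply each attack, extract the payload and measure the bit error rate.
            for attack in attacks:
                recovered = np.asarray(extract(attack(marked)))[:payload_bits]
                ber = float(np.mean(recovered != payload))
                results.setdefault(attack.__name__, []).append(ber)
    return {name: float(np.mean(bers)) for name, bers in results.items()}
```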
This procedure must be repeated several times since the hidden information is
random and a test may be successful by chance. Levels of robustness differ by the
number and strength of attacks applied and the number of media on which they are
measured. The set of tests and media will also depend on the purpose of the watermarking scheme and is defined in evaluation profiles. For example, schemes
used in medical systems need only to be tested on medical images while
watermarking algorithms for owner identification have to be tested on a large panel of
images.
The first levels of robustness can be defined using a finite and precise set of
robustness criteria (e.g., S.D.M.I., IFPI, or E.B.U. requirements), and one just needs to check them.
False Positives
False positives are difficult to measure, and current solutions use a model to
estimate their rate. This has two major problems: first, “real world” watermarking
schemes are difficult to model accurately, and second, modeling the scheme requires
access to details of the algorithm. Despite the fact that not publishing algorithms
breaches Kerckhoffs’ principles, details of algorithms are still considered trade
secrets, and getting access to them is not always possible. (In 1883, Auguste
Kerckhoffs enunciated the first principles of cryptographic engineering, in which he
advises that we assume the method used to encipher data is known to the opponent, so
security must lie only in the choice of key. The history of cryptology since then has
repeatedly shown the folly of “security- by-obscurity”—the assumption that the
enemy will remain ignorant of the system in use.)
So one way to estimate the false-alarm rate is to count the number of false alarms using a large sample of data. This may turn out to be another very difficult problem, as some applications require an error rate of one in 10^8 or even 10^12.
1.2.3 Capacity
In most applications the capacity will be a fixed constraint of the system so
robustness testing will be done with a random payload of a given size. While
developing a watermarking scheme, however, knowing the tradeoff between the basic
requirements is very useful and graphing with two varying requirements—the others
being fixed—is a simple way to achieve this. In the basic three-parameter
watermarking model, for example, one can study the relationship between robustness and strength of the attack when the quality of the watermarked medium is fixed, between the strength of the attack and the visual quality, or between the robustness and the visual quality [6].
This is useful from a user’s point of view: the performance is fixed (we want
only 5% of the bits to be corrupted so we can use error correction codes to recover all
the information we wanted to hide), and so it helps to define what kind of attacks the scheme will survive if the user accepts a given quality degradation.
1.2.4 Speed
Speed is dependent on the type of implementation: software or hardware. The
complexity is an important criterion, and some applications impose a limitation on the maximum number of gates that can be used, the amount of required memory, etc. For a software implementation, success also depends very much on the hardware used to
run it but comparing performance results obtained on the same platform (usually the
typical platform of end users) provides a reliable measure.
1.2.5 Statistical Undetectability
All methods of steganography and watermarking substitute part of the cover
signal, which has some particular statistical properties, with another signal with
different statistical properties; in fact, embedding processes usually do not pay
attention to the difference in statistical properties between the original cover signal
and the stegosignal. This leads to possible detection attacks. As for false positives,
evaluating such functionality is not trivial but fortunately very few watermarking
schemes require it, so we will not consider it in the next section.
1.3. Organization of Thesis
Robust digital audio watermarking algorithms and high capacity watermarking
methods for audio are studied in this thesis. The purpose of the thesis is to develop
novel audio watermarking algorithms providing a performance enhancement over other state-of-the-art algorithms with an acceptable increase in complexity, and to
validate their performance in the presence of the standard watermarking attacks.
Presented as a collection of eleven original publications enclosed as appendices I-XI,
the thesis is organized as follows.
Following this introductory chapter, Chapter 2 (Literature Survey) provides the background needed to solve the research problems. The research work continues with a review of the literature published in various international journals and conferences to study the latest developments in the field of watermarking. The focus is on research related to audio watermarking.
Chapter 3 states the research problem. It also provides the research hypothesis for solving the problem and the assumptions made while solving it. A general background and the requirements for high capacity covert communication for audio are presented in Chapter 4. In addition, the results for the wavelet domain LSB watermarking algorithm, which are in part documented in Papers II and III, are presented.
In Chapter 5, the contents of which are in part included in Papers VI, VII, VIII, IX and X, spread spectrum audio watermarking algorithms in the wavelet domain are presented. A general model for spread spectrum based watermarking is described in order to place the developed algorithms in context. These algorithms are able to perform blind detection and are perceptually transparent. The perceptual transparency is measured by computing the SNR between the host audio and the watermarked audio; the transparency is also measured through subjective listening tests.
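For reference, the SNR used as the objective transparency measure is the usual ratio of host-signal energy to embedding-distortion energy; a small Python helper (an illustrative definition, assumed rather than quoted from the thesis) is:

```python
import numpy as np

def snr_db(host, watermarked):
    """SNR in dB between the host audio and the watermark-induced distortion."""
    host = np.asarray(host, dtype=float)
    noise = np.asarray(watermarked, dtype=float) - host   # embedding distortion
    return 10.0 * np.log10(np.sum(host ** 2) / np.sum(noise ** 2))
```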
Chapter 6, the contents of which are presented in Paper XI, focuses on increasing the robustness of the embedded watermark. The scheme is based on the patchwork algorithm. The schemes are implemented in the time, DCT, DWT and DWT-DCT domains, and the results are presented together with the explanation of each implementation. These algorithms increase the perceptual transparency significantly and perform blind detection of the watermark.
Chapter 7 of the thesis concentrates on modeling the developed algorithms based on principles of communication. To make the system intelligent and secure, a method to embed the watermark using cyclic coding is suggested. A principle important for increasing the robustness by attack characterization through diversity is also discussed in the subsequent sections of this chapter.
Chapter 8 concludes the thesis discussing its main results and contributions.
Directions for further development and open problems for future research are also
mentioned.
Chapter 2
Literature Survey
Introduction
Several digital watermarking techniques have been proposed, including watermarking for images, audio and video. Watermarking was primarily developed for images [1-11]; research on audio started later, and fewer watermarking techniques have been proposed for audio compared to images and video. Embedding data in audio is more difficult than in images because the human auditory system (HAS) is more sensitive than the human visual system (HVS). In the last ten years there has been a lot of advancement in audio watermarking, and a few of these schemes are discussed here. This chapter reviews the literature on information hiding in audio sequences. The scientific publications included in the literature survey have been chosen in order to build a sufficient background to help in identifying and solving the research problems.
During the last decade, audio watermarking schemes [12-50] have been applied widely. These schemes have become very sophisticated in terms of robustness and imperceptibility. Robustness and imperceptibility are important requirements of watermarking, yet they conflict with each other. Non-blind watermarking schemes are theoretically interesting but not very useful in practice, since they require double the storage capacity and communication bandwidth for watermark detection. Of course, non-blind schemes may be useful as a copyright verification mechanism in a copyright dispute. On the other hand, blind watermarking schemes can detect and extract watermarks without use of the unwatermarked audio; therefore they require only half the storage capacity and half the bandwidth compared with non-blind watermarking schemes. Hence, only blind audio watermarking methods are considered in this chapter.
2.1 Spread Spectrum Audio Watermarking
Most of the existing audio watermarking techniques embed the watermark in the time domain or the frequency domain, whereas a few techniques embed the data in the cepstrum or compressed domain. The spread spectrum (SS) technique is the most popular and is used by many researchers in their implementations [12-18]. An amplitude-scaled spread spectrum sequence is embedded in the host audio signal and can be detected via a correlation technique. Generally, embedding is based on a
psychoacoustic model (which provides the inaudibility limits of the watermark). The watermark is spread over a large number of coefficients and the distortion is kept below the just noticeable difference (JND) level by exploiting the masking effects of the human auditory system (HAS). The change in each coefficient can be small enough to be imperceptible, yet the correlation detector output still has a high signal to noise ratio (SNR), since it despreads the energy present in a large number of coefficients, as the sketch below illustrates.
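The despreading argument can be checked numerically with the short, hypothetical Python snippet below: one bit is embedded by adding a tiny PN chip to each of N coefficients, and the correlation detector still sees the bit clearly because the host interference shrinks as 1/sqrt(N). The values of N, alpha and the Gaussian host model are illustrative assumptions, not parameters from any of the cited schemes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8192                              # coefficients carrying one watermark bit
alpha = 0.01                          # per-coefficient watermark amplitude (kept inaudible)
host = rng.normal(0.0, 0.3, N)        # stand-in for transform coefficients of the host audio
chips = rng.choice([-1.0, 1.0], N)    # PN spreading sequence (known only to the key holder)

watermarked = host + alpha * chips    # each coefficient changes by only +/- alpha

stat = np.dot(watermarked, chips) / N            # despreading: ~ alpha + residual interference
interference = np.std(host) / np.sqrt(N)         # interference standard deviation after despreading
print(f"per-coefficient change         : {alpha}")
print(f"correlation detector output    : {stat:.4f}")
print(f"residual interference (std dev): {interference:.4f}")
```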
Boney et al [12] generated watermarks by filtering a pseudo noise sequence with a filter that approximates the frequency masking characteristics of the HAS. Thus different watermarks are created for different audio signals. Their study and results show that their scheme is robust in the presence of additive noise, lossy coding/decoding, resampling and time scaling. They also state that, using their scheme, it is easy for the author to detect the watermark; however, they have used the original signal to detect the watermark. The scheme is also robust in the presence of other watermarks.
J. Seok et al [13] proposed a novel audio watermarking algorithm which is
based on a direct sequence spread spectrum method. The information that is to be
embedded is modulated by a pseudo noise (PN) sequence. The spread spectrum signal
is then shaped in the frequency domain and inserted into the original audio signal. To detect the watermark they used a linear predictive coding method. Their experimental
results show that their scheme is robust against different signal processing attacks.
D. Kirovski et al [14] developed techniques that effectively encode and decode a direct sequence spread spectrum watermark in an audio signal. They used the modulated complex lapped transform to embed the watermark. To prevent desynchronization attacks they developed a technique based on block repetition coding. Though they have proved that they can perform the correlation test in perfect synchronization, the wow and flutter induced in the watermarked signal may cause false positive/false negative detection of the watermark. To improve the reliability of watermark detection they proposed a technique that uses cepstrum filtering and chess watermarks. They have also shown that psychoacoustic frequency masking creates an imbalance in the number of positive and negative watermark chips in the part of the SS sequence used for correlation detection, which corresponds to the audible part of the frequency spectrum. To compensate for this problem they propose a modified covariance test.
Malvar et al [15] introduce a new watermarking modulation technique called the Improved Spread Spectrum (ISS) technique. This scheme proposes a new embedding approach based on traditional SS embedding, obtained by slightly modifying it. In this scheme they introduce two parameters to control the distortion level and the removal of carrier distortion from the detection statistics. At certain values of these control parameters, traditional SS can be obtained from this scheme.
S. Esmaili et al [16] presented a novel audio watermarking scheme based on
spread spectrum techniques that embeds a digital watermark within an audio signal
using the instantaneous mean frequency (IMF) of the signal. This content-based audio
watermarking algorithm was implemented to satisfy and maximize both
imperceptibility and robustness of the watermark. They used short-time Fourier
transform of the original audio signal to estimate a weighted IMF of the signal. Based on the masking properties of the psychoacoustic model, the sound pressure level of the watermark was derived. Based on these results, modulation is then performed to produce a signal dependent watermark that is imperceptible. This method allows 25 bits to be embedded and recovered within a 5 second sample of an audio signal. Their experimental results show that the scheme is robust to common signal processing attacks including filtering and noise addition, whereas the bit error rate (BER) increased to 0.08 for mp3 compression, i.e. 2 out of 25 bits were not identified.
D. Kirovski et al [17] devised a scheme for robust covert communication over a public audio channel using spread spectrum, by imposing particular structures on the watermark patterns and applying a nonlinear filter to reduce carrier noise. This technique is capable of reliably detecting the watermark, even in audio clips modified using composition attacks that degrade the content well beyond the acceptable limit.
Hafiz Malik et al [18] proposed an audio watermarking method based on
frequency selective direct sequence spread spectrum. The method improves the
detection capability, watermarking capacity and robustness to desynchronization
attacks. In this scheme the process of generating a watermark and embedding it into
an audio signal is treated in the framework of spread spectrum theory. The original
signal is treated as noise whereas the message information used to generate a
watermark sequence is considered as data. The spreading sequence, also called the PN sequence, is treated as a key. The technique introduces lower mean square as well as perceptual distortion because the watermark is embedded in a small frequency band of the complete audible frequency range.
2.2 Methods using patchwork algorithm
The patchwork technique was first presented in 1996 by Bender et al [19] for
embedding watermarks in images. It is a statistical method based on hypothesis
testing and relying on large data sets. Since one second of CD-quality stereo audio contains 88,200 samples, a patchwork approach is applicable to the watermarking of audio sequences as well. The watermark embedding process uses a pseudorandom process
to insert a certain statistic into a host audio data set, which is extracted with the help
of numerical indexes (like the mean value), describing the specific distribution. The
method is usually applied in a transform domain (Fourier, wavelet, etc.) in order to
spread the watermark in time domain and to increase robustness against signal
processing modifications [19].
Multiplicative patchwork scheme developed by Yeo et al [20] provides a new
way of patchwork embedding. Most embedding schemes are additive, of the form y = x + αw, while multiplicative embedding schemes have the form y = x(1 + αw). Additive schemes shift the mean, while multiplicative schemes change the variance, and the detection scheme exploits these facts. In this scheme, the mean and variance of the sample values are first computed in order to detect the watermarks; second, the distribution of the sample values is assumed to be normal; third, the value of the detection threshold is decided adaptively.
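As a hedged illustration of the statistics such detectors rely on (a toy sketch, not Yeo et al.'s actual scheme; the Gaussian host, the patch pattern and α are assumptions), the snippet below shows that additive embedding separates the means of the two marked subsets, while multiplicative embedding changes the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)            # host samples (illustrative)
w = rng.choice([-1.0, 1.0], size=x.size)     # pseudorandom bipolar patch pattern
alpha = 0.05

y_add  = x + alpha * w                       # additive:        y = x + a*w
y_mult = x * (1.0 + alpha * w)               # multiplicative:  y = x*(1 + a*w)

# Additive embedding shifts the two subset means apart by about 2*alpha;
# multiplicative embedding leaves the mean alone but scales the variance by ~(1 + alpha^2).
print(y_add[w > 0].mean() - y_add[w < 0].mean())   # ~ 0.10
print(y_mult.var() / x.var())                      # ~ 1.0025
```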
Cvejic et al [21] presented a robust audio watermarking method implemented
in wavelet domain which uses the frequency hopping and patchwork method. Their
scheme embeds the watermark into a mapped sub-band in a predefined time period, similar to the frequency hopping approach in digital communication, and the detection method is a modified patchwork algorithm. Their results show that the algorithm is
robust against the mp3 compression, noise addition, requantization and resampling.
For this system to be robust against the resampling attack it is required to find out the
proper scaling parameter.
R. Wang et al [22] proposed an audio watermarking scheme which embeds a robust and a fragile watermark at the same time in the lifting wavelet domain. The robust watermark is embedded in the low frequency range using mean quantization; it has great robustness and imperceptibility. The fragile watermark is embedded in the high frequency range by quantizing single coefficients. When the audio signal is tampered with, the watermark information changes synchronously, so it can be used for audio content integrity verification. The watermark can be extracted without the original digital audio signal. Their experimental results show that the robust watermark is robust to many attacks, such as mp3 compression, low pass filtering, noise addition, requantization and resampling, whereas the fragile watermark is very sensitive to these attacks.
2.3 Methods implemented in Time Domain
There are a few algorithms implemented in the time domain [23-26]. These algorithms embed the watermark in the host signal in the time domain by modifying selected samples.
W. Lie et al [23] proposed a method of embedding digital watermarks into
audio signals in the time domain. Their algorithm exploits differential average-of-absolute-amplitude relations within each group of audio samples to represent one-bit
information. The principle of low-frequency amplitude modification is employed to
scale amplitudes in a group manner (unlike the sample-by-sample manner as used in
pseudo noise or spread-spectrum techniques) in selected sections of samples so that
the time-domain waveform envelope can be almost preserved. Besides, when the
frequency-domain characteristics of the watermark signal are controlled by applying
absolute hearing thresholds in the psychoacoustic model, the distortion associated
with watermarking is hardly perceivable by human ears. The watermark can be
blindly extracted without knowledge of the original signal. Subjective and objective
tests reveal that the proposed watermarking scheme maintains high audio quality and
is simultaneously highly robust to pirate attacks, including mp3 compression, low-pass filtering, amplitude scaling, time scaling, digital-to-analog/analog-to-digital
reacquisition, cropping, sampling rate change, and bit resolution transformation.
Security of embedded watermarks is enhanced by adopting unequal section lengths
determined by a secret key.
In a method suggested by Bassia et al [24], watermark embedding depends on the audio signal's amplitude and frequency in a way that minimizes the audibility of the watermark signal. The result is a slight amplitude modification of each audio sample in a way that does not produce any perceived effect. The audio signal is divided into Ns segments of N samples each. Each of these segments is watermarked with the bipolar sequence Wi ∈ {−1, 1}, i = 0, 1, ..., N−1, which is generated by thresholding a chaotic map. The seed (starting point) of the chaotic sequence generator is the watermark key. Using generators of a strongly chaotic nature ensures that the system is cryptographically secure, i.e., the sequence generation mechanism cannot be reverse engineered even if an attacker manages to obtain part of the binary sequence. The watermark signal is embedded in each audio segment using a three-stage procedure: the signal-dependent, low-pass-shaped watermark signal is added to the original signal segment to produce the watermarked signal segment. This scheme is statistically imperceptible and resists MPEG-2 audio compression plus other common forms of signal manipulation, such as cropping, time shifting, filtering, resampling and requantization.
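A minimal sketch of generating such a key-dependent bipolar sequence by thresholding a chaotic map is given below. It is illustrative only: the logistic map, the parameter r and the threshold 0.5 are assumptions of this sketch, since the specific map used in [24] is not given here.

```python
import numpy as np

def bipolar_watermark(seed, n, r=3.99):
    """Generate W_i in {-1, +1}, i = 0..n-1, by thresholding a logistic map.
    The seed (0 < seed < 1) plays the role of the watermark key."""
    x, w = seed, np.empty(n)
    for i in range(n):
        x = r * x * (1.0 - x)             # chaotic iteration
        w[i] = 1.0 if x >= 0.5 else -1.0  # threshold to a bipolar symbol
    return w

print(bipolar_watermark(0.37, 16))
```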
A. N. Lemma et al [25] investigated an audio watermarking system referred to as modified audio signal keying (MASK). In MASK, the short-time envelope of the
audio signal is modified in such a way that the change is imperceptible to the human
listener. In MASK, a watermark is embedded by modifying the envelope of the audio
with an appropriately conditioned and scaled version of a predefined random
sequence carrying some information (a payload). On the detector side, the watermark
symbols are extracted by estimating the short-time envelope energy. To this end, first,
the incoming audio is subdivided into frames, and then, the energy of the envelope is
estimated. The watermark is extracted from this energy function. The MASK system
can easily be tailored for a wide range of applications. Moreover, informal
experimental results show that it has a good robustness and audibility behavior.
2.4 Methods implemented in Transform domain
Synchronization attack is one of the key issues of digital audio watermarking.
In this work, a blind digital audio watermarking scheme against synchronization attack using adaptive quantization is proposed by X. Y. Wang et al [26]. The features of their scheme are as follows: 1) a more steady synchronization code and a new embedding strategy are adopted to resist the synchronization attack more effectively; 2) the multiresolution characteristics of the discrete wavelet transform (DWT) and the energy-compaction characteristics of the discrete cosine transform (DCT) are combined to improve the transparency of the digital watermark; 3) the watermark is embedded into the low frequency components by adaptive quantization according to human auditory masking; and 4) the scheme can extract the watermark without the help of the original digital audio signal. Experimental results show that the proposed watermarking scheme is inaudible and robust against various signal processing attacks such as noise adding, resampling, requantization, random cropping, and MPEG-1 Layer III (mp3) compression.
Barker codes have good autocorrelation properties, so Huang et al. [27] choose one as a synchronization mark, embed it in the temporal domain, and embed the watermark information in the DCT domain. The scheme resists synchronization attacks effectively, but it has the following defects: 1) it chooses a 12-bit Barker code, which is so short that false synchronization can easily occur; 2) it embeds the synchronization code only by modifying individual sample values, which greatly reduces the resistance (especially against resampling and mp3 compression); 3) it does not make full use of the human auditory masking effect.
S. Wu, J. Hang. et al [28] proposed a self-synchronization algorithm for audio
watermarking to facilitate assured audio data transmission. The synchronization codes
are embedded into audio with the informative data, thus the embedded data have the
self-synchronization ability. To achieve robustness, they embed the synchronization
codes and the hidden informative data into the low frequency coefficients in DWT
(discrete wavelet transform) domain. By exploiting the time-frequency localization
characteristics of DWT, the computational load in searching synchronization codes
has been dramatically reduced, thus resolving the contending requirements between
robustness of hidden data and efficiency of synchronization codes searching. The
performance of the scheme is analyzed in terms of SNR (signal to noise ratio) and
BER (bit error rate). An estimation formula that connects SNR with embedding
strength has been provided to ensure the transparency of embedded data. BER under
Gaussian noise corruption has been estimated to evaluate the performance of the
proposed scheme. The experimental results are presented to demonstrate that the
embedded data are robust against most common signal processing operations and attacks, such as
Gaussian noise corruption, resampling, requantization, cropping, and mp3
compression.
Li et al [29] developed a watermarking technique in the wavelet domain based on the SNR to determine the scaling parameter required to scale the watermark. The intensity of the embedded watermark can be modified by adaptively adjusting the scaling parameter. The authors have proved that the scheme is robust against different signal processing attacks and provides a better embedding degree. This scheme requires the original signal to recover the watermark. This motivates us to develop an SNR-based scheme to detect and extract the watermark without using the original signal. The watermark embedding procedure adaptively selects the watermark scaling parameter α for each section of the audio segment selected for embedding.
A new watermarking technique capable of embedding multiple watermarks
based on phase modulation is devised by A. Takahashi et al [30]. The idea utilizes the
insensitivity of the human auditory system to phase changes with relatively long
transition period. In this technique the phase modulation of the original signal is
realized by means of a time-varying all-pass filter. To accomplish the blind detection
which is required in detecting the copy control information, this watermark is
assigned to the inter-channel phase difference between a stereo audio signal by using
frequency shift keying. Meanwhile, the copyright management information and the fingerprint are embedded into both channels by using phase shift keying of different frequency components. Consequently these three kinds of information are simultaneously embedded into a single time frame. The imperceptibility of the scheme was confirmed through a subjective listening test. The technique is robust against several kinds of signal processing attacks, as evaluated by computer simulations. The authors found that their method has good performance in both subjective and objective tests.
H. H. Tsai et al [31] proposed a new intelligent audio watermarking method
based on the characteristics of the HAS and the techniques of neural networks in the
DCT domain. The method makes the watermark imperceptible by using the audio
masking characteristics of the HAS. Moreover the method exploits a neural network
for memorizing the relationships between the original audio signals and the
watermarked audio signals. Therefore the method is capable of extracting watermarks
without original audio signals. Their experimental results show that the method possesses significant robustness against common attacks for the copyright protection of digital audio. C. Xu et al [32] implemented a method to embed and extract watermarks in compressed digital audio. The watermark is embedded in the partially uncompressed domain and the embedding scheme is highly related to the audio content. The watermark content contains owner and user identifications, and watermark embedding and detection can be done very fast to ensure on-line transactions and distributions.
X. Li et al [33] developed a data hiding scheme for audio signals in the cepstrum domain. The cepstrum representation of audio can be shown to be very robust to a wide range of attacks, including the most challenging time-scaling and pitch-shifting warping. The authors embed the data by manipulating the statistical mean of selected cepstrum coefficients. A psychoacoustic model is employed to control the audibility of the introduced distortion.
S. K. Lee et al [34] suggested a watermarking algorithm in cepstrum domain.
They insert a digital watermark into the cepstral components of the audio signal using
a technique analogous to spread spectrum communications, hiding a narrow band
signal in a wideband channel. The pseudorandom sequence used as watermark is
weighted in the cepstrum domain according to the distribution of cepstral coefficients and the frequency masking characteristics of the human auditory system. Watermark embedding minimizes the audibility of the watermark signal. The technique is robust against multiple watermarks, MPEG coding and noise addition.
There are various techniques implemented in the wavelet domain [35-41]. In these papers it is shown that the wavelet domain is more suitable than the other transform domains. Since the wavelet coefficients capture the spectrum over multiple frequency bands, this transform is better suited than other transform domains for selecting a perceptually suitable band of frequencies for data embedding.
2.5 Other recently developed algorithms:
Audio watermarking is usually used as a multimedia copyright protection tool or as a system that embeds metadata in audio signals. In the method suggested by S. D. Larbi et al [42], watermarking is viewed as a preprocessing step for further audio processing systems: the watermark signal conveys no information; rather, it is used to modify the nonstationarity of the host signal. The embedded watermark is added in order to stationarize the host signal. Indeed, the embedded watermark is piecewise stationary and thus modifies the stationarity of the original audio signal. In some audio processing systems that are very sensitive to time-variant signal statistics, this can be used to improve performance. The authors presented an analysis of the impact of perceptual watermarking on the stationarity of audio signals. Their study was based on stationarity indices, which represent a measure of variations in the spectral characteristics of signals, using time-frequency representations. They presented simulation results with two kinds of signals, artificial signals and audio signals, and observed a significant enhancement in the stationarity indices of the watermarked signal, especially for transient attacks.
T. Furon et al [43] investigated an asymmetric watermarking method as an
alternative to direct sequence spread spectrum technique (DSSS) of watermarking.
This method is developed to provide higher security level against malicious attacks
threatening watermarking techniques used for a copy protection purpose. This
application, which is quite different from the classical copyright enforcement issue, is extremely challenging, as no public algorithm is yet known to be secure enough and some proposed proprietary techniques have already been hacked. The asymmetric detectors are more complex and costly, and they must accumulate a larger amount of content in order to take a decision.
Conventional watermarking techniques based on echo hiding provide many benefits, but also have several disadvantages, for example a lenient decoding process, weakness against multiple encoding attacks, etc. B. S. Ko et al [44] improve on the weak points of conventional echo hiding with a time-spread echo method. Spreading an echo in the time domain is achieved by using pseudo-noise (PN) sequences. By spreading the echo, the amplitude of each echo can be reduced, i.e. the energy of each echo becomes small, so that the distortion induced by watermarking is imperceptible to humans while the decoding performance for the embedded watermarks is better maintained than in the case of the conventional echo hiding method. The authors verified this by computer simulations, in which several parameters, such as the amplitude and length of the PN sequences and the analysis window length, were varied. Robustness against typical signal processing was also evaluated in their simulations and showed fair performance. Results of listening tests using several pieces of music showed good imperceptibility.
S. Eerüçük et al [45] introduced a novel watermark representation for audio
watermarking, where they embed linear chirps as watermark signals. Different chirp
rates, i.e. slopes on time-frequency plane, represent watermark messages such that
each slope corresponds to a unique message. These watermark signals, i.e. linear
chirps, are embedded and extracted using an existing watermarking algorithm. The
extracted chirps are then post processed at the receiver using a line detection
algorithm based on the Hough-Radon transform (HRT). The HRT is an optimal line
detection algorithm, which detects directional components that satisfy a parametric
constraint equation in the image of a TF plane, even at discontinuities corresponding
to bit errors. The simulations carried out by the authors showed that the HRT correctly detects the embedded watermark message after signal processing operations for bit error rates of up to 20%. The new watermark representation and the post-processing stage based on the HRT can be combined with existing embedding/extraction algorithms for increased robustness.
A new adaptive blind digital audio watermarking algorithm is proposed by X.
Wang et al [46] on the basis of support vector regression (SVR). This algorithm
embeds the template information and watermark signal into the original audio by
adaptive quantization according to the local audio correlation and human auditory
masking. During the watermark extraction the corresponding features of template and
watermark are first extracted from the watermarked signal. Then, the corresponding
feature template is selected as training sample to train SVR and an SVR model is
returned. Finally the actual outputs are predicted according to the corresponding
feature of watermark, and the digital watermark is recovered from the watermarked
audio by using the well-trained SVR. The algorithm is not only robust against various signal processing attacks but also achieves good imperceptibility. The performance of the algorithm is better than that of other SVM-based audio watermarking schemes.
2.6 Audio watermarking techniques against time scale modification
Synchronization attacks are a serious problem for any watermarking scheme. Audio processing such as random cropping and time scale modification (TSM) causes displacement between embedding and detection in the time domain, which is difficult for the watermark to survive. TSM is a serious attack on audio watermarking; very few algorithms can effectively resist this kind of synchronization attack. According to the
Secure Digital Music Initiative (SDMI) Phase-II robustness test requirement, practical
audio watermarking schemes should be able to withstand pitch-invariant TSM up to
±4%.
Mansour and Twefik [47] proposed embedding the watermark by changing the relative length of the middle segment between two successive maxima and minima of the smoothed waveform; the performance highly depends on the selection of the threshold, and finding an appropriate threshold is delicate work. Mansour and Twefik [48] proposed another algorithm for embedding data into audio signals by changing the interval lengths between salient points in the signal. The extreme points of the wavelet coefficients of the selected envelope are adopted as salient points.
W. Li et al [49] have suggested a novel content-dependent localized scheme
to combat synchronization attacks like random cropping and time-scale modification.
The basic idea is to first select steady high-energy local regions that represent music
edges like note attacks, transitions or drum sounds by using different methods, then
embed the watermark in these regions. Such regions are of great importance to the
understanding of music and will not be changed much for maintaining high auditory
quality. In this way the embedded watermark will have the potential to escape all
kinds of distortions. Experimental results carried out by the authors show that the method is highly robust against common signal processing attacks and synchronization attacks. This method has its inherent limitations: although it is suitable for most
modern music with obvious rhythm, it does not work with some classical music
without apparent rhythm.
S. Xiang et al [50] presented a multibit robust audio watermarking solution by
using the insensitivity of the audio histogram shape and the modified mean to TSM
and cropping operations. Authors have addressed the insensitivity property in both
mathematical analysis and experimental testing by representing the histogram shape
as the relative relations in the number of samples among groups of three neighboring
bins. By reassigning the number of samples in groups of three neighboring bins, the
watermark sequence is successfully embedded. In the embedding process, the
histogram is extracted from a selected amplitude range by referring to the mean in
such a way that the watermark will be able to be resistant to amplitude scaling and
avoid exhaustive search in the extraction process. They observed that the
watermarked audio signal is perceptually similar to the original one. Experimental results demonstrated by the authors prove the robustness of the scheme against TSM and random cropping attacks, and show satisfactory robustness against common signal processing attacks.
A blind digital audio watermarking scheme against synchronization attack
using adaptive mean quantization is developed by X-Y. Wang et al [51]. The features
of the scheme are as follows: 1) a more steady synchronization code and a new embedding strategy are adopted to resist the synchronization attack more effectively; 2) the multiresolution characteristics of the DWT and the energy-compaction characteristics of the discrete cosine transform are combined to improve the transparency of the digital watermark; 3) the watermark is embedded into the low frequency components by adaptive quantization according to human auditory masking; and 4) the scheme can extract the watermark without the help of the original audio signal. The experimental results presented in the paper show that the technique can resist various signal processing attacks.
2.7 Papers studied on performance analysis and evaluation of
watermarking systems
Powerful and low cost computers allow people to easily create and copy
multimedia content, and the Internet has made it possible to distribute this
information at very low cost. However, these enabling technologies also make it easy to illegally copy, modify, and redistribute multimedia data without regard for copyright ownership. Many techniques have been proposed for watermarking audio, image, and video, and comprehensive surveys of these technologies are presented in the previous sections. However, an effective means of comparing the different approaches is required.
J. D. Gordy et al [52] have presented an algorithm independent set of criteria
for quantitatively comparing the performance of digital watermarking algorithms.
Four criteria were selected by the authors as part of the evaluation framework. They were chosen to reflect the fact that watermarking is effectively a communications system. In addition, the criteria are simple to test, and may be
applied to any type of watermarking system (audio, image, or video). 1) Bit rate refers
to the amount of watermark data that may be reliably embedded within a host signal
per unit of time or space, such as bits per second or bits per pixel. A higher bit rate
may be desirable in some applications in order to embed more copyright information.
Reliability was measured as the bit error rate (BER) of extracted watermark data.
2) Perceptual quality refers to the imperceptibility of embedded watermark data
within the host signal. In most applications, it is important that the watermark is
undetectable to a listener or viewer. This ensures that the quality of the host signal is
not perceivably distorted, and does not indicate the presence or location of a
watermark. The signal-to-noise ratio (SNR) of the watermarked signal versus the host
signal was used as a quality measure. 3) Computational complexity refers to the
processing required to embed watermark data into a host signal, and / or to extract the
data from the signal. Actual CPU timings (in seconds) of algorithm implementations
were collected. 4) Robustness: watermarked digital signals may undergo common signal processing operations such as linear filtering, sample requantization, D/A and A/D conversion, and lossy compression. Although these operations may not affect the perceived quality of the host signal, they may corrupt the watermark data embedded within the signal. It is important to know, for a given level of host signal distortion, which watermarking algorithm will produce a more reliable embedding. Robustness was measured by the bit error rate (BER) of extracted
watermark data as a function of the amount of distortion introduced by a given
operation.
The performance of spread-transform dither modulation (ST-DM) watermarking in the presence of two important classes of non-additive attacks, namely the gain attack plus noise addition and the quantization attack, is evaluated by F. Bartolini et al [53]. The authors developed the analysis under the assumption that the host features are independent and identically distributed Gaussian random variables, and a minimum distance criterion is used to decode the hidden information. The theoretical bit-error probabilities are derived in closed form, thus permitting the impact of the considered attacks on the watermark to be evaluated at a theoretical level. The analysis is validated by means of extensive Monte-Carlo simulations. In addition to validating the theoretical analysis, the Monte-Carlo simulations made it possible to abandon the hypothesis of normally distributed host features in favor of more realistic models adopting a Laplacian or a generalized Gaussian probability density function. The general result of the analysis carried out by the authors is that the excellent performance of ST-DM is confirmed in all cases, with the only noticeable exception of the gain attack.
Hidden copyright marks have been proposed as a solution to the illegal copying and proof-of-ownership problems in the context of multimedia objects. Many systems have been proposed by different authors, but it was difficult to get an idea of their performance and hence to compare them. F. A. P. Petitcolas et al [54] therefore proposed a benchmark based on a set of attacks that any system ought to survive.
G.C. Rodriguez et al [55] presented a survey report on audio watermarking in which
watermarking techniques are briefly summarized and analyzed. They have made the
following observations:
• The patchwork scheme and the cepstrum domain scheme are robust to several signal manipulations, but for real applications the authors suggest using the patchwork scheme, because the cepstrum domain scheme needs the original signal to determine whether the host signal is marked and, as a consequence, requires double the storage capacity.
• The echo hiding scheme only fulfills the inaudibility condition and is not robust to several attacks such as mp3 compression, filtering, resampling, etc.
In early September 2000 the Secure Digital Music Initiative (SDMI) announced a three-week open challenge for its phase II screening, inviting the public to evaluate the attack resistance of four watermark techniques. The challenge emphasized testing the effectiveness of robust watermarks, which is crucial in ensuring the proper functioning of the entire system. M. Wu et al [56] point out some weaknesses in these watermark techniques and suggest directions for further improvement. The authors provide a general framework for analyzing the robustness and security of audio watermark systems.
2.8 Watermark Attacks
Research in digital watermarking has progressed along two paths. While new
watermarking technologies are being developed, some researchers are also
investigating different ways of attacking digital watermarks. Some of the attacks that
have been proposed in the literature are reviewed here.
Frank Hartung et al [57] have shown that spread spectrum watermarks and watermark detectors are vulnerable to a variety of attacks. However, with appropriate modifications to the embedding and extraction methods, they can be made much more resistant to a variety of such attacks. Hartung et al classified the attacks into four groups: a) Simple attacks attempt to impair the embedded watermark by manipulating the whole watermarked data. b) Detection-disabling attacks attempt to break the correlation and make recovery of the watermark infeasible for a watermark detector. c) Ambiguity attacks attempt to confuse the detector by producing fake watermarks or fake original data. d) Removal attacks attempt to analyze the watermarked data, estimate the watermark or host data, separate the watermarked data into host data and watermark, and discard only the watermark. Hartung et al [57] also suggested counterattacks to these attacks.
Martin Kutter et al [58] suggested the watermark copy attack, which is based
on an estimation of the embedded watermark in the spatial domain through a filtering
process. The estimate of the watermark is then adapted and inserted into the target
image. To illustrate the performance of the proposed attack they applied it to
commercial and non-commercial watermarking schemes. The experiments showed
that the attack is very effective in copying a watermark from one image to a different
image.
Alexander et al [59] suggested the watermark template attack. This attack
estimates the corresponding template points in the FFT domain and then removes
them using local interpolation. The approach is not limited to the FFT domain; similar variants of this attack may be applied in other transform domains. J. K. Su et al [60] suggested a channel model for a watermark attack. The authors analyzed this attack for images and stated that the attack can be applied to audio/video watermarking schemes.
D. Kirovski et al [61] analyzed the security of multimedia copyright protection
systems that use watermarks by proposing a new breed of attacks on generic
watermarking systems. A typical blind pattern matching attack relies on the
observation that multimedia content is often highly repetitive. Thus the attack
procedure identifies subsets of signal blocks that are similar and permutes these
blocks. Assuming that permuted blocks are marked with distinct secrets, it can be
shown that any watermark detector is facing a task of exponential complexity to
reverse the permutations as a preprocessing step for watermark detection. The authors described the implementation of the attack against a spread-spectrum and a quantization index modulation data hiding technology for audio signals.
2.9 Research problems identified:
The problems identified from the literature survey carried out in this chapter
include:
1) Construction of a method that would identify perceptually significant components from an analysis of the image/audio and the human visual system/human auditory system.
2) The system must be tested against lossy operations such as mp3 and data
conversion. The experiments must be expanded to validate the results.
3) There is a need to explore novel mechanisms for effective encoding and decoding of the watermark using DSSS in audio. The technique may aim at improving detection convergence and robustness, improving watermark imperceptibility, preventing attacks such as the desynchronization attack, and establishing covert communication over a public audio channel.
4) Possible asymmetric watermark method may be an alternative to classical DSSS
watermarking, which may provide higher security level against malicious attacks.
5) Possible generation of a framework for blind watermark detection.
6) Possibility of suggesting new malicious attacks and counter attack for available
watermarking techniques.
7) Possibility of embedding an audio watermark in audio and designing an adaptive system to overcome a number of non-intentional attacks.
Concluding remarks:
Chapter 2 reviews the literature and describes the concept of information
hiding in audio sequences. Scientific publications included in the literature survey
have been chosen in order to build a sufficient background that helps in better understanding the research topic.
The key digital audio watermarking algorithms and techniques surveyed are classified by the signal domain in which the watermark is inserted and the statistical method used for embedding and extraction of the watermark bits. Audio
watermarking initially started as a sub-discipline of digital signal processing, focusing
mainly on convenient signal processing techniques to embed additional information to
audio sequences. This included the investigation of a suitable transform domain for
watermark embedding and schemes for imperceptible modification of the host audio.
Only recently has watermarking been placed on a stronger theoretical foundation,
becoming a more mature discipline with a proper base in both communication
modeling and information theory.
My research concentrates on developing an audio watermarking technique that improves detection convergence, robustness and watermark imperceptibility. An attempt is also made to embed audio data in an audio signal during this research.
Chapter 3 Research Problem
Introduction
The fundamental process in each watermarking system can be modeled as a
form of communication where a message is transmitted from the watermark embedder to the watermark receiver [2]. The process of watermarking is viewed as a transmission
channel through which the watermark message is being sent, with the host signal
being a part of that channel. In Figure 3.1, a general mapping of a watermarking
system into a communications model is given. After the watermark is embedded, the
watermarked signal is usually distorted after watermark attacks. The distortions of the
watermarked signal are, similarly to the data communications model, modeled as
additive noise.
[Figure 3.1 block diagram: within the watermark embedder, the watermark encoder maps the input message m, using the watermark key wk, into a watermark signal wa that is combined with the host signal co to produce the watermarked signal cw; additive noise n yields cwn, from which the watermark detector (watermark decoder plus watermark key) recovers the output message.]
Fig. 3.1. A watermarking system and an equivalent communications model.
When the research plan for this study was set down, research on digital audio watermarking was in its early development stage; the first algorithms dealing specifically with audio were presented in 1996 [12]. Although only a few papers had been published at the time, basic theoretical foundations had been laid down and the concept of the "magic triangle" introduced [62]. Therefore, it is natural to place watermarking into
the framework of the traditional communications system. The main line of reasoning
of the "magic triangle" concept [62] is that if the perceptual transparency parameter is
fixed, the design of a watermark system cannot obtain high robustness and watermark
data rate at the same time. Thus, the research problem can be divided into three
specific subproblems. They are:
SP1: What is the highest watermark bit rate obtainable under the perceptual transparency constraint, and how can this limit be approached?
SP2: How can the detection performance of a watermarking system be improved
using algorithms based on communications models for that system?
SP3: How can overall robustness to attacks of a watermark system be increased using
an attack characterization at the embedding side?
The division of the research problem into the three subproblems above defines
the following three research hypotheses:
RH1: To obtain a distinctively high watermark data rate, the embedding algorithm can be implemented in a transform domain.
RH2: To improve detection performance, a spread spectrum method can be used.
RH3: To improve the robustness of watermarking algorithms, an attack characterization can be introduced at the embedder.
The general research assumption is that the process of embedding and
extraction of watermarks can be modeled as a communication system, where the
watermark embedding is modeled as a transmitter, the distortion of watermarked
signal as a communications channel noise and watermark extraction as a
communications detector. It is also assumed that modeling of the human auditory
system and the determination of perceptual thresholds can be done accurately using models from audio coding, namely the HAS model of MPEG compression.
The perceptual transparency (inaudibility) of a proposed audio watermarking
scheme can be confirmed through subjective listening tests in a predefined laboratory environment with the participation of a predefined number of people with different musical education and backgrounds. The imperceptibility can also be measured by computing the signal-to-noise ratio between the original host signal and the watermarked audio
signal. A central assumption in the security analysis of the proposed algorithms is that
an adversary that attempts to disrupt the communication of watermark bits or remove
the watermark does not have access to the original host audio signal. The adversary
should not be able to extract the watermark with any statistical analysis. The embedded watermark should withstand all kinds of signal processing attacks, proving the better robustness of the scheme. In this thesis we concentrate on developing a scheme which withstands signal processing attacks and provides better robustness while maintaining imperceptibility.
Summary
In this thesis, a multidisciplinary approach is applied for solving the research
subproblems. The signal processing methods are used for watermark embedding and
extracting processes, derivation of perceptual thresholds, transforms of signals to
different signal domains (e.g. DCT domain, wavelet domain). Communication
principles and models are used for channel noise modeling, different ways of
signaling the watermark (e.g. a direct sequence spread spectrum method, frequency
hopping method), and evaluation of overall detection performance of the algorithm
(bit error rate, normalized correlation value at detection). The research methods also
include algorithm simulations with real data (music sequences) and subjective
listening tests.
Chapter 4 High Capacity Covert Communication for Audio
Introduction
The simplest visualization of the requirements of information hiding in digital audio is the so-called magic triangle [62], given in Figure 4.1. Inaudibility, robustness to
attacks, and the watermark data rate are in the corners of the magic triangle. This
model is convenient for a visual representation of the required trade-offs between the
capacity of the watermark data and the robustness to certain watermark attacks, while
keeping the perceptual quality of the watermarked audio at an acceptable level. It is
not possible to attain high robustness to signal modifications and high data rate of the
embedded watermark at the same time. Therefore, if a high robustness is required
from the watermarking algorithm, the bit rate of the embedded watermark will be low
and vice versa, high bit rate watermarks are usually very fragile in the presence of
signal modifications. However, there are some applications that do not require that the
embedded watermark has a high robustness against signal modifications. In these
applications, the embedded data is expected to have a high data rate and to be detected
and decoded using a blind detection algorithm. While the robustness against
intentional attacks is usually not required, signal processing modifications, like noise
addition, should not affect the covert communications [17].
[Figure 4.1: triangle with corners labeled Inaudibility, Robustness and Data rate.]
Fig 4.1 The magic triangle: the three contradictory requirements of watermarking
One interesting application of high capacity covert communications is a public watermark embedded into the host multimedia that is used as a link to external databases that contain certain additional information about the multimedia file itself,
e.g. copyright information and licensing conditions [17]. Another application with
similar requirements is the transmission of meta-data along with multimedia. Meta-
data embedded in, e.g. audio clip, may carry information about a composer, soloist,
genre of music, etc. [17].
Usually embedding is performed in high amplitude portions of the signal, either in the time or the frequency domain. Unlike the techniques explained in [12-16], where binary data are used as the watermark, the proposed technique uses audio information as the watermark. The objective of the method implemented here is to provide an audio watermarking technique that can be used for covert communication of audio signals. In the proposed technique the input audio is decomposed using the discrete wavelet transform (DWT). The covert message (audio watermark) is embedded into the detail coefficients of the three-level decomposed original audio signal.
The results of the technique described in this chapter are presented partly in
paper I and paper II. Section 4.1 gives an overview of the properties of the HAS. Section 4.2 concentrates on understanding the 1-D wavelet decomposition. The idea of the
proposed technique is described in section 4.3. Section 4.4 discusses the experimental
results obtained for this method.
4.1. Overview of the properties of HAS
Watermarking of audio signals is more challenging compared to the
watermarking of images or video sequences, due to the wider dynamic range of the HAS in comparison with the human visual system (HVS). The HAS perceives sounds over a range of power greater than 10^9:1 and a range of frequencies greater than 10^3:1. The sensitivity of the HAS to additive white Gaussian noise (AWGN) is high as well; this noise in a sound file can be detected as low as 70 dB below the ambient level.
On the other hand, in contrast to its large dynamic range, the HAS has a fairly small differential range, i.e. loud sounds generally tend to mask out weaker sounds. Additionally, the HAS is insensitive to a constant relative phase shift in a stationary audio signal and interprets some spectral distortions as natural, perceptually non-annoying ones. Auditory perception is based on the critical band analysis in the inner ear, where
a frequency-to-location transformation takes place along the basilar membrane. The
power spectra of the received sounds are not represented on a linear frequency scale
but on limited frequency bands called critical bands. The auditory system is usually
modeled as a band-pass filter-bank, consisting of strongly overlapping band-pass
filters with bandwidths around 100 Hz for bands with a central frequency below 500 Hz and up to 5000 Hz for bands placed at high frequencies. If the highest frequency is limited to 24000 Hz, 26 critical bands have to be taken into account.
Two properties of the HAS dominantly used in watermarking algorithms are
frequency (simultaneous) masking and temporal masking [62]. The concept of using the perceptual holes of the HAS is taken from wideband audio coding (e.g. MPEG-1 compression, layer 3, usually called mp3) [66]. In the compression algorithms, the
holes are used in order to decrease the amount of the bits needed to encode audio
signal, without causing a perceptual distortion to the coded audio. On the other hand,
in the information hiding scenarios, masking properties are used to embed additional
bits into an existing bit stream, again without generating audible noise in the audio
sequence used for data hiding.
Frequency (simultaneous) masking is a frequency domain phenomenon
where a low level signal, e.g. a pure tone (the maskee), can be made inaudible
(masked) by a simultaneously appearing stronger signal (the masker), e.g. a narrow
band noise, if the masker and maskee are close enough to each other in frequency
[67]. A masking threshold can be derived below which any signal will not be audible.
The masking threshold depends on the masker and on the characteristics of the masker and maskee (narrowband noise or pure tone). For example, with the masking threshold at a sound pressure level (SPL) of 60 dB for the masker in Figure 4.2 at around 1 kHz, the SPL of the maskee can be surprisingly high; it will be masked as long as its SPL is below the masking threshold. The slope of the masking threshold is steeper toward lower frequencies; in other words, higher frequencies tend to be more easily masked than lower frequencies. It should be pointed out that the distance between the masking level and the masking threshold is smaller in noise-masks-tone experiments than in tone-masks-noise experiments, due to the HAS's sensitivity toward
additive noise. Noise and low-level signal components are masked inside and outside
the particular critical band if their SPL is below the masking threshold. Noise
contributions can be coding noise, inserted watermark sequence, aliasing distortions,
etc. Without a masker, a signal is inaudible if its SPL is below the threshold in quiet,
which depends on frequency and covers a dynamic range of more than 70 dB as
depicted in the lower curve of Figure 4.2.
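For reference, the threshold in quiet plotted as the lower curve of Figure 4.2 is commonly approximated by Terhardt's formula, as used in the MPEG psychoacoustic models. The formula in the sketch below is that standard approximation and is not taken from the text, so it should be treated as an assumption of this example.

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0   # frequency in kHz
    return 3.64 * f**-0.8 - 6.5 * np.exp(-0.6 * (f - 3.3)**2) + 1e-3 * f**4

# The threshold dips to a few dB around 3-4 kHz and rises steeply toward the
# edges of the audible range, spanning well over 70 dB in total.
print(np.round(threshold_in_quiet_db([100.0, 1000.0, 4000.0, 16000.0]), 1))
```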
Fig 4.2 Frequency masking in the human auditory system (HAS)
The qualitative sketch of Figure 4.3 gives more details about the masking
threshold. The distance between the level of the masker (given as a tone in Figure 4.3)
and the masking threshold is called signal-to-mask ratio (SMR) [66]. Its maximum
value is at the left border of the critical band. Within a critical band, noise caused by watermark embedding will be audible as long as the signal-to-noise ratio (SNR) for the critical band [16] is lower than its SMR. Let SNR(m) be the signal-to-noise ratio
resulting from watermark insertion in the critical band m; the perceivable distortion in
a given sub-band is then measured by the noise to mask ratio:
NMR (m) = SMR - SNR (m)
The noise-to-mask ratio NMR (m) expresses the difference between the watermark
noise in a given critical band and the level where a distortion may just become
audible; its value in dB should be negative.
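As a quick worked example of this relation (the numbers are hypothetical): if a critical band has an SMR of 24 dB and watermark embedding yields an SNR of 30 dB in that band, then NMR = 24 − 30 = −6 dB, i.e. the watermark noise stays 6 dB below the level at which it would become audible. A one-line helper makes the sign convention explicit:

```python
def nmr_db(smr_db, snr_db):
    # NMR(m) = SMR - SNR(m); a negative value means the embedding noise is masked.
    return smr_db - snr_db

print(nmr_db(24.0, 30.0))   # -6.0 dB -> inaudible in this band
```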
Fig. 4.3. Signal-to-mask-ratio and Signal-to-noise-ratio values.
This description covers the case of masking by only one masker. If the source signal consists of many simultaneous maskers, a global masking threshold can be
computed that describes the threshold of just noticeable distortion (JND) as a function
of frequency. The calculation of the global masking threshold is based on the high
resolution short-term amplitude spectrum of the audio signal, sufficient for critical
band-based analysis and is usually performed using 1024 samples in FFT domain. In a
first step, all the individual masking thresholds are determined, depending on the
signal level, type of masker (tone or noise) and frequency range. After that, the global
masking threshold is determined by adding all individual masking thresholds and the
threshold in quiet. The effects of the masking reaching over the limits of a critical
band must be included in the calculation as well. Finally, the global signal-to-noise
ratio is determined as the ratio of the maximum of the signal power and the global
masking threshold [66], as depicted in Figure 4.2.
Temporal Masking refers to sounds that are heard at different time instances.
Temporal masking can be either premasking (backward) or post-masking (forward).
When the masker affects a previous sound, the masking is called premasking whereas,
when the masker affects a subsequent sound, the masking is called post-masking. In
general, premasking is not as intense as post-masking. Pre-masking occurs for a duration of 5–20 ms before the masker is turned on. Post-masking occurs for a duration of 50–200 ms after the masker is turned off. The temporal masking effects appear before and after a masking signal has been switched on and off, respectively (Figure 4.4). The duration of premasking is significantly shorter, roughly one-tenth that of the post-masking, which is in the interval of 50 to 200 milliseconds. Both pre- and post-masking have been exploited in the MPEG audio compression algorithm and several
audio watermarking methods.
Fig 4.4 Temporal masking in the human auditory system (HAS).
4.2. Discrete Wavelet transform:
The wavelet series is just a sampled version of the continuous wavelet transform (CWT), and its computation may consume a significant amount of time and resources, depending on the resolution required. The discrete wavelet transform (DWT) is equivalent to a hierarchical sub-band system in which the sub-bands are logarithmically spaced in the frequency domain. In the DWT, a time-scale representation of the digital signal is obtained using digital filtering techniques. The signal to be analyzed is passed
through filters with different cutoff frequencies at different scales.
Filters are one of the most widely used signal processing functions. Wavelets
can be realized by iteration of filters with rescaling. The resolution of the signal,
which is a measure of the amount of detail information in the signal, is determined by
the filtering operations, and the scale is determined by upsampling and downsampling
(subsampling) operations.
The DWT is computed by successive lowpass and highpass filtering of the discrete time-domain signal as shown in Figure 4.5. This is called the Mallat algorithm or Mallat-tree decomposition. Its significance is in the manner it connects the continuous-time multiresolution to discrete-time filters. In the figure, the signal is denoted by the sequence x[n], where n is an integer. The low pass filter is denoted by G0 while the high pass filter is denoted by H0. At each level, the high pass filter produces the detail information d[n], while the low pass filter associated with the scaling function produces the coarse approximations a[n].
At each decomposition level, the half band filters produce signals spanning
only half the frequency band. This doubles the frequency resolution as the uncertainty
in frequency is reduced by half. In accordance with Nyquist’s rule if the original
signal has a highest frequency of ω, which requires a sampling frequency of 2ω
radians, then it now has a highest frequency of ω/2 radians. It can now be sampled at a
frequency of ω radians thus discarding half the samples with no loss of information.
This decimation by 2 halves the time resolution as the entire signal is now represented
by only half the number of samples. Thus, while the half band low pass filtering
removes half of the frequencies and thus halves the resolution, the decimation by 2
doubles the scale.
Fig 4.5 Mallat-tree decomposition
With this approach, the time resolution becomes arbitrarily good at high
frequencies, while the frequency resolution becomes arbitrarily good at low
frequencies. The time-frequency plane is thus resolved. The filtering and decimation
process is continued until the desired level is reached. The maximum number of levels
depends on the length of the signal. The DWT of the original signal is then obtained
by concatenating all the coefficients, a[n] and d[n], starting from the last level of
decomposition.
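The coefficient bookkeeping described above can be checked with a short sketch using the PyWavelets package (an assumption of this example; the thesis itself refers to Matlab's wavelet functions). Periodized filtering is chosen here so that the number of coefficients halves exactly at each level.

```python
import numpy as np
import pywt

x = np.random.default_rng(0).normal(size=1024)      # any test signal

# Three-level Mallat decomposition with db2 (the mother wavelet used later).
cA3, cD3, cD2, cD1 = pywt.wavedec(x, 'db2', level=3, mode='periodization')
print(len(cD1), len(cD2), len(cD3), len(cA3))        # 512 256 128 128

# Reconstruction (Fig. 4.6): upsample, filter with the synthesis bank and add.
x_rec = pywt.waverec([cA3, cD3, cD2, cD1], 'db2', mode='periodization')
print(float(np.max(np.abs(x - x_rec))))              # ~1e-12: perfect reconstruction
```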
Fig 4.6 Reconstruction of original signal from wavelet coefficients.
Figure 4.6 shows the reconstruction of the original signal from the wavelet
coefficients. Basically, the reconstruction is the reverse process of decomposition.
The approximation and detail coefficients at every level are up sampled by two,
passed through the low pass and high pass synthesis filters and then added. This
process is continued through the same number of levels as in the decomposition
process to obtain the original signal. The Mallat algorithm works equally well if the
analysis filters, G0 and H0, are exchanged with the synthesis filters, G1 and H1.
4.2.1. Conditions for Perfect Reconstruction
In most Wavelet Transform applications, it is required that the original signal
be synthesized from the wavelet coefficients. To achieve perfect reconstruction the
analysis and synthesis filters have to satisfy certain conditions. Let G0(z) and G1(z) be
the low pass analysis and synthesis filters, respectively and H0(z) and H1(z) the high
pass analysis and synthesis filters respectively. Then the filters have to satisfy the
following two conditions as given in [4] :
G0(−z) G1(z) + H0(−z) H1(z) = 0        (4.1)
G0(z) G1(z) + H0(z) H1(z) = 2z^(−d)        (4.2)
The first condition implies that the reconstruction is aliasing-free, and the second condition implies that the amplitude distortion has a magnitude of one, i.e. the analysis–synthesis system reduces to a pure delay. It can be
observed that the perfect reconstruction condition does not change if we switch the
analysis and synthesis filters. There are a number of filters which satisfy these
conditions. But not all of them give accurate Wavelet Transforms, especially when the
filter coefficients are quantized. The accuracy of the Wavelet Transform can be
determined after reconstruction by calculating the Signal to Noise Ratio (SNR) of the
signal. Some applications like pattern recognition do not need reconstruction, and in
such applications, the above conditions need not apply.
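Conditions (4.1) and (4.2) can be verified numerically for the db2 filters used later in this chapter. The sketch below assumes PyWavelets' naming, mapping dec_lo/dec_hi to the analysis filters G0/H0 and rec_lo/rec_hi to the synthesis filters G1/H1; products of filters in z^(−1) become convolutions of their coefficient sequences.

```python
import numpy as np
import pywt

w = pywt.Wavelet('db2')
g0, h0 = np.asarray(w.dec_lo), np.asarray(w.dec_hi)   # analysis low / high pass
g1, h1 = np.asarray(w.rec_lo), np.asarray(w.rec_hi)   # synthesis low / high pass

mod = (-1.0) ** np.arange(len(g0))   # evaluating a filter at -z negates the odd taps

# Eq. (4.1): alias cancellation, G0(-z)G1(z) + H0(-z)H1(z) = 0
alias = np.convolve(mod * g0, g1) + np.convolve(mod * h0, h1)
print(np.allclose(alias, 0.0))                        # True

# Eq. (4.2): no distortion, G0(z)G1(z) + H0(z)H1(z) = 2 z^(-d)
dist = np.convolve(g0, g1) + np.convolve(h0, h1)
print(np.round(dist, 10))                             # a single 2 at delay d, zeros elsewhere
```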
4.2.2. Classification of wavelets
We can classify wavelets into two classes: (a) orthogonal and (b)
biorthogonal. Based on the application, either of them can be used.
4.2.2.1. Features of orthogonal wavelet filter banks
The coefficients of orthogonal filters are real numbers. The filters are of the
same length and are not symmetric. The low pass filter G0 and the high pass filter H0 are related to each other by
H0(z) = z^(−N) G0(−z^(−1))        (4.3)
The two filters are alternating flips of each other. The alternating flip automatically gives double-shift orthogonality between the lowpass and highpass filters [1], i.e., the scalar product of the filters for a shift by two is zero: ∑k G[k] H[k−2l] = 0, where k, l ∈ Z [4]. Filters that satisfy equation (4.3) are known as Conjugate Mirror Filters (CMF). Perfect reconstruction is possible with the alternating flip.
Also, for perfect reconstruction, the synthesis filters are identical to the
analysis filters except for a time reversal. Orthogonal filters offer a high number of
vanishing moments. This property is useful in many signal and image processing
applications. They have regular structure which leads to easy implementation and
scalable architecture.
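The double-shift orthogonality and the alternating flip can also be checked numerically. The sketch below again uses PyWavelets' db2 filters (an assumption of this example); since the sign convention of the alternating flip can differ between libraries, the check allows either sign.

```python
import numpy as np
import pywt

w = pywt.Wavelet('db2')
g = np.asarray(w.rec_lo)          # low pass filter G
h = np.asarray(w.rec_hi)          # high pass filter H
L = len(g)

# Double-shift orthogonality: sum_k G[k] H[k - 2l] = 0 for every shift l.
corr = np.convolve(g, h[::-1])            # cross-correlation; index m <-> lag m-(L-1)
lags = np.arange(len(corr)) - (L - 1)
print(np.allclose(corr[lags % 2 == 0], 0.0))          # True: all even lags vanish

# Alternating flip: H[n] = +/- (-1)^n G[L-1-n], up to the library's sign convention.
flip = ((-1.0) ** np.arange(L)) * g[::-1]
print(np.allclose(h, flip) or np.allclose(h, -flip))  # True
```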
4.2.2.2. Features of biorthogonal wavelet filter banks
In the case of the biorthogonal wavelet filters, the low pass and the high pass
filters do not have the same length. The low pass filter is always symmetric, while the
high pass filter could be either symmetric or anti-symmetric. The coefficients of the
filters are either real numbers or integers.
For perfect reconstruction, a biorthogonal filter bank has either all odd-length or all even-length filters. The two analysis filters can both be symmetric with odd length, or one can be symmetric and the other antisymmetric with even length. Also, the two sets of analysis and synthesis filters must be dual. Linear phase biorthogonal filters are the most popular filters for data compression applications.
4.3. Audio watermarking for Covert Communication:
The purpose of the high capacity audio watermarking scheme implemented for covert communication is to hide a covert message in the host multimedia signal for secure data transmission. The covert message can be a binary message, a text message or audio data that has to be transferred over a public channel. Here, an attempt is made to hide audio information in the host audio signal: the method embeds an audio watermark in the host audio signal at the LSB level. Since audio information is used as the watermark, the capacity of this method is increased compared to techniques which embed one bit of binary information per sample of the audio signal.
The basic block diagram of watermark embedding is shown in Fig. 4.7. A three-level wavelet decomposition of the original signal x is first computed as explained in section 4.2. To embed the data in the low-middle frequency components, the detail coefficients d3 are selected and modified to carry the watermark. The audio watermark (covert signal) is scaled with the secret key of the author and embedded in d3. The selection criterion for this key is based on the psychoacoustic model of the host signal: the numerical value of the key is chosen such that the embedded watermark remains imperceptible. Knowledge of the secret key is necessary to reconstruct the watermark from the watermarked signal; without it, the watermark cannot be extracted. The scaled watermark is added to the selected d3 coefficients, and the inverse DWT is then computed to obtain the watermarked signal. The algorithm to embed the watermark in the original signal is as follows:
• The original signal X is decomposed into approximation and detail coefficients using the DWT functions available in Matlab. The Daubechies wavelet (db2) is used as the mother wavelet to decompose the signal.
• The audio watermark w is scaled with the scaling factor α, which is used as the secret key of the author. Without knowledge of the parameter α, extraction of w is impossible.
• The scaled watermark w' is computed from w as w' = w · α. The purpose of the scaling parameter is to reduce the amplitude of the original watermark signal.
• The value of α is selected such that α < 1 and the imperceptibility of the watermarked signal y is not disturbed. The scaled watermark w' is embedded in d3 as d3' = d3 + w'.
• The inverse DWT is computed using the idwt function in Matlab (a sketch using the multilevel wavedec/waverec functions is given below).
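A minimal sketch of these embedding steps is given below (assuming the MATLAB Wavelet Toolbox; the variable names x, w and alpha are illustrative, x and w are vectors of the same orientation, and w is assumed to be no longer than the d3 coefficient vector):

```matlab
% Sketch of the embedding algorithm above (assumes MATLAB Wavelet Toolbox).
% x = host audio, w = audio watermark, alpha = secret scaling key (alpha < 1).
[C, L] = wavedec(x, 3, 'db2');          % 3-level db2 decomposition
d3 = detcoef(C, L, 3);                  % level-3 detail coefficients d3
wp = zeros(size(d3));
wp(1:numel(w)) = alpha * w;             % scaled watermark w' = alpha*w, padded to length of d3
C(L(1)+1 : L(1)+L(2)) = d3 + wp;        % embed: d3' = d3 + w'
y = waverec(C, L, 'db2');               % inverse DWT -> watermarked signal y
```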
Fig. 4.7 Watermark embedding (block diagram: 3-level wavelet decomposition of the original signal, embedding of the audio watermark using the secret key, and inverse DWT to obtain the watermarked signal).
The signal to noise ratio (SNR) [52] between x and y is computed to measure the imperceptibility of the proposed technique:

SNR = 10 log10 [ Σ_i x²(i) / Σ_i (x(i) - y(i))² ]                    (4.4)
The watermark is extracted from the watermarked audio signal as shown in Fig. 4.8. A three-level decomposition of the original audio signal and of the watermarked audio signal is performed, and the d3 coefficients of both signals are selected. With the help of the secret key, the watermark is then extracted from the watermarked audio signal y as

w'' = (d3' - d3) / α                    (4.5)

where d3' is the detail coefficient vector of the watermarked audio signal, d3 is the detail coefficient vector of the original audio signal, and α is the scaling parameter (secret key) used while embedding the watermark. The similarity between the original audio watermark and the extracted audio watermark is measured by computing the bit error rate (BER) [12].
Fig. 4.8 Watermark extraction (block diagram: 3-level wavelet decomposition of the original and the watermarked signal, followed by watermark extraction using the secret key).
BER = (100 / M) · Σ (w'' ⊕ w)                    (4.6)

where M is the total number of bits in the watermark signal and ⊕ is the XOR operator.
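A matching extraction sketch is shown below (non-blind: it assumes the original host x, the watermarked signal y, the original watermark w and the secret key alpha are available; binarizing by sign before the XOR of equation (4.6) is only an illustrative choice):

```matlab
% Sketch of watermark extraction, Eq. (4.5), and BER, Eq. (4.6).
[Cx, Lx] = wavedec(x, 3, 'db2');
[Cy, Ly] = wavedec(y, 3, 'db2');
d3  = detcoef(Cx, Lx, 3);                 % d3  of the original signal
d3w = detcoef(Cy, Ly, 3);                 % d3' of the watermarked signal
w_rec = (d3w - d3) / alpha;               % recovered watermark, Eq. (4.5)
w_rec = w_rec(1:numel(w));                % keep only the embedded portion
ber = 100 * sum(xor(w_rec > 0, w > 0)) / numel(w);   % Eq. (4.6) after sign binarization
```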
4.4. Results of high capacity covert communication technique:
The algorithm is applied to a set of audio signals. The signals are mono with a sampling rate of 44.1 kHz. The original host audio signal to be watermarked is shown in Fig. 4.9; its length is 45 seconds. The watermarked signal is shown in Fig. 4.10. The audio signal to be embedded as a watermark is shown in Fig. 4.11; its length is 1 second. The extracted watermark signal is shown in Fig. 4.12. To evaluate the watermarked audio quality, a listening test is performed. Table 4.2 shows the extraction results. The SNR is 28 dB when the watermark signal is as long as the detailed coefficient vector cd3, whereas the SNR observed in [1] is 22 dB. The table shows that the SNR increases as the length of the watermark signal decreases. It is also observed that the BER is 0 in all cases. To evaluate the performance of the proposed watermarking technique, its robustness is tested.
Fig. 4.9 Original host audio signal
Fig. 4.10 Watermarked audio signal
Fig. 4.11 Original audio used as watermark
Fig. 4.12 Recovered watermark
4.4.1. Subjective Listening Test:
To further evaluate the watermarked audio quality, we also performed an informal subjective listening test according to the double blind, triple stimulus with hidden reference method [13]. The subjective listening test results are summarized in Table 4.1 under Diffgrade. The Diffgrade is equal to the subjective rating given to the watermarked test item minus the rating given to the hidden reference: a Diffgrade near 0.00 indicates a high level of quality. The Diffgrade can even be positive, which indicates an incorrect identification of the watermarked item.
Table 4.1 Subjective listening test for mp3 song

Sr.No   Type of attack                   Diffgrade
1       MPEG-1 Layer-3 128 kbps/mono     0.00
2       MPEG-1 Layer-3 64 kbps/mono      0.00
3       Band pass filtering              0.00
4       Echo addition                    0.00
5       Equalization                     0.00
6       Cropping                         0.00
7       Requantization                   0.00
8       Down sampling                    0.00
9       Up sampling                      -1
10      Time warping                     -2
11      Low pass filtering               -3
The Diffgrade scale is partitioned into five ranges: imperceptible (> 0.00), not annoying (0.00 to –1.00), slightly annoying (–1.00 to –2.00), annoying (–2.00 to –3.00), and very annoying (–3.00 to –4.00). The number of transparent items represents the number of incorrectly identified items. Fifteen listeners participated in the listening test. The quality of the watermarked audio signal was acceptable for all the test signals.
4.4.2. Robustness test
To test the robustness of the scheme, the watermarked signal is passed through different signal processing attacks and the watermark is then recovered. Table 4.3 shows the robustness test results for the various attacks. The detailed robustness test procedure is as follows:
Band pass filtering:
The watermarked signal is applied to a band pass filter with 100 Hz and 6 kHz cutoff frequencies.
Echo addition:
An echo with a delay of 100 ms and a volume of 50% is added to the watermarked audio signal.
MPEG compression:
To evaluate the robustness against the data compression attack, MPEG-1 Audio Layer 3 coding with bit rates of 64 kbps/mono and 128 kbps/mono is applied.
Equalization:
A 10-band equalizer with the characteristics listed below is used. Frequency [Hz]: 31, 62, 125, 250, 500, 1000, 2000, 4000, 8000, 16000. Gain [dB]: -6, +6, -6, +6, -6, +6, -6, +6, -6, +6.
Requantization:
The 8-bit watermarked signal is requantized to 16 bits/sample and back to 8 bits/sample. The correlation coefficient after requantization is 0.9910 and the SNR is 39 dB.
Resampling:
The watermarked audio signal with an original sampling rate of 44100 Hz is down sampled to 22050 Hz and upsampled back to 44100 Hz. The watermarked signal is then upsampled to 88200 Hz and down sampled back to 44100 Hz.
Cropping:
10% of each segment of the watermarked signal is cropped and the watermark is recovered from the cropped signal. The obtained correlation coefficient is 0.9884 and the SNR is 37.29 dB.
Noise addition:
White noise with 15 of the power of the audio signal is added to the watermarked audio signal. The correlation coefficient of the watermark recovered after noise addition is 0.98797 and the SNR is 34.26 dB.
Time warping:
The signal is time scaled by 10/9 (the 6.5 sec signal is compressed to a 6 sec duration).
Table 4.2 SNR of the watermarked signal and BER of the extracted watermark for various watermark lengths

Sr.No   Watermark length (sec)   SNR (dB)   BER (%)
1       5                        28.1       0.00
2       4                        28.9       0.00
3       3                        30.4       0.00
4       2                        31.7       0.00
5       1                        35.0       0.00
6       0.5                      37.9       0.00
Table 4.3 Robustness test for mp3 song

Sr.No   Type of attack                   BER (%) for this scheme
1       MPEG-1 Layer-3 128 kbps/mono     0.0017493
2       MPEG-1 Layer-3 64 kbps/mono      0.0017877
3       Band pass filtering              0.0017578
4       Echo addition                    0.0017591
5       Equalization                     0.0017493
6       Cropping                         0.003564
7       Requantization                   0.001728
8       Down sampling                    0.003564
9       Up sampling                      0.042567
10      Time warping                     0.037863
11      Low pass filtering               2.1
4.5. Summary of chapter:
The results listed in Table 4.1 indicate that the present technique makes it possible to hide audio information in the host audio signal. The listening test confirms that the embedded information is inaudible and that the extracted watermark signal is indistinguishable from the original watermark. The BER measures the similarity between the original watermark and the extracted watermark. The experimental results listed in Table 4.3 indicate that the present watermarking technique is robust to common signal processing attacks such as compression, echo addition and equalization. While performing the robustness test it is also observed that the watermark does not withstand low pass filtering of the watermarked signal at 11 kHz or 22 kHz cutoff frequencies.
Chapter 5
Spread Spectrum Audio Watermarking Algorithms
Introduction
Most of the existing audio watermarking techniques embed the watermark in the time domain or the frequency domain, whereas only a few techniques embed the data in the cepstrum or compressed domain. The spread spectrum technique is the most popular and has been used by many researchers in their implementations [13, 14, 15, 16, 17, 18, 19, 29]. An amplitude-scaled spread spectrum sequence is embedded in the host audio signal and can be detected via a correlation technique. Embedding is generally based on a psychoacoustic model, which provides the inaudibility limits for the watermark.
Spread spectrum techniques can be divided into two categories: blind and non blind. Blind watermarking techniques detect the watermark without using the original host signal, whereas non blind techniques use the original host signal to detect the watermark. Most of the blind watermarking techniques studied in chapter 2 only detect the presence of a valid watermark and do not concentrate on recovering (extracting) the embedded watermark. Non blind techniques recover the watermark, but they require the original signal to do so.
Section 5.1 provides a brief overview of the conventional spread spectrum method. Section 5.2 highlights the non blind technique suggested by Li et al. [29], which was implemented in the initial stage of the research to test the watermarking scheme on our database and to compare the results with our proposed schemes; the results of this implementation on our database appear in paper VI. In section 5.3 we propose the adaptive blind watermarking technique based on SNR using the DWT and the lifting wavelet transform; the results of these implementations are published partly in paper VII and paper VIII. To improve the imperceptibility between the original audio signal and the watermarked audio signal, an adaptive SNR based scheme using DWT-DCT is implemented. Section 5.4 describes this scheme, the results of which are published in paper IX and paper X. To make the system intelligent we propose to embed the watermark using cyclic coding; the scheme which encodes the watermark using cyclic codes and then embeds it is proposed in section 5.5.
5.1. Spread Spectrum watermarking: Theoretical background
A general model considered in [14, 15] for SS-based watermarking is shown in Figure 5.1. The vector x is the original host signal transformed into an appropriate transform domain, and the vector y is the received vector, in the transform domain, after channel noise. A secret key K is used by a pseudo random number generator (PRN) to produce a chip sequence u with zero mean and whose elements are equal to +σu or -σu. The use of the secret key is essential to provide security to the watermarking system. The sequence u is then added to or subtracted from the signal x according to the variable b, which takes the value +1 or -1 according to the bit (or bits) to be transmitted by the watermarking process (in multiplicative algorithms a multiplication is performed instead of an addition [25]). The signal s is the watermarked audio signal. A simple analysis of SS-based watermarking leads to a simple equation for the probability of error. The inner product and norm are defined as [15]

⟨x, u⟩ = Σ_{i=0}^{N-1} x_i u_i ,   ‖x‖ = √⟨x, x⟩

where N is the length of the vectors x, s, u, n and y in Figure 5.1. Without loss of generality, we assume that one bit of information is embedded in a vector s of N transform coefficients; the bit rate is then 1/N bits/sample. That bit is represented by the variable b, whose value is either +1 or -1. Embedding is performed by

s = x + b·u                    (5.1)

The distortion in the embedded signal is defined by ‖s - x‖. For the embedding equation (5.1) it is easy to see that

D = ‖b·u‖ = ‖u‖ = σu                    (5.2)

Fig. 5.1 General model of SS-based watermarking (the PRN generator driven by the key k produces u, which is modulated by b and added to x to give s; the channel adds the noise n to give y, and a correlation detector using the same PRN sequence recovers b).
The channel is modeled as an additive noise channel, y = s + n, and the watermark extraction is usually performed by calculating the normalized sufficient statistic r:

r = ⟨y, u⟩ / ⟨u, u⟩ = ⟨b·u + x + n, u⟩ / (N·σu²) = b + cx + cn                    (5.3)

and estimating the embedded bit as b̂ = sign(r), where cx = ⟨x, u⟩/⟨u, u⟩ and cn = ⟨n, u⟩/⟨u, u⟩. Simple statistical models are assumed for the host audio x and the attack noise n; namely, both sequences are modeled as uncorrelated white Gaussian random processes, x_i ∼ N(0, σx²) and n_i ∼ N(0, σn²). Then it is easy to show that the sufficient statistic r is also a Gaussian variable, i.e.:

r ∼ N(mr, σr²),   mr = E[r] = b,   σr² = (σx² + σn²) / (N·σu²)                    (5.4)
Let us consider the case when b is equal to 1. In this case an error occurs when r < 0, and therefore the error probability p is given by

p = Pr{ b̂ < 0 | b = 1 } = ½ erfc( mr / (σr √2) ) = ½ erfc( √( N·σu² / (2(σx² + σn²)) ) )                    (5.5)

where erfc(·) is the complementary error function. The same error probability is obtained under the assumption that b = -1. A plot of this probability as a function of the SNR (in this case defined as mr/σr) is given in Figure 5.2. For example, from Figure 5.2 it can be seen that if an error probability lower than 10⁻³ is needed, the SNR becomes:
Fig. 5.2. Error probability as a function of the SNR.
mr/σr > 3  ⇒  N·σu² > 9·(σx² + σn²)                    (5.6)

or, more generally, to achieve an error probability p we need:

N·σu² > 2·[erfc⁻¹(2p)]²·(σx² + σn²)                    (5.7)
Malvar et al. [15] show that one can trade off the length N of the chip sequence against the energy σu² of the sequence. This allows us to compute either N or σu², given the other variables involved.
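The toy sketch below illustrates equations (5.1) and (5.3) for a single embedded bit; all numerical values (N, σu, the noise level and the seed) are illustrative assumptions and not parameters used in the thesis:

```matlab
% Basic SS embedding and correlation detection, Eqs. (5.1) and (5.3).
N = 1024; sigma_u = 0.01;              % illustrative values
x = randn(1, N);                       % toy transform-domain host vector
rng(12345);                            % secret key K used as the PRN seed
u = sigma_u * sign(randn(1, N));       % chip sequence with elements +/- sigma_u
b = 1;                                 % bit to be embedded (+1 or -1)
s = x + b * u;                         % embedding, Eq. (5.1)
y = s + 0.005 * randn(1, N);           % additive channel noise n
r = (y * u') / (u * u');               % normalized sufficient statistic, Eq. (5.3)
b_hat = sign(r);                       % detected bit
```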
5.2. Adaptive SNR based non blind watermarking technique in
wavelet domain
The SNR based watermarking technique suggested by Li et al. [29] is implemented in the wavelet domain. This non blind technique is implemented here in order to compare our results with the scheme proposed in [29]. Non blind means that the original host signal is required to recover the watermark. This section describes the adaptive watermarking technique based on SNR. The goal of the watermarking technique is to embed (add) the cover signal (watermark) into the host multimedia signal. The embedding process modifies the host signal by a mixing function; in most watermarking techniques the mixing function is the addition of the original host signal and the scaled cover signal. Mathematically the function is defined as

y(i) = x(i) + α·w(i)                    (5.6)
where y(i) is the watermarked signal, x(i) is the original host signal, w(i) is the cover signal and α is the scaling parameter used as the secret key. The value of α plays an important role in both the embedding and the detection process. In the embedding process α is selected such that the watermark remains imperceptible (inaudible). The watermark detection process is exactly the reverse process; without knowledge of the parameter α it is not possible to detect the watermark, and no statistical analysis should leave any possibility of detecting the cover signal or the parameter α. The imperceptibility of the watermarking procedure is evaluated by computing the SNR between the original host signal and the watermarked signal:

SNR = 10 log10 [ Σ_i x²(i) / Σ_i (x(i) - y(i))² ]                    (5.7)
The method proposed in [29] is implemented in the wavelet domain. To embed the watermark into the host audio signal, the host audio signal is divided into smaller segments of size N, and one bit of the binary watermark is added to each segment of the host audio signal:
B3k(i) = A3k(i) + α(k)·w(k)    if A3k(i) = max(A3k)
B3k(i) = A3k(i)                otherwise                                        (5.8)

where A3k(i) is the cd3 coefficient of the 3rd level DWT of xk(i), and xk(i) is the ith sample of the kth segment of the host audio signal. B3k(i) is the watermarked cd3 coefficient, and α(k) is the scaling parameter used to scale the watermark bit w(k) so that the added watermark is inaudible in the audio. The SNR between the original coefficients A3k(i) and the modified coefficients B3k(i) can be written as

SNR = 10 log10 [ Σ_i A3k(i)² / Σ_i (A3k(i) - B3k(i))² ]                    (5.9)

From formulas (5.8) and (5.9) the scaling parameter α(k) can be computed as

α(k) = √[ Σ_i A3k(i)² · 10^(-SNR/10) / w²(k) ]                    (5.10)
The value of w²(k) is 1 because w(k) is either +1 or -1. For a chosen threshold value of SNR, the scaling parameter α(k) required in formula (5.8) can therefore be computed from this equation and used to embed the watermark. According to the IFPI (International Federation of the Phonographic Industry) [28], an imperceptible audio watermarking scheme requires at least a 20 dB SNR between the watermarked signal and the host signal, so the threshold value of SNR should be chosen greater than 20 when solving formula (5.10).
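The computation of α(k) for one segment can be sketched as follows (illustrative only; A3k stands for the cd3 coefficient vector of the k-th segment, wk for the bipolar watermark bit, and the 30 dB target is an arbitrary choice above the 20 dB IFPI limit):

```matlab
% Per-segment scaling factor from a target SNR, following Eq. (5.10)
% (w(k) is bipolar, so w^2(k) = 1 and drops out of the square root).
SNR_target = 30;                                   % dB, must exceed the 20 dB IFPI limit
alpha_k = sqrt(sum(A3k.^2) * 10^(-SNR_target/10)); % scaling parameter for segment k
B3k = A3k;                                         % start from the original coefficients
[mx, imax] = max(A3k);                             % position of the largest cd3 coefficient
B3k(imax) = A3k(imax) + alpha_k * wk;              % Eq. (5.8): only the maximum is modified
```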
The watermark embedding process is shown in Fig. 5.3. The host audio signal x(n) is divided into subsections of size N = 2^i, and each subsection is decomposed with a 3-level discrete wavelet transform using the Haar wavelet. The Haar wavelet is used in the implementation because it gives perfect reconstructability, which is an essential feature for this application. The authors in [29] used the db4 wavelet; however, for Daubechies wavelets with a larger number of filter points, the compact signal property and reconstructability are not assured, so the Haar wavelet is used in our implementation. The detail coefficients (cd3) of the host audio signal are selected in order to embed the watermark in the low frequency, highest energy part of the audio signal.
An M1 × M2 binary image is considered as the watermark. Before embedding, this 2-D watermark is converted into its 1-D bipolar equivalent w(k) ∈ {1, -1}. Then w(k) is scaled by the parameter α(k), and one bit of w(k) is embedded into a single subsection of the host audio signal.
To evaluate the imperceptibility of the system, the SNR between the host audio signal and the watermarked signal is computed with formula (5.7); the SNR obtained using this technique is 43.77 dB. The watermarked signal and the host audio signal were played to five listeners with knowledge of music, and all five observed no perceptual difference between the host audio signal and the watermarked audio signal.
To detect the watermark, knowledge of the scaling parameter α(k) is essential; without it, the watermark cannot be detected. The scaling parameter α(k) is computed during the watermark embedding process using formula (5.10), and the original signal x(n) is required to compute α(k) in order to recover the watermark properly. The formula used to detect the watermark is exactly the reverse of formula (5.8). To detect the watermark, x(n) and y(n) are divided into subsections of size N = 2^i, where the value of i is the same as that used during watermark embedding. Each subsection is decomposed using the 3-level wavelet transform, and the watermark is then recovered as shown in Fig. 5.4.
Fig. 5.3 Watermark embedding process (block diagram: the host audio x(n) is segmented, the third-level DWT of each segment is taken, the scaling parameter α(k) is computed for each segment, the binary watermark image is mapped to a 1-D bipolar sequence and scaled, the scaled watermark is added, and the IDWT of each segment is taken and the segments are concatenated to form the watermarked audio y(n)).
Fig. 5.4 Watermark extraction (block diagram: x(n) and y(n) are divided into subsections, the DWT of each subsection is taken, α(k) is computed, the watermark is detected and the bipolar watermark is recovered by threshold comparison, and the 1-D sequence is converted back to the 2-D binary image).
After detecting the presence of the watermark, each value is compared against a threshold T to recover the bipolar watermark: if the value of the sample is greater than the threshold, the watermark bit is recovered as 1; otherwise it is recovered as 0. The recovered watermark is a 1-D signal, so it is converted into the required 2-D form of size M1 × M2 to recover the binary image used as the watermark. To test the similarity between the recovered watermark and the original watermark, the correlation coefficient between them is computed.
The original host audio signal and the watermarked audio signal are shown in Fig. 5.5: Fig. 5.5(a) shows the original host audio signal and Fig. 5.5(b) the watermarked signal. It can be observed that there is no perceptual difference between the two. The two audio signals were played to the five listeners to test for any audible difference, and none was found between the original host audio signal and the watermarked signal. The SNR computed using formula (5.7) is 43.77 dB. To measure the similarity between the original watermark and the recovered watermark, the correlation coefficient is computed and equals 0.9917.
Fig. 5.5 a) Original host audio signal b) Watermarked audio signal
Fig. 5.6 a) Original watermark b) Recovered watermark
To test the robustness of the technique, the watermarked audio signal is passed through different signal processing operations. The GoldWave software is used to edit the watermarked signal with the different signal processing tools available in it. The experimental results show that the SNR between the original signal and the processed watermarked signal is above 25 dB in all cases. The watermark is successfully recovered after all processing, and the correlation coefficient is also above 0.88, as shown in Table 5.1. The error in recovering the watermark is also computed and presented in Table 5.1.
Low pass filtering:
The watermarked signal is passed through a low pass filter with a cutoff frequency of 11025 Hz, and the watermark is recovered from the filtered signal. The results of the signal processing attacks are shown in Table 5.1.
Resampling:
The watermarked audio signal with an original sampling rate of 44100 Hz is down sampled to 22050 Hz and upsampled back to 44100 Hz. The watermarked signal is then upsampled to 88200 Hz and downsampled back to 44100 Hz.
MP3 compression:
The watermarked audio signal is MP3 compressed at 64 kbps, and it is observed that the watermark resists the MP3 compression.
Requantization:
The 8-bit watermarked signal is requantized to 16 bits/sample and back to 8 bits/sample.
Cropping:
10% of each segment of the watermarked signal is cropped and the watermark is recovered from the cropped signal.
Noise addition:
White noise with 15 of the power of the audio signal is added to the watermarked audio signal.
Echo addition:
An echo with a delay of 100 ms and a volume of 50% is added to the watermarked audio signal.
Equalization:
A 10-band equalizer with the characteristics listed below is used. Frequency [Hz]: 31, 62, 125, 250, 500, 1000, 2000, 4000, 8000, 16000. Gain [dB]: -6, +6, -6, +6, -6, +6, -6, +6, -6, +6.
Fig. 5.7 Results for the SNR based scheme with non blind detection: a) without attack, b) down sampling, c) up sampling, d) MP3 compression, e) requantization, f) cropping, g) low pass filtered with fc = 11025 Hz, h) low pass filtered with fc = 22050 Hz, i) time warping.
Table 5.1 Experimental results against signal processing attacks for the non blind technique (MP3 song)

Sr.No   Attack             SNR (dB)   Correlation coefficient   BER
1       Without attack     43.77      0.9917                    0.0027
2       Down sampling      28.56      0.9889                    0.0039
3       Up sampling        34.26      0.9917                    0.0027
4       LP filtering       19.73      0.9917                    0.0027
5       Requantization     39         0.9917                    0.0027
6       Cropping           37.29      0.9889                    0.0039
7       MP3 compression    41.25      0.9917                    0.0027
8       Noise addition     34.26      0.9889                    0.0039
9       Echo addition      37.4420    0.9889                    0.0039
10      Equalization       38.518     0.9889                    0.0039
The experimental results of the developed scheme are shown in Table 5.1, which lists the SNR between the watermarked signal and the original host audio signal, the correlation coefficient between the original watermark and the recovered watermark, and the BER between them. The experimental results show that the developed technique recovers the watermark successfully after all kinds of signal processing attacks mentioned in the table. It is also observed that the SNR between the original signal and the distorted signal is above the acceptable limit of 20 dB except for LP filtering and down sampling.
The major drawback of this scheme is that it requires the original host signal and the original watermark signal in order to recover the watermark and to provide proof of ownership. To overcome these drawbacks, we modified this scheme and propose the SNR based blind watermarking scheme.
5.3. Proposed adaptive SNR based blind watermarking using
DWT /Lifting wavelet transform:
This section describes the proposed adaptive SNR based blind watermarking scheme implemented in the DWT and LWT domains. In this scheme we attempt to recover the watermark signal without the help of the original audio signal, and we also succeed in recovering the watermark without using the scaling parameter. The scaling parameter used to embed the watermark is varied adaptively for each segment in order to achieve imperceptibility and to take advantage of the insensitivity of the HAS to small variations in the transform domain. To provide security, a secret key is used; the key is a PN sequence generated by a cryptographic method. Without knowledge of this secret key it is not possible to recover the watermark, hence the sequence has to be generated from a secret initial seed. This idea of generating a pseudo random sequence from an initial seed is borrowed from communications.
Our aim in developing this scheme is to devise a system which embeds the watermark using the spread spectrum method and does not require the original audio signal to recover the embedded watermark. The embedding process modifies the host signal x by a mixing function f(·); f(x, w) mixes x (the host multimedia signal) and w (the watermark signal) with the help of the secret key k. In most watermarking techniques [12, 19] the mixing function is the addition of the original host signal and the scaled watermark signal, mathematically represented by formula (5.1). The blind watermarking technique implemented here embeds the watermark in the wavelet domain using an additive formula similar to formula (5.1):
B3k(i) = A3k(i) + α(k)·w(k)·r(i)                    (5.11)

where r(i) is a permuted pseudorandom binary signal with zero mean, which is the secret key of the owner, and α(k) and w(k) are the scaling parameter and the watermark bit to be embedded in the kth segment, respectively. A3k(i) is the cd3 coefficient of the 3rd level DWT/LWT of xk(i), where xk(i) is the ith sample of the kth segment of the host audio signal, and B3k(i) is the watermarked cd3 coefficient. Solving equation (5.11) together with equation (5.9) for α(k) gives

α(k) = √[ Σ_i A3k(i)² · 10^(-SNR/10) / Σ_i r²(i)·w²(k) ]                    (5.12)

w(k) is the bipolar watermark, which is either +1 or -1, so w²(k) = 1. r(i) is a pseudorandom binary signal with zero mean taking the values +1 or -1, so Σ_i r²(i) = N. Equation (5.12) therefore reduces to

α(k) = √[ Σ_i A3k(i)² · 10^(-SNR/10) / N ]                    (5.13)
The audio watermarking scheme developed here computes the value of α(k) for every subsection of the host audio signal using formula (5.13); this value is later used to embed the watermark bit in each segment of the host audio using formula (5.11).
5.3.1. Watermark embedding:
To embed the watermark signal into the host audio signal, the present scheme uses the additive watermarking method. The host audio signal is divided into segments of size N = 256, 512, 1024, etc.:

Xk(i) = X(k·N + i),   i = 0, 1, ..., N-1,   k = 0, 1, 2, ...                    (5.14)

where X(i) represents the original host audio signal and Xk(i) represents the kth segment of the host audio. Each Xk(i) is then decomposed with an L-level wavelet transform; the scheme is implemented using both the Discrete Wavelet Transform (DWT) and the Lifting Wavelet Transform (LWT). To embed the watermark into the low frequency, highest energy part of the audio signal, taking advantage of the frequency masking effect of the HAS [29], the 3rd level detail coefficients are selected. The selected coefficients are then modified as

A3k'(i) = A3k(i) + α(k)·w(k)·r(i)                    (5.15)

where A3k(i) is the 3rd level coefficient of the kth segment, r(i) is the permuted pseudorandom binary signal with zero mean which is the secret key of the owner, and α(k) and w(k) are the scaling parameter and watermark bit to be embedded in the kth segment, respectively. The block schematic of the proposed scheme is shown in Fig. 5.8. The SNR between the original signal and the watermarked signal can be computed using formula (5.7) to measure the imperceptibility of the watermarked signal.
In this method we compute the scaling parameter α(k) for every segment of the host audio signal and then embed the watermark bit accordingly. This variation of α(k) from segment to segment takes the features of the host audio in that segment into account, which is similar to choosing the scaling parameter with the perceptual transparency of the host audio in mind. A condensed sketch of the per-segment embedding is given below.
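The sketch below covers equations (5.13)–(5.15) for one segment (assuming the MATLAB Wavelet Toolbox; seg is one host segment, wk the bipolar watermark bit for that segment, seed the secret initial seed, SNRt the SNR value chosen for α(k), the Haar wavelet is an illustrative choice, and N in (5.13) is taken here as the number of modified coefficients):

```matlab
% Per-segment blind embedding in the DWT domain, Eqs. (5.13)-(5.15).
[C, L] = wavedec(seg, 3, 'haar');           % 3-level DWT of one host segment
d3 = detcoef(C, L, 3);                      % third-level coefficients selected for embedding
rng(seed);                                  % secret key: seed of the PN generator
r  = 2 * (rand(size(d3)) > 0.5) - 1;        % zero-mean pseudorandom +/-1 sequence
alpha_k = sqrt(sum(d3.^2) * 10^(-SNRt/10) / numel(d3));   % Eq. (5.13)
d3w = d3 + alpha_k * wk * r;                % Eq. (5.15)
C(L(1)+1 : L(1)+L(2)) = d3w;                % write the modified coefficients back into C
seg_w = waverec(C, L, 'haar');              % IDWT -> watermarked segment
```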
Fig. 5.8 Watermark embedding process for the proposed adaptive SNR based blind technique in the DWT/LWT domain (block diagram: the host audio x(n) is segmented, the third-level DWT/LWT of each segment is taken, α(k) is computed for each segment, the binary watermark image is mapped to a bipolar sequence and scaled, a PN sequence generated from the secret initial seed spreads the watermark, and the IDWT/ILWT of each segment is taken and the segments are concatenated to form the watermarked audio y(n)).
5.3.2. Watermark extraction
The block schematic of watermark extraction for the proposed scheme is shown in Fig. 5.9. To detect the watermark from the embedded signal, the 3rd level DWT/LWT of the watermarked signal is computed. The coefficients D3k'(i) are then modulated by the same pseudorandom signal r(i) used while embedding the watermark:

s(i) = D3k'(i)·r(i)                    (5.16)

where D3k'(i) = D3k(i) + α(k)·w(k)·r(i) and s(i) are the wavelet coefficients modified by r(i). Therefore

s(i) = (D3k(i) + α(k)·w(k)·r(i))·r(i)                    (5.17)

Σ_i s(i) = Σ_i D3k(i)·r(i) + Σ_i α(k)·w(k)·r²(i)                    (5.18)

The expected value of the first term in equation (5.18), Σ_i D3k(i)·r(i), is approximately zero, and Σ_i r²(i) = N (α(k) and w(k) are independent of the summation variable i), so the sum is approximately equal to N·α(k)·w(k), where N is the size of the segment. If the value of Σ_i s(i) is greater than the threshold, watermark bit one is recovered; if it is less than the threshold, watermark bit 0 is recovered.
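A matching per-segment detection sketch for equations (5.16)–(5.18) is given below (seg_w is one segment of the received watermarked audio, seed the same secret initial seed, and a zero threshold is used here as an illustrative decision rule):

```matlab
% Per-segment blind detection, Eqs. (5.16)-(5.18).
[C, L] = wavedec(seg_w, 3, 'haar');         % 3-level DWT of the received segment
d3w = detcoef(C, L, 3);                     % third-level coefficients of the watermarked segment
rng(seed);                                  % regenerate the same PN sequence from the secret seed
r = 2 * (rand(size(d3w)) > 0.5) - 1;
s = d3w .* r;                               % Eq. (5.16)
if sum(s) > 0                               % sum(s) is approximately N*alpha(k)*w(k), Eq. (5.18)
    wk_rec = 1;                             % watermark bit 1
else
    wk_rec = 0;                             % watermark bit 0
end
```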
Fig. 5.9 Watermark extraction for the adaptive SNR based blind technique in the DWT/LWT domain (block diagram: the watermarked audio y(n) is segmented, the third-level DWT/LWT of each segment is taken, the PN sequence is regenerated from the secret initial seed, the bit is recovered from each segment by threshold comparison, and dimension mapping yields the recovered watermark).
5.3.3. Experimental results
This part of the section presents the experimental results obtained during the implementation of this scheme. The watermark signal is embedded in the discrete wavelet domain and in the lifting wavelet domain, using the same host audio signals as in section 5.2. A three-level wavelet decomposition of each segment of the host audio is performed. The subjective analysis test is conducted in the same manner as explained in the previous section of this chapter, and all five listeners observed no audible difference between the original host audio signal and the watermarked signal. The imperceptibility of the watermarked signal is measured by computing the signal to noise ratio (SNR) between the original signal and the watermarked signal; the observed SNR is shown in Table 5.2(a). To measure the similarity between the original watermark and the recovered watermark, the correlation coefficient is computed and presented in Table 5.2(b), together with the BER between the original and the recovered watermark. To check the robustness of the scheme, the watermarked signal is passed through the different signal processing attacks described in section 5.1 and the watermark is then recovered. The results in Table 5.2 are obtained for the blind technique in the DWT and LWT domains, with SNR = 30 in equation (5.13) and a segment length of 256. The original watermark embedded in the signal is shown in Fig. 5.10, whereas the recovered watermark after the different signal processing attacks is shown in Fig. 5.11 (DWT based blind technique) and Fig. 5.12 (LWT based blind technique).
Fig. 5.10 Original watermark
Fig. 5.11 Results for the SNR based scheme with blind detection in the DWT domain: a) without attack, b) down sampling, c) up sampling, d) MP3 compression, e) requantization, f) cropping, g) LP filtered, h) time warping.
Fig. 5.12 Results for the SNR based scheme with blind detection in the LWT domain: a) without attack, b) down sampling, c) up sampling, d) MP3 compression, e) requantization, f) cropping, g) LP filtered, h) time warping.
Table 5.2 Results after signal processing attacks

a) SNR (dB) between the original audio signal and the watermarked audio signal after various attacks

Attack            SNR based blind scheme, DWT domain   SNR based blind scheme, LWT domain
Without attack    18.6194                              18.3601
Downsampled       18.3863                              18.2978
Upsampled         18.3863                              18.2978
MP3 compressed    18.3863                              18.2978
Requantization    18.3863                              18.2978
Cropping          18.4208                              18.3316
LP filtered       16.2305                              16.2178
Time warping      18.3863                              18.2978
Echo addition     17.4780                              17.1564
Equalization      18.3863                              18.2978

b) Correlation coefficient and BER between the original watermark and the recovered watermark

                  DWT domain                 LWT domain
Attack            Corr. coeff.   BER         Corr. coeff.   BER
Without attack    0.9972         0.0027      0.9862         0.0039
Downsampled       0.9972         0.0027      0.9862         0.0039
Upsampled         0.9972         0.0027      0.9862         0.0039
MP3 compressed    0.9972         0.0027      0.9862         0.0039
Requantization    0.9972         0.0027      0.9862         0.0039
Cropping          0.9972         0.0027      0.9862         0.0039
LP filtered       0.9972         0.0027      0.9531         0.0089
Time warping      0.9972         0.0027      0.9862         0.0039
Echo addition     0.9972         0.0027      0.9862         0.0039
Equalization      0.9972         0.0027      0.9862         0.0039
From the results presented in Table 5.2 it is clear that the blind techniques implemented in the discrete wavelet domain and the lifting wavelet domain are robust against various signal processing attacks such as resampling, requantization, MP3 compression, LP filtering and cropping. Comparing Table 5.2 with Table 5.1, one can say that the non blind technique has a better SNR than the blind detection scheme, but it requires the original audio signal to recover the watermark. The correlation coefficient test between the original watermark and the recovered watermark indicates that the DWT and LWT based blind detection techniques are robust, with little difference in the recovery. Henceforth we concentrate on the DWT domain only and do not consider the LWT further, as it produces similar results. The observed SNR between the original host signal and the watermarked signal, even after signal processing attacks, is less than 20 dB in the experiments of Table 5.2. This is because the parameter α(k) is calculated from (5.13) with SNR = 30, the lowest threshold requirement. As explained in the next subsection, these imperceptibility results can be improved by increasing the SNR value used in computing α(k).
5.3.4. Selection criteria for value of SNR in computing α(k) and
selection criteria for segment length N
To increase the imperceptibility of the watermarked audio signal, we varied the SNR requirement used to compute the parameter α(k) from 30 to 80 and observed the results. We also varied the segment length and observed the performance of the system. The results of these variations are shown in Table 5.3.
Table 5.3 Relation between segment length, parameter α(k), observed SNR and correlation coefficient

Sr.No   Assumed SNR between wavelet      Segment   Observed SNR between host      Corr.
        coefficients in (5.11) for α(k)  length    audio and watermarked audio    coeff.
01      30                               512       16.3843                        1
                                         256       18.6826                        1
                                         128       21.5655                        1
02      40                               512       18.1608                        1
                                         256       19.6560                        1
                                         128       22.5764                        1
03      50                               512       19.0204                        1
                                         256       20.6227                        1
                                         128       24.5634                        1
04      60                               512       19.3452                        1
                                         256       21.6064                        1
                                         128       28.9675                        1
05      70                               512       19.6852                        1
                                         256       22.5887                        0.9917
                                         128       25.1256                        0.9863
06      80                               512       20.7438                        0.9917
                                         256       23.0733                        0.9863
                                         128       27.7826                        0.9684
From Table 5.3 it is clear that the observed SNR can be increased in two ways: i) by increasing the SNR value in expression (5.13), and ii) by reducing the segment length. In the first case, the increase in SNR improves the imperceptibility of the watermarked signal but reduces the robustness. In the second case, reducing the segment length improves the SNR but also reduces the robustness. Optimized results that balance these contradictory requirements of watermarking are obtained by selecting a segment length of 256 and an SNR in the range 40 – 60. Therefore, in the further implementations of spread spectrum watermarking, the SNR is selected as 60 and the segment length as 256.
5.4. Proposed Adaptive SNR based spread spectrum scheme in
DWT-DCT domain:
This section describes the proposed SNR based spread spectrum audio watermarking technique which adaptively selects the embedding strength. The scheme embeds the watermark by first taking the 3rd level DWT of the host audio signal and then computing the DCT of the low pass (approximation) DWT coefficients. The DWT-DCT transform is used to embed the watermark in the low-middle frequency components of the host audio in order to increase the imperceptibility of the watermarking scheme compared to the scheme implemented in the previous section. Using the DCT of the ca3 coefficients, we can precisely select the frequency band in which to embed the watermark for better imperceptibility. Spread spectrum watermarking techniques [2, 3, 9] modify the host multimedia signal using the mathematical function defined by (5.1).
The scheme proposed here first divides the host audio signal into smaller segments of size N and computes the 3rd level DWT of each segment. The ca3 (low pass) coefficients are selected for embedding the watermark, and the DCT of the ca3 coefficients is taken; the watermark bit is then added using formula (5.19):

A3k'(i) = A3k(i) + α(k)·r(i)·w(k)                    (5.19)

where r(i) is a permuted pseudorandom binary signal of length L with zero mean, which is the secret key of the owner; the length L is equal to the length of the 3rd level DWT-DCT coefficient vector. α(k) is the adaptive scaling parameter computed for the kth segment in which the watermark is to be added, and w(k) is the watermark bit to be embedded in the kth segment. A3k'(i) is the DWT-DCT coefficient of the watermarked signal for the kth segment and A3k(i) is the DWT-DCT coefficient of the original host signal for the kth segment. Solving equation (5.19) together with equation (5.9) for α(k) gives
α(k) = √[ Σ_i A3k(i)² · 10^(-SNR/10) / Σ_i r²(i)·w²(k) ]                    (5.20)

w(k) is the bipolar watermark, which is either +1 or -1, so w²(k) = 1. r(i) is a pseudorandom binary signal with zero mean taking the values +1 or -1, so Σ_i r²(i) = L. Equation (5.20) therefore reduces to

α(k) = √[ Σ_i A3k(i)² · 10^(-SNR/10) / L ]                    (5.21)
The audio watermarking scheme developed here computes the value of α(k) for every subsection of the host audio signal using formula (5.21); this value is later used to embed the watermark bit in each segment of the host audio using formula (5.19). The value of SNR used in computing α(k) has to be selected by the user and should be in the range 40 – 60 dB, as shown in section 5.3.4. Even though the selected value of SNR is fixed and is the same for all segments, Σ_i A3k(i)² changes from segment to segment, and hence α(k) changes for each segment.
5.4.1. Watermark embedding
The proposed watermark embedding process is shown in the block schematic of Fig. 5.13. To embed the watermark signal into the host audio signal, the present scheme uses the additive watermarking method. The host audio signal is divided into segments of size N = 256, 512, 1024, etc.:

xk(i) = x(k·N + i),   i = 0, 1, ..., N-1,   k = 0, 1, 2, ...                    (5.22)

where x(i) represents the original host audio signal and xk(i) represents the kth segment of the host audio. Each xk(i) is then decomposed with the 3rd level wavelet transform. To embed the watermark into the low frequency, highest energy part of the audio signal, the 3rd level approximation coefficients are selected and their DCT is computed. The watermark is then embedded in the low frequency DCT coefficients, which are modified as

A3k'(i) = A3k(i) + α(k)·w(k)·r(i)                    (5.23)

where A3k(i) is the DCT coefficient of the ca3 coefficients of the 3rd level DWT of xk(i), r(i) is the permuted pseudorandom binary signal with zero mean which is the secret key of the owner, α(k) and w(k) are the scaling parameter and watermark bit to be embedded in the kth segment, respectively, and A3k'(i) are the watermarked coefficients. The SNR between the original audio signal and the watermarked audio signal can be computed using formula (5.7) to measure the imperceptibility of the watermarked signal. A condensed per-segment sketch is given below.
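The per-segment DWT-DCT embedding can be sketched as follows (assuming the MATLAB Wavelet and Signal Processing Toolboxes; seg, wk, seed and SNRt are as before, the Haar wavelet is an illustrative choice, and for brevity the sketch spreads the PN sequence over all DCT coefficients of ca3, whereas the thesis restricts the embedding to the low frequency DCT coefficients):

```matlab
% Per-segment blind embedding in the DWT-DCT domain, Eqs. (5.21) and (5.23).
[C, L] = wavedec(seg, 3, 'haar');           % 3-level DWT of one host segment
ca3 = appcoef(C, L, 'haar', 3);             % 3rd-level approximation coefficients
A3  = dct(ca3);                             % DCT of the ca3 coefficients
rng(seed);                                  % secret key: seed of the PN generator
r = 2 * (rand(size(A3)) > 0.5) - 1;         % PN sequence, one chip per DCT coefficient
alpha_k = sqrt(sum(A3.^2) * 10^(-SNRt/10) / numel(A3));   % Eq. (5.21)
A3w = A3 + alpha_k * wk * r;                % Eq. (5.23)
C(1:L(1)) = idct(A3w);                      % inverse DCT; write ca3' back into C
seg_w = waverec(C, L, 'haar');              % IDWT -> watermarked segment
```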
Fig. 5.13 Watermark embedding process for the proposed adaptive SNR based blind technique in the DWT-DCT domain (block diagram: the host audio x(n) is segmented, the third-level DWT of each segment is taken, the approximation coefficients are DCT transformed, α(k) is computed for each segment, the binary watermark image is mapped to a bipolar sequence, scaled and spread with the PN sequence generated from the secret initial seed, and the IDCT and then the IDWT of each segment are taken and the segments are concatenated to form the watermarked audio y(n)).
5.4.2. Watermark extraction
The watermark extraction procedure of the proposed scheme is depicted in Fig. 5.14. To extract the watermark from the embedded signal, the 3rd level DWT of the watermarked signal is computed and the DCT of the approximation coefficients is taken. The coefficients A3k'(i) are then modulated by the same pseudorandom signal r(i) used while embedding the watermark:

s(i) = A3k'(i)·r(i)                    (5.24)

where A3k'(i) = A3k(i) + α(k)·w(k)·r(i) and s(i) are the low frequency DCT coefficients modified by r(i). Therefore

s(i) = (A3k(i) + α(k)·w(k)·r(i))·r(i)                    (5.25)

Σ_i s(i) = Σ_i A3k(i)·r(i) + Σ_i α(k)·w(k)·r²(i)                    (5.26)

The expected value of the first term in equation (5.26), Σ_i A3k(i)·r(i), is approximately zero, and Σ_i r²(i) = N (α(k) and w(k) are independent of the summation variable i), so the sum is approximately equal to N·α(k)·w(k), where N is the size of the segment. If the value of Σ_i s(i) is greater than the threshold, watermark bit one is recovered; if it is less than the threshold, watermark bit 0 is recovered.
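The corresponding per-segment detection for equations (5.24)–(5.26) can be sketched as follows (same assumptions as the embedding sketch; the zero threshold is again an illustrative choice):

```matlab
% Per-segment blind detection in the DWT-DCT domain, Eqs. (5.24)-(5.26).
[C, L] = wavedec(seg_w, 3, 'haar');         % 3-level DWT of the received segment
ca3w = appcoef(C, L, 'haar', 3);            % 3rd-level approximation coefficients
A3w  = dct(ca3w);                           % DWT-DCT coefficients A3k'(i)
rng(seed);                                  % regenerate the PN sequence from the secret seed
r = 2 * (rand(size(A3w)) > 0.5) - 1;
s = A3w .* r;                               % Eq. (5.24)
wk_rec = double(sum(s) > 0);                % bit 1 if the sum exceeds the threshold, else bit 0
```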
Fig. 5.14 Watermark extraction for the adaptive SNR based blind technique in the DWT-DCT domain (block diagram: the watermarked audio y(n) is segmented, the third-level DWT-DCT of each segment is taken, the PN sequence is regenerated from the secret initial seed, the bit is recovered from each segment by threshold comparison, and dimension mapping yields the recovered watermark).
5.4.3. Experimental results
This section presents the experimental results obtained during the implementation of the scheme. The results are obtained with the same set of audio signals used in the previous sections. A three-level wavelet decomposition of each segment of the host audio is performed, and the low frequency coefficients (ca3) are selected for embedding. The watermark is embedded by modifying the low frequency DCT coefficients of the ca3 coefficients as per equation (5.19). The α(k) used in equation (5.19) to add the watermark bit w(k) is computed using equation (5.21), and the value of SNR considered in equation (5.21) is 30 dB.
The subjective analysis test showed that there is no perceptible difference between the host audio and the watermarked audio. The signal to noise ratio (SNR) between the host audio signal and the watermarked audio signal is computed using equation (5.9) and is shown in Table 5.4, together with the BER between the original watermark and the recovered watermark.
To check the robustness of the scheme, the watermarked signal is passed through different signal processing attacks and the watermark is then recovered; the results are presented in Table 5.4. The original watermark embedded in the signal is shown in Fig. 5.15(a), whereas the recovered watermark after the different signal processing attacks is shown in Fig. 5.15(b) – Fig. 5.15(h).
Table 5.4 SNR between the original audio signal and the watermarked audio signal, and BER of the recovered watermark

Attack                  SNR (dB)   BER
Without attack          31.3456    0
Downsampled             18.4872    0.4531
Upsampled               18.5820    0.0303
MP3 compressed          31.3456    0
Requantization          31.3456    0
Cropping                25.4691    0.0162
LP filtered fc=22050    17.8538    0
Time warping -10%       16.6834    0.0137
Echo addition           31.3456    0
Equalization            31.3456    0

Fig. 5.15 Results for the SNR based scheme with blind detection in the DWT-DCT domain: a) without attack, b) down sampling, c) up sampling, d) MP3 compression, e) requantization, f) cropping, g) time warping, h) LP filtered (fc = 22050 Hz).
From the results presented in Table 5.4 it is clear that the technique implemented here is robust against various signal processing attacks such as resampling, requantization, MP3 compression and cropping. It is also observed that the technique is robust against LP filtering with a cutoff frequency of 22 kHz. The results in Table 5.4 further show that the SNR between the original audio and the watermarked audio after the different signal processing operations satisfies the audio watermarking requirements given by the IFPI [28]. The BER (bit error rate) test between the original watermark and the recovered watermark indicates that the technique is able to recover the watermark after the different signal processing attacks.
The algorithm is tested for 1000 different keys and the PDF (probability density function) of the SNR is shown in Fig. 5.16. The PDF computed over the 1000 keys indicates that the SNR between the original signal and the watermarked signal is always greater than 28.67, i.e., the performance of the proposed algorithm is very good: it always provides good imperceptibility between the original signal and the watermarked signal, and the imperceptibility does not depend on the key used to embed the watermark. The watermark cannot be recovered without knowledge of the key used.
Fig 5.16 PDF of SNR
Table 5.5 Comparison of the spread spectrum audio watermarking techniques implemented in this chapter

                     Non blind SNR        Proposed blind       Proposed blind       Proposed blind
                     based scheme         technique in DWT     technique in LWT     technique in DWT-DCT
Attack               SNR (dB)   BER       SNR (dB)   BER       SNR (dB)   BER       SNR (dB)   BER
Without attack       52.2863    0.0078    21.6109    0         21.2621    0         31.7109    0
Downsampled          28.5660    0.0078    21.6109    0         21.2621    0         31.7109    0
Upsampled            52.2859    0.0078    21.6109    0         21.2621    0         31.7109    0
MP3 compressed       52.2859    0.0078    21.6109    0         21.2621    0         31.3456    0
Requantization       28.5660    0.0078    21.6109    0         21.2621    0         31.3456    0
Cropping             28.9923    0.0107    21.6102    0.0068    20.3316    0.0078    31.7056    0.0156
LP filtered          20.3328    0.0078    18.8769    0.0009    16.2178    0.0029    27.9929    0
Time warping 10%     11.4711    0.0078    11.1957    0.0078    10.2965    0.0089    20.0535    0.0186
Echo addition        28.2335    0.0127    17.4780    0.0156    17.1564    0.0163    16.9394    0.0049
Equalization         28.2335    0.0088    19.1299    0         18.2978    0         19.1391    0
The results of all the SNR based schemes implemented in this chapter are summarized in Table 5.5. These results are obtained for a segment length of 256 and SNR = 60 in the computation of α(k). It is clear from the table that the schemes implemented here are robust against the various signal processing attacks, with little variation in the BER. Although the SNR of the blind detection schemes using DWT/LWT is lower than that of the non blind technique, there is no audible difference between the original host audio signal and the watermarked audio signal. By implementing the method in the DWT-DCT domain the imperceptibility is increased. The watermark is robust except against time warping, cropping and echo addition. To further improve the robustness, we propose to implement the technique using cyclic coding.
5.5. Proposed SNR based blind technique using cyclic coding
To improve the detection performance and to make the system intelligent, we propose to encode the watermark with a cyclic coder before embedding it into the host audio signal. The block schematic of this scheme is shown in Fig. 5.17: the watermark to be embedded is first encoded by the cyclic encoder and then embedded into the host signal. At the decoder side, the recovered watermark is decoded using the cyclic decoder. The robustness results obtained using this method are presented in Table 5.6. From the table it is clear that the detection performance is significantly improved, even under 10% time scaling, echo addition and low pass filtering.
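A sketch of the cyclic coding wrapper around the embedder and extractor is shown below (assuming the MATLAB Communications Toolbox functions encode and decode; embed_watermark and extract_watermark are hypothetical placeholders for the DWT-DCT embedding and extraction of section 5.4, and the watermark length is assumed to be a multiple of k):

```matlab
% Cyclic (7,4) coding of the watermark before embedding (sketch; assumes the
% Communications Toolbox; embed_watermark/extract_watermark are placeholders).
n = 7; k = 4;
msg  = wm_bits(:);                                % watermark bits, length a multiple of k
code = encode(msg, n, k, 'cyclic/binary');        % (7,4) cyclic encoding of the watermark
y    = embed_watermark(x, code, key);             % embed the encoded bit stream (placeholder)
% ... signal processing attacks on y ...
code_rec = extract_watermark(y, key);             % recovered, possibly corrupted, code bits
msg_rec  = decode(code_rec, n, k, 'cyclic/binary');  % corrects one bit error per codeword
```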
Fig. 5.17 Improved encoder and decoder for blind watermarking using cyclic coding (the watermark is dimension-mapped and encoded with cyclic codes before being embedded into the host audio with the key; after the attack channel, the watermark decoder uses the key, decodes the cyclic code and dimension-maps the result to give the recovered watermark).
Table 5.6 Results obtained for the (6, 4) cyclic code

Attack                  SNR (dB)   BER
Without attack          32.2044    0
Downsampled             32.2044    0
Upsampled               32.2044    0
MP3 compressed          32.2044    0
Requantization          32.2044    0
Cropping                30.2458    0.0110
LP filtered fc=22050    29.1862    0
Time scaling -10%       21.2377    0.0029
Echo addition           16.9391    0.0039
Equalization            21.0204    0
Table 5.7 Results obtained for the (7, 4) cyclic code

Attack                  SNR (dB)   BER
Without attack          28.5381    0
Downsampled             28.3982    0
Upsampled               28.3982    0
MP3 compressed          28.3982    0
Requantization          28.3982    0
Cropping                22.3475    0.0094
LP filtered fc=22050    26.9378    0
Time scaling -10%       23.5560    0
Time scaling +10%       21.2156    0.0018
Echo addition           16.0376    0.0021
Equalization            20.8502    0
It is clear from Tables 5.6 and 5.7 that the encoder and decoder model using cyclic coding is more robust than all the previously proposed techniques. With a small sacrifice in the imperceptibility test, the (7, 4) cyclic coder provides the best robustness, which matches our main aim of embedding a watermark that survives all kinds of attacks and can be recovered successfully.
5.6. Summary of chapter:
Adaptive SNR based schemes are proposed in this chapter. The main goal of these methods is to provide blind watermarking techniques which are robust against various signal processing attacks. From the comparison of the results presented in Table 5.5 one can say that the non blind technique has a better SNR than the blind detection schemes, but it requires the original audio signal to recover the watermark. It can also be observed that the blind techniques are robust against various signal processing attacks. The scaling parameter used to embed the watermark is varied adaptively for each segment in order to achieve imperceptibility and to take advantage of the insensitivity of the HAS to small variations in the transform domain. To provide security, a secret key is used, which is a PN sequence generated by a cryptographic method; without knowledge of this secret key it is not possible to recover the watermark.
The DWT-DCT based blind detection technique is more robust than the DWT/LWT based blind detection techniques and is more imperceptible; the SNR between the original signal and the watermarked signal is improved by the DWT-DCT technique. The multiresolution characteristics of the discrete wavelet transform (DWT) and the energy-compaction characteristics of the discrete cosine transform (DCT) are combined to improve the transparency of the digital watermark [26]. By computing the DCT of the 3-level DWT coefficients, we take advantage of the low-middle frequency components to embed the information.
To further improve the detection performance and to make the system more secure, the idea of encoding the watermark using cyclic coding is proposed. Because of this, an attacker cannot understand the statistical behaviour of the embedded watermark, and the decoder is also able to correct a one-bit error per codeword. The results obtained for watermark robustness using the cyclic encoder are better than those of the methods that do not use such an encoder.
Chapter 6
Adaptive Watermarking by Modifications of GOS
Introduction
Audio watermarking techniques can be categorized into three main groups: SS-based coding [12-18], echo hiding [4], and phase coding [30]. Although the SS method is the most popular in the literature, the pseudorandom sequence used for watermarking may be audible to human ears even though its power is low. Similarly, the two distinct time-offsets for “1” and “0” in echo hiding methods frequently cause the watermarks to be audible and require trade-offs with the echo volume (reducing the robustness). The transform-domain (Fourier transform, discrete cosine transform, subband, or cepstrum) methods: 1) take advantage of humans’ insensitivity to phase variation; 2) are compatible with techniques used in audio compression; and 3) involve little variation in the transform coefficients due to disturbances. The phase-coding method may be sensitive to misalignment of reference points due to the phase shifts that it theoretically causes. However, controlling and predicting variations in the time-domain audio waveform are difficult when disturbances are added directly in the transform domain.
The method tackled in this chapter directly modifies the audio waveform to
embed watermarks. Unlike conventional algorithms, which add modulated and scaled
PN-sequences or other disturbing patterns, no basic form of the watermark signal is
set in advance. Instead, segments of host audio signals are modified into the “1” or
“0” state based on the principle of differential amplitudes. Patchwork algorithms work
in a similar manner. The watermark disturbance is actually strongly related, or similar
to, the host signal itself. It is just like an echo signal with a zero time delay, thus
yielding a high degree of watermark transparency. In addition, the proposed embedding process is performed subject to a frequency-masking test of the watermark signal, so that the disturbances of the host audio signal remain beyond the ability of human ears to perceive.
The first section of this chapter gives an overview of the audio watermarking technique [23] that modifies a group of samples (GOS) to embed the watermark in the time domain. The second section introduces the proposed adaptive audio watermarking technique, which modifies the GOS in the transform domain. The third section provides the results obtained by implementing the proposed scheme in the DWT, DWT-DCT and DCT domains. Parts of the results of this scheme are published in paper XI.
6.1. Introduction to Audio watermarking technique based on GOS
modification in time domain:
Conventional watermarking algorithms can be categorized as operating on individual samples or on a GOS [23]. The former are preferred for high embedding capacities (one sample hiding one watermark bit) but suffer from potentially heavy and uncontrolled noise. The latter modify the samples in a group so that the GOS as a whole carries a particular feature value, and corruption of individual samples does not necessarily cause a wrong decision during watermark retrieval. The method explained in this section was proposed by Lie et al. [23] and implemented by its authors in the time domain. Because modifications made in the time domain directly affect the perceptual quality of the signal and are less immune to noise disturbances, we propose a similar technique in the transform domain. Our main goal is to build an intelligent encoder and decoder model of audio watermarking that is adaptive, blind and robust while preserving imperceptibility.
To implement the algorithm using GOS modification, one "state", or one feature value, is specified for each type of data to be embedded (for example, "1" or "0"). Keeping sufficient distance between the "0" and "1" states typically enhances robustness against pirate attacks. The main point of efficient watermarking is to optimize audio quality for a given distance between the two states. First, the GOS and the binary states are defined as follows.
Definition 1: The complete audio signal is partitioned into consecutive GOSs, each containing three non-overlapping sections of samples. These three sections have equal or unequal lengths L1, L2 and L3 and are denoted sec_1, sec_2 and sec_3, as shown in Fig. 1. Hence, a GOS contains L = L1 + L2 + L3 samples.
Definition 2: One watermark message represents one binary bit of value 0 or 1,
embedded in one GOS.
Definition 3: Average of Absolute Amplitudes (AOAA) is chosen as the feature for
each section of samples. Embedding a watermark message depends on the differential
AOAA relations among sec_1, sec_2, and sec_3. AOAA items are computed as
E_i1 = (1/L1) Σ_{x=0}^{L1−1} f(L·i + x)                    (6.1)
E_i2 = (1/L2) Σ_{x=L1}^{L1+L2−1} f(L·i + x)                (6.2)
E_i3 = (1/L3) Σ_{x=L1+L2}^{L−1} f(L·i + x)                 (6.3)
where i represents the GOS index, i = 0, 1, 2, 3, ...
Definition 4: After E_i1, E_i2 and E_i3 are sorted, they are renamed E_max, E_mid and E_min according to their computed values. The differences between them are
A = E_max − E_mid                                           (6.4)
B = E_mid − E_min                                           (6.5)
The state "1" corresponds to A ≥ B; otherwise (A < B) the state is "0".
6.1.1. Rules of watermark embedding
The embedding scheme is based on the following rules.
To embed watermark bit "1": if (A − B ≥ THD1), no operation is performed. Otherwise, increase E_max and decrease E_mid by an amount δ so that the above condition is satisfied.
To embed watermark bit "0": if (B − A ≥ THD1), no operation is performed. Otherwise, increase E_mid and decrease E_min by the same amount δ so that the above condition is satisfied.
These rules state that when the status of a GOS does not conform to the state definition (subject to a threshold), the AOAA values of two selected sections are modified so that the GOS status is changed into a conforming one.
6.1.2. Watermark extraction:
The algorithm for watermark retrieval is simple and straightforward.
Assuming that the start point of data embedding has been recognized and the section
lengths L1, L2 and L3 are known, every three consecutive sections of samples are
grouped as a GOS and examined to extract the watermark. The AOAA values E′i1 , E′i 2
and E′i3 are computed for the ith GOS, as in (6.1)–(6.3)
E′_i1 = (1/L1) Σ_{x=0}^{L1−1} f′(L·i + x)                  (6.6)
E′_i2 = (1/L2) Σ_{x=L1}^{L1+L2−1} f′(L·i + x)              (6.7)
E′_i3 = (1/L3) Σ_{x=L1+L2}^{L−1} f′(L·i + x)               (6.8)
where f′(x) is the watermarked signal. E′_i1, E′_i2 and E′_i3 are then ordered to yield E′_max, E′_mid and E′_min. Their differences are
A′ = E′_max − E′_mid                                        (6.9)
B′ = E′_mid − E′_min                                        (6.10)
Comparing A′ and B′ yields the retrieved bit: "1" if A′ ≥ B′ and "0" if A′ < B′. This process is repeated for every GOS to determine the entire embedded bit stream. Clearly, this scheme is blind, meaning that the watermarks can be recovered without using the original host audio signal.
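A corresponding extraction sketch, given only the section lengths (blind detection):

```python
import numpy as np

def extract_bit_aoaa(gos, L1, L2):
    """Recover one bit from a GOS using the differential AOAA rule (6.6)-(6.10)."""
    secs = [gos[:L1], gos[L1:L1 + L2], gos[L1 + L2:]]
    e_min, e_mid, e_max = sorted(np.mean(np.abs(s)) for s in secs)
    A = e_max - e_mid
    B = e_mid - e_min
    return 1 if A >= B else 0
```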
6.2. Proposed adaptive watermarking using GOS modifications in
transform domain:
The scheme proposed in this section is a modified version of the scheme highlighted in section 6.1. As stated earlier, our main goal is to embed the watermark in such a way that it sustains all kinds of attacks while providing good imperceptibility. The scheme in section 6.1 modifies the GOS in the time domain. Practically, linear amplification or attenuation is not the only way to modify the AOAA, but it is the simplest way to retain the signal waveform and to alleviate degradation caused by random disturbing noise. The embedding scheme addressed above is, however, not directly feasible because of signal discontinuities that arise after amplitude scaling near section boundaries. (Notably, only two out of the three sections are scaled and adjacent sections may have different scale factors.) These discontinuities cause "click" sounds that are perceivable to human ears. This problem is solved either by adopting a progressive scaling scheme to keep the audio waveform continuous and smooth or by modifying the GOS in the transform domain. We have concentrated on modifying the samples in the transform domain.
In the above scheme only two out of the three sections are scaled and adjacent sections have different scale factors. In the proposed scheme, according to the embedding rule, the entire GOS is modified. Amplitude variation in the transform domain and modification of the entire GOS avoid discontinuities and improve the SNR between the original host audio and the watermarked audio.
6.2.1. Proposed blind watermarking using GOS modification in DWT
domain:
The method we propose in this subsection is implemented in the DWT domain. First, the complete audio signal is partitioned into consecutive GOSs, and within each GOS two non-overlapping subsections of equal or unequal length are considered. To add a cryptographic layer, the subsection lengths L1 and L2 can be selected randomly for each GOS based on a cryptographic key. One watermark message represents one binary bit of value 0 or 1, embedded in one GOS. To embed one watermark message, the means of the two subsections of each GOS are taken as the feature space and are modified.
To embed the watermark signal into the host audio signal, the host audio signal is first partitioned into GOSs of size L:
x_k(i) = x(k·L + i),  i = 0, 1, ..., L−1,  k = 0, 1, 2, ...                (6.11)
where L = 256, 512, 1024, etc., x(i) represents the original host audio signal and x_k(i) represents the kth segment of the host audio. Each audio segment is decomposed by a 3-level discrete wavelet transform using the Haar wavelet, and the low-frequency coefficients (ca3) are modified to embed the watermark according to the rule defined below.
Then define
A = mean(ca3_k(i)),  0 ≤ i ≤ L1 − 1                          (6.12)
B = mean(ca3_k(i)),  L1 ≤ i ≤ L1 + L2 − 1                    (6.13)
where L1 is the length of the first subsection and L2 is the length of the second subsection of DWT coefficients. The lengths L1 and L2 are selected by a cryptographic method. N, the length of the DWT coefficient vector (equal to L/2^3), should be greater than L1 + L2.
To embed watermark bit 1: if A ≤ B, no operation is performed; if A > B, decrease A and increase B by an amount δ so that the condition A ≤ B is satisfied.
To embed watermark bit 0: if A > B, no operation is performed; if A ≤ B, increase A and decrease B by an amount δ so that the condition A > B is satisfied.
The block schematic of watermark embedding using GOS modifications is shown in Fig. 6.1.
[Fig. 6.1 block diagram: input x → segmentation → DWT transformation → selection of lengths L1 & L2 → computation of mean A & mean B → watermark insertion (watermark W after dimension mapping, with scaling parameter K) → IDWT transformation → concatenation of segments → watermarked signal]
Fig 6.1 Block schematic of GOS based watermark embedding in DWT domain.
As per the above embedding rules, if the status of a GOS does not conform to the state definition, the subsections are modified so that the GOS status is changed to the required condition. The amount δ used to modify the subsections is selected using
δ ≥ |A − B| × K / 2                                          (6.14)
where the parameter K is selected such that the modified signal remains imperceptible. We have tested the performance of the system for various values of K and then propose the range of K in which the optimized performance of the system is obtained.
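A minimal sketch of the DWT-domain embedder described above, assuming PyWavelets for the 3-level Haar decomposition; the subsection lengths and the use of eq. (6.14) follow this section, while the small separation margin is an implementation assumption.

```python
import numpy as np
import pywt

def embed_bit_dwt(segment, bit, L1=16, L2=16, K=1.0):
    """Embed one watermark bit in one GOS of the host audio (DWT domain)."""
    coeffs = pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)
    ca3 = coeffs[0]                       # low-frequency (approximation) band
    A = np.mean(ca3[:L1])
    B = np.mean(ca3[L1:L1 + L2])
    delta = abs(A - B) * K / 2.0 + 1e-6   # eq. (6.14); margin keeps the states apart
    if bit == 1 and A > B:                # force A <= B
        ca3[:L1] -= delta
        ca3[L1:L1 + L2] += delta
    elif bit == 0 and A <= B:             # force A > B
        ca3[:L1] += delta
        ca3[L1:L1 + L2] -= delta
    return pywt.waverec(coeffs, 'haar')[:len(segment)]
```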
To detect the watermark, the DWT transformation of each GOS of the watermarked signal is performed. Then the parameters A′ and B′ are computed as explained in the previous section:
A′ = mean(ca′3_k(i)),  0 ≤ i ≤ L1 − 1                        (6.15)
B′ = mean(ca′3_k(i)),  L1 ≤ i ≤ L1 + L2 − 1                  (6.16)
where ca′3_k(i) are the third-level DWT coefficients of the kth GOS of the watermarked signal. Comparing A′ and B′ yields the retrieved bit: 1 if A′ ≤ B′ and 0 if A′ > B′. This process is repeated for every GOS to determine the entire stream of embedded watermark bits. The detailed procedure of watermark extraction is depicted in the block schematic of Fig. 6.2. The proposed scheme is blind because it does not require the original host signal to recover the embedded watermark. To measure the similarity between the retrieved watermark and the original watermark, the bit error rate (BER) is computed as
BER = (1/N_w) Σ (w ⊕ w_r)                                    (6.17)
where ⊕ is the XOR operation between the original watermark w and the retrieved watermark w_r, and N_w is the number of watermark bits.
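The matching blind detector and the averaged form of the BER in (6.17), again assuming PyWavelets:

```python
import numpy as np
import pywt

def extract_bit_dwt(segment, L1=16, L2=16):
    """Recover one bit from one GOS of the watermarked audio (DWT domain)."""
    ca3 = pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)[0]
    return 1 if np.mean(ca3[:L1]) <= np.mean(ca3[L1:L1 + L2]) else 0

def bit_error_rate(w, w_rec):
    """Fraction of positions where the original and recovered bits differ."""
    w, w_rec = np.asarray(w, dtype=int), np.asarray(w_rec, dtype=int)
    return float(np.mean(w ^ w_rec))
```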
[Fig. 6.2 block diagram: watermarked signal → segmentation → DWT transformation → selection of lengths L1 & L2 → computation of mean A′ & mean B′ → watermark bit recovery → concatenation and dimension recovery → recovered watermark ŵ]
Fig 6.2 Block schematic of GOS based watermark extraction in DWT domain.
Selection criterion of K
We have tested the performance of the system by varying the value of K from 0.5 to 3 and observing the imperceptibility parameter SNR and the robustness parameter BER. The plots of SNR vs. K and BER vs. K are shown in Fig. 6.3. The SNR between the original signal and the watermarked signal is computed using formula (5.2) to measure the imperceptibility of the watermarked signal, and the BER is computed using expression (6.17).
87
(a)
SNR Vs k without attack
(b)
BER Vs K
Fig 6.3 (a) Variation in SNR with factor k (b) BER Vs K
Fig. 6.3(a) shows the variation in SNR for various values of K without attacking the watermarked signal. It is clear from the figure that the SNR decreases exponentially with increasing K, whereas for smaller values of K the SNR increases but the BER also increases. The plot in Fig. 6.3(b) shows the variation in BER for the Harmonium signal, which is more immune to noise than the other musical signals because of its higher frequency content. From the observations of Fig. 6.3(a) and 6.3(b) it is clear that the optimized value of K lies between 0.75 and 1.5, because in this range the BER stays between 0.04 and 0.05.
Experimental results:
The results obtained for this technique implemented in the DWT domain are presented in Table 6.1. The results are obtained for various Indian musical signals: a Hindi song, Tabla, Harmonium and Bhimsen Joshi's classical music. These signals are in Windows wave (PCM) format with a sampling rate of 44100 Hz, 8 bits, single channel, and 6.5 s duration.
These results are obtained for a segment length of 256; for simplicity of implementation we have considered L1 = L2 = 16. To test the robustness of the scheme, the watermarked signal is passed through common signal processing attacks; the results are summarized in Table 6.1, and the watermarks recovered from the watermarked version of 'Tabla' are shown in Fig. 6.4(a) to 6.4(h).
(a) Without attack (b) Down-sampling (c) Up-sampling (d) MP3 compression (e) Requantization (f) Cropping (g) Time warping (h) LP filtered at 22050 Hz
Fig. 6.4 Results of the robustness test: recovered watermarks in the DWT domain
TABLE 6.1
SNR BETWEEN THE ORIGINAL AND WATERMARKED AUDIO SIGNAL, AND BER OF THE RECOVERED WATERMARK (DWT DOMAIN)

Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 31.4344 / 0 | 31.1032 / 0 | 31.5349 / 0 | 38.9913 / 0.0025
Downsampled | 31.5231 / 0.0098 | 30.4534 / 0.0184 | 30.7345 / 0.0103 | 36.6782 / 0.0157
Upsampled | 31.1987 / 0.0098 | 30.0056 / 0.0184 | 30.7345 / 0.0103 | 36.6782 / 0.0157
Mp3 compressed | 31.4344 / 0.0098 | 31.1032 / 0.0039 | 31.5349 / 0.0103 | 38.9913 / 0.0157
Requantization | 31.4344 / 0.0098 | 31.1032 / 0.0039 | 31.5349 / 0.0103 | 38.9913 / 0.0157
Cropping | 31.4344 / 0.0154 | 31.2989 / 0.0195 | 31.1489 / 0.0242 | 37.7654 / 0.0285
Lpfiltered fc=22050 | 29.7558 / 0.0210 | 30.5924 / 0.0214 | 30.6054 / 0.0542 | 36.6782 / 0.0801
Time warping 10% | 24.8741 / 0.0838 | 24.7654 / 0.0894 | 21.6434 / 0.0929 | 31.6576 / 0.3164
Echo addition | 18.7530 / 0.0465 | 22.8352 / 0.0953 | 17.0064 / 0.0387 | 16.5880 / 0.1088
Equalization | 19.9434 / 0.0386 | 23.3258 / 0.0402 | 18.5831 / 0.0362 | 27.0980 / 0.0456
6.2.2. Proposed blind watermarking using GOS modification in DCT
domain.
This section highlights the implementation of the scheme in the DCT domain. To embed the watermark into the low-frequency part of the audio signal, taking advantage of the frequency-masking effect of the HAS, the DCT of each GOS is obtained: each x_k(i) is DCT transformed. The DCT coefficients of each GOS are then subsectioned into two non-overlapping sections of equal or unequal length. Watermark embedding and extraction for each GOS are done as explained in the previous section; in place of the DWT, the DCT of each GOS is computed and the modifications for embedding each bit are done according to the rule defined previously.
To embed the watermark signal into the host audio signal, the host audio signal is first partitioned into GOSs of size L using equation (6.11). The DCT of each GOS x_k(i) is computed to obtain the DCT coefficients C_k(i). To embed the watermark states, the means A and B are defined similarly to (6.12) and (6.13):
A = mean(C_k(i)),  0 ≤ i ≤ L1 − 1                            (6.18)
B = mean(C_k(i)),  L1 ≤ i ≤ L1 + L2 − 1                      (6.19)
where L1 is the length of the first subsection and L2 is the length of the second subsection of DCT coefficients.
To test the imperceptibility of the watermarked signal, the signal to noise ratio (SNR) between the original signal and the watermarked signal is computed using equation (5.2). To measure the similarity between the original watermark and the recovered watermark, the BER is computed using (6.17). The results observed during experimentation are presented in Table 6.2.
To embed watermark bit 1: if A ≤ B, no operation is performed; if A > B, decrease A and increase B by an amount δ so that the condition A ≤ B is satisfied.
To embed watermark bit 0: if A > B, no operation is performed; if A ≤ B, increase A and decrease B by an amount δ so that the condition A > B is satisfied.
The block schematic of watermark embedding using GOS modifications is shown in Fig. 6.5, and the schematic for watermark extraction is shown in Fig. 6.6.
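A sketch of the same rule applied to the DCT coefficients of each GOS; SciPy's orthonormal DCT-II/IDCT pair is an implementation choice, not specified in the thesis.

```python
import numpy as np
from scipy.fftpack import dct, idct

def embed_bit_dct(segment, bit, L1=16, L2=16, K=1.0):
    """Embed one watermark bit in the low-frequency DCT coefficients of a GOS."""
    C = dct(np.asarray(segment, dtype=float), norm='ortho')
    A = np.mean(C[:L1])
    B = np.mean(C[L1:L1 + L2])
    delta = abs(A - B) * K / 2.0 + 1e-6
    if bit == 1 and A > B:
        C[:L1] -= delta
        C[L1:L1 + L2] += delta
    elif bit == 0 and A <= B:
        C[:L1] += delta
        C[L1:L1 + L2] -= delta
    return idct(C, norm='ortho')
```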
[Fig. 6.5 block diagram: input x → segmentation → DCT transformation → selection of lengths L1 & L2 → computation of mean A & mean B → watermark insertion (watermark W after dimension mapping, with scaling parameter K) → IDCT transformation → concatenation of segments → watermarked signal]
Fig 6.5 Block schematic of GOS based watermark embedding in DCT domain.
[Fig. 6.6 block diagram: watermarked signal → segmentation → DCT transformation → selection of lengths L1 & L2 → computation of mean A′ & mean B′ → watermark bit recovery → concatenation and dimension recovery → recovered watermark ŵ]
Fig 6.6 Block schematic of GOS based watermark extraction in DCT domain.
To detect the watermark, the DCT transformation of each GOS of the watermarked signal is performed. Then the parameters A′ and B′ are computed as explained in the previous subsection:
A′ = mean(C′_k(i)),  0 ≤ i ≤ L1 − 1                          (6.20)
B′ = mean(C′_k(i)),  L1 ≤ i ≤ L1 + L2 − 1                    (6.21)
where C′_k(i) are the DCT coefficients of the kth segment of the watermarked signal. Comparing A′ and B′ yields the retrieved bit: 1 if A′ ≤ B′ and 0 if A′ > B′. This process is repeated for every GOS to determine the entire stream of embedded watermark bits.
To check the robustness of the scheme, the watermarked signal is passed through different signal processing operations such as low-pass filtering, resampling, requantization, MP3 compression, time scaling and cropping. The results are presented in Table 6.2. The watermark recovered without attack is shown in Fig. 6.7(a), and the watermarks recovered after the various signal processing attacks are shown in Fig. 6.7(b) to 6.7(h). These results are obtained for Tabla.wav. The algorithm is tested for various Indian musical signals: a Hindi song, Tabla, Harmonium and Bhimsen Joshi's classical music.
(a) Without attack (b) Down-sampling (c) Up-sampling (d) MP3 compression (e) Requantization (f) Cropping (g) Time warping (h) LP filtered at 22050 Hz
Fig. 6.7 Results of the robustness test: recovered watermarks in the DCT domain
TABLE 6.2
SNR BETWEEN THE ORIGINAL AND WATERMARKED AUDIO SIGNAL, AND BER OF THE RECOVERED WATERMARK (DCT DOMAIN)

Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 44.5607 / 0 | 54.4350 / 0 | 43.2635 / 0.0039 | 61.0525 / 0
Downsampled | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Upsampled | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Mp3 compressed | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Requantization | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Cropping | 44.5607 / 0.0049 | 54.4350 / 0.0079 | 43.2635 / 0.0056 | 61.0525 / 0.0263
Lpfiltered fc=22050 | 31.8993 / 0.0238 | 44.0477 / 0.0379 | 30.7531 / 0.0246 | 45.8223 / 0.0371
Time warping 10% | 23.4781 / 0.0938 | 25.4836 / 0.0840 | 22.5946 / 0.0929 | 33.8252 / 0.5098
Echo addition | 17.0730 / 0.0598 | 22.8352 / 0.0823 | 17.0064 / 0.0637 | 16.5880 / 0.2988
Equalization | 19.3434 / 0.0313 | 23.3258 / 0.392 | 18.5831 / 0.0402 | 27.0980 / 0.0840
The results in the table show that the use of the DCT domain improves imperceptibility but decreases robustness, especially for signals like Harmonium and Tabla under attacks such as time warping and echo addition, because the frequency content of these signals is high compared to the voice signal. Improved robustness is observed for the song and the classical music, as they contain low-frequency voice components along with the instrumental signals.
The results obtained with variation in segment length, together with the number of DCT coefficients modified, are presented in Table 6.3. From the table it is clear that although the SNR keeps improving as the segment size gets smaller, the watermark does not survive the time scaling and low-pass filtering attacks. Therefore, for better robustness the segment length should be greater than 128. To keep the trade-off between survival of the mark and the SNR, we kept the segment length equal to 256 in our implementations. To implement a good cryptographic method the length can be larger. It is also observed that if the lengths L1 and L2 are kept unequal there is a variation in SNR but no effect on the BER. Since the main concentration of this work is on robustness, and unequal lengths do not change the robustness, we kept equal lengths for our further observations.
TABLE 6.3
RELATION BETWEEN SEGMENT LENGTH, NUMBER OF DCT COEFFICIENTS MODIFIED, SNR AND BER

Length of DCT coeff. modified (equal) | SNR | BER without attack | Length of DCT coeff. modified (unequal) | SNR | BER without attack | Robustness

Segment length = 256
256 (L1=L2=128) | 39.313 | 0 | L1=147, L2=107 | 38.4747 | 0 | Robust
128 (L1=L2=64) | 36.548 | 0 | L1=73, L2=55 | 36.1779 | 0 | Robust

Segment length = 128
128 (L1=L2=64) | 39.529 | 0 | L1=73, L2=55 | 38.4975 | 0 | Robust
96 (L1=L2=48) | 39.023 | 0 | L1=59, L2=37 | 38.6325 | 0 | Not sustained in time scaling

Segment length = 64
64 (L1=L2=32) | 44.377 | 0 | L1=37, L2=27 | 44.4504 | 0 | Not sustained in time scaling, lpfiltering
32 (L1=L2=16) | 41.464 | 0 | L1=19, L2=13 | 41.7952 | 0 | Not sustained in time scaling, lpfiltering
16 (L1=L2=8) | 38.44 | 0 | L1=9, L2=7 | 38.2531 | 0 | Not sustained in time scaling, lpfiltering

Segment length = 32
32 (L1=L2=16) | 48.917 | 0 | L1=19, L2=13 | 48.8208 | 0 | Not sustained in time scaling, lpfiltering
16 (L1=L2=8) | 46.01 | 0 | L1=9, L2=7 | 45.3319 | 0 | Not sustained in time scaling, lpfiltering

Segment length = 16
16 (L1=L2=8) | 47.91 | 0 | L1=9, L2=7 | 47.3868 | 0 | Not sustained in time scaling, lpfiltering
6.2.3. Proposed blind watermarking technique implemented in the DWT-DCT domain:
The results of the DWT-DCT domain technique are presented in this subsection. Each GOS is first decomposed by a three-level DWT, and the ca3 coefficients are selected to embed the watermark. The ca3 coefficients are then DCT transformed, in the same manner as the DWT-DCT transformation of section 5.4. The watermark embedding and extraction schematics are depicted in Fig. 6.8 and Fig. 6.9, respectively. To compare the three implemented techniques, the same set of audio signals is used for experimentation. The embedding and extraction equations involved in this procedure are
A = mean(C_k(i)),  0 ≤ i ≤ L1 − 1                            (6.22)
B = mean(C_k(i)),  L1 ≤ i ≤ L1 + L2 − 1                      (6.23)
A′ = mean(C′_k(i)),  0 ≤ i ≤ L1 − 1                          (6.24)
B′ = mean(C′_k(i)),  L1 ≤ i ≤ L1 + L2 − 1                    (6.25)
where C_k(i) are the DWT-DCT coefficients of x_k(i) and C′_k(i) are the DWT-DCT coefficients of the kth segment of the watermarked signal; L1 = L2 = 16.
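A sketch of the combined transform used here: a 3-level Haar DWT followed by a DCT of the approximation band, with the inverse applied after the mean modification (libraries as assumed in the earlier sketches).

```python
import numpy as np
import pywt
from scipy.fftpack import dct, idct

def dwt_dct_forward(segment):
    """Return the DCT of the 3-level approximation band plus the full DWT tree."""
    coeffs = pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)
    return dct(coeffs[0], norm='ortho'), coeffs

def dwt_dct_inverse(C, coeffs, n):
    """Put the (possibly modified) band back and reconstruct the segment."""
    coeffs[0] = idct(C, norm='ortho')
    return pywt.waverec(coeffs, 'haar')[:n]
```

Embedding and extraction then reuse the mean-comparison rule of the previous subsections on C_k(i), with L1 = L2 = 16.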
[Fig. 6.8 block diagram: input x → segmentation → DWT followed by DCT transformation → selection of lengths L1 & L2 → computation of mean A & mean B → watermark insertion (watermark W after dimension mapping, with scaling parameter K) → inverse DWT-DCT transformation → concatenation of segments → watermarked signal]
Fig 6.8 Block schematic of GOS based watermark embedding in DWT-DCT domain.
[Fig. 6.9 block diagram: watermarked signal → segmentation → DWT-DCT transformation → selection of lengths L1 & L2 → computation of mean A′ & mean B′ → watermark bit recovery → concatenation and dimension recovery → recovered watermark ŵ]
Fig 6.9 Block schematic of GOS based watermark extraction in DWT-DCT domain.
(a) Without attack (b) Down-sampling (c) Up-sampling (d) MP3 compression (e) Requantization (f) Cropping (g) Time warping (h) LP filtered at 22050 Hz
Fig. 6.10 Results of the robustness test: recovered watermarks in the DWT-DCT domain
The robustness test results are presented in Table 6.4, and the watermarks recovered from the watermarked signal after various signal processing modifications are shown in Fig. 6.10(a) to Fig. 6.10(h).
TABLE 6.4
SNR BETWEEN THE ORIGINAL AND WATERMARKED AUDIO SIGNAL, AND BER OF THE RECOVERED WATERMARK (DWT-DCT DOMAIN)

Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 41.7059 / 0 | 38.9251 / 0 | 31.5349 / 0 | 38.9913 / 0
Downsampled | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 30.7345 / 0.0029 | 36.6782 / 0.0043
Upsampled | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 30.7345 / 0.0029 | 36.6782 / 0.0043
Mp3 compressed | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 31.5349 / 0.0029 | 38.9913 / 0.0043
Requantization | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 31.5349 / 0.0029 | 38.9913 / 0.0043
Cropping | 41.7059 / 0.0038 | 38.9252 / 0.0059 | 31.1489 / 0.0046 | 37.7654 / 0.0068
Lpfiltered fc=22050 | 31.6594 / 0.0130 | 37.8637 / 0.0156 | 30.6054 / 0.0149 | 36.6782 / 0.0253
Time warping 10% | 23.6284 / 0.0734 | 25.3734 / 0.0840 | 17.6704 / 0.0805 | 17.8623 / 0.0853
Echo addition | 17.7346 / 0.0204 | 22.8352 / 0.0523 | 31.5349 / 0.0042 | 38.9913 / 0.0013
Equalization | 19.8 / 0.0313 | 23.3258 / 0.0492 | 31.5349 / 0.0037 | 38.9913 / 0.0013
The results presented in Table 6.4 indicate that the scheme implemented in the DWT-DCT domain is more robust than all the other schemes proposed in this chapter. As our main aim is to achieve better robustness, it is clear from the techniques implemented and tested in this chapter, as well as in chapter 5, that the DWT-DCT domain is the most suitable domain in which to embed the watermark. Using the DWT-DCT domain we take advantage of the low-middle frequency regions for embedding.
It can also be observed that embedding using GOS modification increases the imperceptibility of the watermarked signal compared to the spread spectrum method, whereas the spread-spectrum-based technique is more robust. The comparison for the song signal is shown in Table 6.5. In order to further increase the robustness of the DWT-DCT based watermarking scheme, we propose to embed the watermark using a cyclic coder.
Table 6.5 Comparison between the proposed GOS based scheme and the proposed SS based scheme (Song.wav)

Attack | GOS based scheme SNR / BER | SS based scheme SNR / BER
Without attack | 41.7059 / 0 | 31.7109 / 0
Downsampled | 41.7059 / 0.0020 | 31.7109 / 0
Upsampled | 41.7059 / 0.0020 | 31.7109 / 0
Mp3 compressed | 41.7059 / 0.0020 | 31.3456 / 0
Requantization | 41.7059 / 0.0020 | 31.3456 / 0
Cropping | 41.7059 / 0.0038 | 31.7056 / 0.0056
Lpfiltered fc=22050 | 31.6594 / 0.0130 | 27.9929 / 0
Time warping -10% | 23.6284 / 0.0734 | 20.0535 / 0.0186
Echo addition | 17.7346 / 0.0204 | 16.9394 / 0.0049
Equalization | 19.8 / 0.0313 | 19.1391 / 0
6.3. Proposed GOS based blind technique using cyclic coding
To improve the detection performance and to make the system intelligent, we propose to encode the watermark using a cyclic coder before embedding it into the host audio signal. The block schematic of this scheme is shown in Fig. 6.11: the watermark to be embedded is first encoded by the cyclic encoder and then embedded into the host signal. At the decoder side the extracted bits are decoded by the cyclic decoder to recover the watermark. The robustness results obtained using this method are presented in Table 6.6. From the table it is clear that the detection performance is significantly improved, even under 10% time scaling, echo addition and low-pass filtering.
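A compact sketch of a systematic (7,4) cyclic encoder/decoder that corrects one bit error per codeword. The generator polynomial g(x) = x^3 + x + 1 is an assumption; the thesis does not state which (7,4) generator is used.

```python
import numpy as np

G = np.array([1, 0, 1, 1])          # g(x) = x^3 + x + 1 (assumed generator)

def _rem(poly, divisor=G):
    """Remainder of GF(2) polynomial division, coefficients highest degree first."""
    r = np.array(poly, dtype=int)
    for i in range(len(r) - len(divisor) + 1):
        if r[i]:
            r[i:i + len(divisor)] ^= divisor
    return r[-(len(divisor) - 1):]

def cyclic74_encode(msg4):
    """Systematic codeword: 4 message bits followed by 3 parity bits."""
    msg4 = np.array(msg4, dtype=int)
    parity = _rem(np.concatenate([msg4, np.zeros(3, dtype=int)]))
    return np.concatenate([msg4, parity])

# Syndrome table: remainder of every single-bit error pattern -> error position.
_SYN = {}
for pos in range(7):
    e = np.zeros(7, dtype=int)
    e[pos] = 1
    _SYN[tuple(_rem(e))] = pos

def cyclic74_decode(word7):
    """Correct at most one bit error and return the 4 message bits."""
    w = np.array(word7, dtype=int) & 1
    s = tuple(_rem(w))
    if any(s):
        w[_SYN[s]] ^= 1
    return w[:4]
```

For the watermark, each group of four bits is encoded to seven bits before embedding; after extraction the decoder corrects isolated bit errors introduced by the attacks.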
[Fig. 6.11 block diagram: at the watermark encoder the watermark is encoded using cyclic codes and dimension mapped (with the key) before embedding into the host audio; after the attack channel, the watermark decoder applies the corresponding cyclic decoding and dimension mapping (with the key) to obtain the recovered watermark]
Fig 6.11 Improved encoder and decoder for GOS based blind watermarking using cyclic coding.
Table 6.6 Results obtained for the (7,4) cyclic code

Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 38.7784 / 0 | 36.4719 / 0 | 28.8574 / 0 | 35.8765 / 0
Downsampled | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Upsampled | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Mp3 compressed | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Requantization | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Cropping | 38.7784 / 0.0038 | 36.4719 / 0.0059 | 28.9863 / 0.0048 | 35.1187 / 0.0058
Lpfiltered fc=22050 | 31.6594 / 0.0098 | 33.4554 / 0.0156 | 26.4356 / 0.0102 | 31.2845 / 0.0187
Time warping 10% | 23.6284 / 0.0107 | 22.8455 / 0.0840 | 16.2565 / 0.0214 | 14.6572 / 0.0657
Echo addition | 19.8 / 0.0089 | 21.6675 / 0.0523 | 25.9766 / 0.0038 | 35.8765 / 0
Equalization | 41.7059 / 0.0098 | 22.4543 / 0.0492 | 27.6252 / 0.0015 | 35.8765 / 0
It is clear from Table 6.6 that the encoder and decoder model proposed using cyclic coding is more robust than all the other techniques proposed. With a small sacrifice in the imperceptibility test (computed SNR), the (7,4) cyclic coder provides the best robustness results, as our main aim is to embed a watermark that sustains all kinds of attacks and can be recovered successfully.
6.4. Comparison of the proposed methods with various well-known watermarking algorithms:
To evaluate the proposed methods, we compare them with other well-known watermarking methods. This style of comparison is borrowed from Lie et al. [23], who compared their method with other algorithms. Table 6.7 compares the methods reported in the literature with our proposed methods with respect to the handling of different attacks, embedding position, domain, etc. From the chart it is clear that the schemes proposed by us are tested against common signal processing attacks and provide the highest watermark embedding capacity among the techniques mentioned, except for the technique proposed by Boney [12]. Although the technique of Boney [12] provides the highest embedding capacity, it is not robust against some signal processing attacks such as time scaling, resampling and requantization.
Table 6.8 compares the blind methods proposed by us with the various blind methods highlighted in the literature survey. The results of the methods [13,16,46] are taken from the literature. The entry NA in the table indicates that the particular type of attack is not addressed in the literature. The comparison in Table 6.8 is made with respect to the robustness of the schemes against signal processing attacks. The method in [13] addresses only four kinds of attacks, and the scheme does not sustain even MP3 compression. The scheme in [16] does not address downsampling, cropping, time scaling and echo addition; its BER for the other attacks is always greater than 0.04, and it is not able to recover the watermark with zero BER for any of the attacks. The scheme in [46] reports a better SNR and a good BER, but it produces a poor BER for the downsampling attack and does not address the more sensitive attacks such as upsampling, cropping, time scaling, echo addition and equalization. Although our proposed methods give a lower SNR than the scheme in [46], they sustain all kinds of attacks. Our schemes are more robust than the other schemes, and we have also taken care to add security to the implemented schemes.
TABLE 6.7
COMPARISON CHART OF VARIOUS WATERMARKING ALGORITHMS. THE SYMBOL "?" REPRESENTS A CASE NOT ADDRESSED OR ANALYZED IN THE WORK.
Embedding
domain
Secret keys
used
Embedding
positions
Blind detection
Subjective test
reported
Bits embedded
Cropping
attack
LP filtering
attack
Time scaling
attack
Resampling
attack
Requantization
attack
Compression
attack
Lowest BER
Our
scheme
using SS
DWTDCT
Yes[PN
sequence]
FX
Our
scheme
using GOS
DWT-DCT
Lie[23]
Bassia[24]
Wu[26]
Li[29]
Shin[90]
Ko[44]
Boney
[12]
Time
Time
Fourier
Subband
Time
Time
Time
Yes[Section
length]
FX
Yes[Section
length]
FX
Yes[watermark
key]
FX
NO
NO
DY
Yes[PN
sequence]
FX
No
DY
Yes[Frame
length]
FX
FX
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
No
No
Yes
No
No
?
159bps
Yes
159bps
Yes
43bps
Yes
44bps
Yes
?
Yes
38bps
No
25bps
No
4bps
No
860bps
No
Yes
Yes
Yes
Yes
Yes
Yes
No
No
Yes
Yes
Yes
Yes
No
No
Yes
Yes
No
No
Yes
Yes
No
Yes
No
No
No
No
No
Yes
Yes
No
Yes
No
No
No
No
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
Yes
0
0
0.03
?
0.305
0.028
0.084
?
0.001
FX - Fixed, DY - Dynamic
TABLE 6.8 COMPARISON CHART FOR VARIOUS BLIND AUDIO WATERMARKING TECHNIQUES.
Proposed GOS based
method using cyclic
coder
SNR in db BER
28.5381
28.3982
28.3982
28.3982
28.3982
22.3475
26.9378
23.5560
0
0
0
0
0
0.0094
0
0
38.7784
38.7784
38.7784
38.7784
38.7784
38.7784
31.6594
0
0
0
0
0
0.0038
0.0098
23.6284
0.0107
16.0376
0.0021
20.8502
0
In 6sec 32*32
19.8
41.7059
0.0089
0.0098
In 6sec 32*32
Scheme in [13]
Scheme in [16]
Scheme in [46]
SNR in
db
NA
NA
NA
NA
NA
NA
NA
NA
SNR in
db
SNR in
db
50.445
30.4116
NA
50.4542
45.3332
NA
36.6381
NA
BER
0
NA
NA
0.91
NA
NA
NA
NA
NA
0
NA
0
In 15 sec 128 bits
Range is between -10 to 20
Without attack
Downsampled
Upsampled
Mp3compressed
Requantization
Cropping
Lpfiltered
Time warping 10%
Echo addition
Equalization
Bits embedded
Proposed SS based
method using cyclic
coder
SNR in db BER
BER
0
NA
0.04
0.08
0.08
NA
0.06
NA
NA
0.13
In 5sec 25 bits
BER
0
0.0513
NA
0
0
NA
0.0002
NA
NA
NA
NA
NA
In 9.57sec 64*64
NA-Not addressed in literature.
6.5. Summary of chapter:
The schemes implemented in this chapter provide good results. The SNR between the original audio signal and the watermarked signal is increased compared to the schemes implemented in the previous chapter. It is observed that the robustness of the techniques is improved by implementing the scheme with a cyclic coder.
The techniques implemented in this chapter are blind, meaning that they do not require the original signal to recover the watermark. The scaling parameter used to scale the host signal during watermark embedding is adaptive, meaning that the parameter is different for each GOS depending on the audibility requirements of each GOS. It is observed that the DWT-DCT domain is the most suitable domain for embedding the watermark, as it gives us the advantage of embedding in the low-middle frequency regions. The watermark embedded in the low-middle frequencies sustains all kinds of signal processing attacks.
Chapter 7 Intelligent encoder and decoder modelling
Introduction
In order to describe the link between watermarking and standard data
communications, the traditional model of a data communications system is often used
to model watermarking systems. The basic components of a data communications
system, related to the watermarking process, are highlighted. One of the most
important parts of the communications models of the watermarking systems is the
communications channel, because a number of classes of the communications
channels have been used as a model for distortions imposed by watermarking attacks.
The other important issue is the security of the embedded watermark bits, because the
design of a watermark system has to take into account access that an adversary can
have to that channel.
In this chapter we focus on the communications models of watermarking. The first section introduces the basic model of watermarking. The second section proposes an intelligent encoder-decoder model of audio watermarking based on the implementations of chapter 5, and the third section proposes an intelligent encoder-decoder model based on the implementations of chapter 6.
7.1. Basic model of watermarking
The process of watermarking is viewed as a transmission channel through
which the watermark message is being sent, with the host signal being a part of that
channel. In Figure 7.1, a general mapping of a watermarking system into a
communications model is given. After the watermark is embedded, the watermarked
signal is usually distorted after watermark attacks. The distortions of the watermarked
signal are, similar to the data communications model, modeled as additive noise.
A watermark message m is embedded into the host signal x to produce the watermarked signal s. The embedding process depends on the key k and must satisfy the perceptual transparency requirement, i.e. the subjective quality difference between x and s must be below the just-noticeable-difference threshold. Before watermark detection and decoding take place, s is usually intentionally or unintentionally modified. Intentional modifications (n) are usually referred to as attacks; an attack produces attack distortion at a perceptually acceptable level. After the attacks, the watermark extractor receives the attacked signal r.
[Fig. 7.1 block diagram: input m → watermark embedder (watermark encoder with host signal x and watermark key, producing wa and the watermarked signal s) → additive noise n → received signal r → watermark detector and decoder (with watermark key) → output]
Fig. 7. 1. A watermarking system and an equivalent communications model.
Depending on a watermarking application, the detector performs informed or
blind watermark detection. The term attack requires some further clarification.
Watermarked signal s can be modified without the intention to impact the embedded
watermark (e.g. dynamic amplitude compression of audio prior to radio broadcasting).
Why is this kind of signal processing nevertheless called an attack? The first reason is to simplify the notation of the general model of digital watermarking. The other, even more significant, reason is that any common signal processing operation that drastically impairs an embedded watermark is a potential method for adversaries who intentionally try to remove the embedded watermark. The watermarking algorithms
must be designed to endure the worst possible attacks for a given attack distortion
which might be even some common signal processing operation (e.g. dynamic
compression, low pass filtering etc.). Furthermore, it is generally assumed that the
adversary has only one watermarked version s of the host signal x. In fingerprinting
applications, differently watermarked data copies could be exploited by collusion
attacks. It has been proven that robustness against collusion attacks can be achieved
by a sophisticated coding of different watermark messages embedded into each data
copy. However, it seems that the necessary codeword length increases dramatically
with the number of watermarked copies available to the adversary.
The importance of the key k has to be emphasized. The embedded watermarks should be secure against detection, decoding, removal or modification by adversaries. Kerckhoffs' principle [35], stating that the security of a cryptosystem has to reside only in the key of the system, has to be applied when the security of a watermarking system is analyzed. Therefore, it must be assumed that the watermark embedding and extraction algorithms are publicly known, but only those parties knowing the proper key are able to read and modify the embedded information. The key k is considered a large integer, with a word length of 64 to 1024 bits. Usually, a key sequence is derived from k by a cryptographically secure random number generator to enable secure watermark embedding for each element of the host signal.
In order to properly analyze digital watermarking systems, a stochastic
description of the multimedia data is required. The watermarking of data whose
content is perfectly known to the adversary is useless. Any alteration of the host
signal could be inverted perfectly, resulting in a trivial watermarking removal. Thus,
essential requirements on data being robustly watermarkable are that there is enough
randomness in the structure of the original data and that quality assessments can be
made only in a statistical sense.
Let the original host signal x be a vector of length Lx. Statistical modeling of
data means to consider x a realization of a discrete random process x. In the most
general form, x is described by probability density function (PDF). A further
simplification is to assume independent, identically distributed (IID) data elements of
x. Most multimedia data cannot be modeled adequately by an IID random process.
However, in many cases, it is possible to decompose the data into components such
that each component can be considered almost statistically independent. In most
cases, the multimedia data have to be transformed, or parts have to be extracted, to
obtain a component-wise representation with mutually independent and IID
components. The watermarking of independent data components can be considered as
communication over parallel channels.
Watermark embedding and attacks against digital watermarks must be such that the introduced perceptual distortion - the subjective difference of the watermarked and attacked signals from the original host signal - is acceptable. In the previous section we introduced the terms embedding distortion and attack distortion, but no specific definition was given. The definition of an appropriate objective distortion measure is crucial for the design and analysis of a digital watermarking system.
Watermark extraction reliability is usually analyzed for different levels of
attack distortion and fixed data features and embedding distortion. Different reliability
measures are used for watermark decoding and watermark detection. In the
performance evaluation of the watermark decoding, digital watermarking is
considered as a communication problem. A watermark message m is embedded into
the host signal x and must be reliably decodable from the received signal r. Low
decoding error rates can be achieved only by using error correction codes. For practical error-correcting coding scenarios, the watermark message is usually encoded into a vector b of length Lb with binary elements bn ∈ {0, 1}. Usually, b is also called the binary watermark message, and the decoded binary watermark message is b̂. The decoding reliability of b is described by the bit error rate (BER).
The BER can be computed for specific stochastic models of the entire watermarking process, including attacks. The number of measured error events divided by the number of observed events defines the measured error rate; at the word level this is the word error rate (WER).
The capacity analysis provides a good method for comparing the performance
limits of different communication scenarios, and thus is frequently employed in the
existing literature. Since there is still no solution available for the general
watermarking problem, digital watermarking is usually analyzed within certain
constraints on the embedding and attacks. Additionally, for different scenarios, the
watermark capacity might depend on different parameters (domain of embedding,
attack parameters, etc.).
If an informed watermark detector is used, the watermark detection is performed in two steps. In the first step, the unwatermarked host signal may be subtracted from the received signal r in order to obtain a received noisy added watermark pattern wn. It is subsequently decoded by a watermark decoder using the same watermark key used during the embedding process. Because the addition of the host signal in the embedder is exactly canceled by its subtraction in the detector, the only difference between wa and wn is caused by the added channel noise. Therefore, the addition of the host signal can be neglected, making watermark embedding, channel-noise addition and watermark extraction equivalent to a data communications system.
7.2. Method-1: Proposed Intelligent encoder and decoder model for
robust and secure audio watermarking based on Spread Spectrum
The main aim of the present work is to make the system robust to all kinds of
attacks. In this section we propose the intelligent model of encoding and decoding
based on spread spectrum method. To make the system intelligent we propose to add
the following features.
1) Add synchronization patterns to indicate the start of the file from where the
watermark is embedded in the host signal.
2) To improve the robustness of the system through diversity, add multiple
watermarks at different locations in a file using time diversity.
3) Encode the watermark to reduce the bit error rate and to improve the
robustness further.
4) Make the system secure by using SS technique.
In Fig. 7.2 we propose the spread spectrum based model of the watermarking system incorporating all the features mentioned above. The start of the signal is first identified and marked. The host audio signal x is decomposed into smaller segments of user-defined size from the marked point. The synchronization pattern is then added to the host audio signal at the points from which the watermark is embedded. A 6-bit zero-mean synchronization pattern with alternating +1 and −1 values is added to the music signal before embedding the watermark. The pattern is very small and does not affect the signal quality, yet its run of relatively large alternating amplitudes makes it easily recognizable.
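A sketch of how such a sync pattern could be added and located; the embedding amplitude and the matched-filter search are illustrative assumptions.

```python
import numpy as np

SYNC = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # 6-bit zero-mean pattern

def add_sync(x, start, amp=0.01):
    """Superimpose the scaled sync pattern at sample index `start`."""
    y = np.asarray(x, dtype=float).copy()
    y[start:start + len(SYNC)] += amp * SYNC
    return y

def find_sync(r, search_len=44100):
    """Return the offset in the first `search_len` samples whose correlation
    with the sync pattern is largest (a simple matched-filter search)."""
    seg = np.asarray(r[:search_len], dtype=float)
    corr = np.correlate(seg, SYNC, mode='valid')
    return int(np.argmax(np.abs(corr)))
```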
Watermark embedding in the time domain directly modifies the samples, and hence the added distortion creates a small hum in the watermarked signal. Transforming the signal into a suitable transform domain and modifying the transformed samples does not affect the imperceptibility as much. Each segment is therefore transformed into the DWT-DCT domain. The technique must recover the watermark without using the host audio signal, so a PRN (pseudo-random number) sequence is generated from a secret key k using cryptographic methods.
[Fig. 7.2 block diagram: find start of utterance and add sync pattern → divide host x(n) into segments of size N → third-level DWT of each segment → DCT of the approximation coefficients → watermark insertion (binary watermark W: dimension mapping, bipolar conversion, (7,4) cyclic coding, multiplication by the PN sequence generated from the initial seed, scaling by α(k) computed for each segment) → IDCT then IDWT of each segment → concatenate segments → watermarked audio y(n)]
Fig 7.2 Intelligent encoder model for the proposed adaptive SNR based blind technique in the DWT-DCT domain
The watermark message to be transmitted is mapped into an added pattern wa of the same type and dimension as the host signal (a one-dimensional pattern). The watermark bit stream is then encoded using a (7,4) cyclic encoder. Each resulting watermark bit is scaled and multiplied by the PN sequence to embed it in each segment of the host audio. The encoded message pattern is then perceptually weighted to obtain the added pattern wa, which is added to the host signal to construct the watermarked signal. If the watermark embedding process does not use information about the host signal, it is called blind watermark embedding; otherwise the process is referred to as informed watermark embedding.
To measure the imperceptibility between the watermarked signal and the original signal, the SNR is computed. While embedding the watermark, we modify each segment adaptively to embed the binary data. The computation of α (the scaling parameter) is based on the energy of the segment; this helps keep the audibility of the added watermark below the masking threshold. After the added pattern is embedded, the watermarked work y is usually distorted by watermark attacks. We model the distortions of the watermarked signal as added noise, as in the data communications model. The attacks may include compression and decompression, low-pass filtering, resampling, requantization, cropping, time scaling, etc.
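One way to compute the per-segment scaling parameter α(k) from the segment energy is to fix a target embedding SNR; the closed form below is an illustrative assumption, not the exact rule of chapter 5.

```python
import numpy as np

def scale_factor(segment, target_snr_db=30.0):
    """alpha(k) such that adding alpha * p (p a +/-1 PN sequence) to this
    segment yields roughly the requested segment SNR."""
    x = np.asarray(segment, dtype=float)
    energy = np.sum(x ** 2)
    return np.sqrt(energy / (len(x) * 10.0 ** (target_snr_db / 10.0)))
```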
When the signal r reaches the destination, the embedded watermark has to be recovered from r. The watermark decoder model is shown in Fig. 7.3. First, the start of the signal is identified and the synchronization pattern is tracked. Then r is decomposed into smaller segments of the same size used while embedding the watermark. Each segment is transformed into the DWT-DCT domain and multiplied with the watermark key sequence. The watermark bit is then recovered from each segment as explained in chapter 5, and the bit stream is decoded using the (7,4) cyclic decoder. The watermark bits are concatenated into a one-dimensional watermark, and dimension transformation converts it back into its original two dimensions.
[Fig. 7.3 block diagram: received audio r(n) → find start of utterance and search sync pattern → divide into segments of size N → third-level DWT-DCT of each segment → multiplication by the PN sequence (initial seed) → threshold comparison and bit recovery from each segment → decoding using the cyclic decoder → dimension mapping → recovered watermark]
Fig.7.3 Decoder model for adaptive SNR based blind technique in DWT-DCT domain.
As in most applications, the watermark system cannot perform its function if
the embedded watermark cannot be detected; robustness is a necessary property if a
watermark is to be secure. Generally, there are several methods for increasing
watermark robustness in the presence of signal modifications. Some of these methods
aim to make watermarks robust to all possible distortions that preserve the perceptual
quality of the watermarked signal.
One of the earliest methods of attack characterization is diversity, employed here through watermark repetition. Although it is well known that repetition can improve the reliability of robust data hiding schemes, it is traditionally used to decrease the effect of fading. If properly designed, repetition can often significantly improve performance and may be worth the apparent sacrifice in watermark bit rate. If the repetition is viewed as an application of communication diversity principles, it can be shown that a proper selection of the watermark embedding domain together with attack characterization can notably improve reliability.
A communication channel can be broken into independent sub-channels, where each sub-channel has a certain capacity. Since, in a fading environment, some of these channels may have zero capacity at a particular time instant, diversity principles are employed: the same information is transmitted through each sub-channel in the hope that at least one repetition is transmitted successfully. For watermarking, this is referred to as coefficient diversity, because different coefficients within the host signal are modulated with the same information.
A host audio signal x of longer duration is selected to achieve this purpose. The signal x is split into smaller signals of 7 s duration, and watermark embedding is applied to each 7 s signal. Once the watermark is embedded in all the signals, they are concatenated into one signal and sent over the communication channel. At the receiver, the received signal is again split into 7 s segments and the watermark is recovered from each of them.
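A sketch of the time-diversity wrapper; embed_fn and extract_fn stand for any of the segment-level embedders and detectors sketched earlier and are hypothetical placeholders.

```python
import numpy as np

def embed_with_diversity(x, bits, embed_fn, fs=44100, chunk_sec=7):
    """Embed the same watermark into every 7 s chunk of a long host signal."""
    n = fs * chunk_sec
    marked, pos = [], 0
    while pos + n <= len(x):
        marked.append(embed_fn(x[pos:pos + n], bits))
        pos += n
    marked.append(x[pos:])                      # trailing samples left unmarked
    return np.concatenate(marked)

def best_recovery(r, bits, extract_fn, fs=44100, chunk_sec=7):
    """Recover one copy per chunk and keep the one with the lowest BER
    against the reference mark, as in the diversity experiment."""
    n = fs * chunk_sec
    copies = [extract_fn(r[i:i + n]) for i in range(0, len(r) - n + 1, n)]
    bers = [np.mean(np.asarray(c) != np.asarray(bits)) for c in copies]
    k = int(np.argmin(bers))
    return copies[k], float(bers[k])
```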
The robustness results of this scheme implemented through time diversity are provided in Table 7.1. In a 50 s signal, 5 watermarks are embedded, one every 7 s, and the results are observed. The 5 watermarks recovered from the five 7 s signals are compared with the original watermark and their BERs are computed. The recovered watermark with the lowest BER is identified and reported as the valid mark. The recovered watermark is then enhanced to remove isolated error bits: a 3×3 window is used to identify an isolated pixel in the neighborhood of the center pixel. The values of the neighborhood pixels are compared with the center pixel, and if the center pixel differs from all of its neighborhood pixels it is replaced with the opposite value. As the watermark image is binary, it contains only 0 and 1 values.
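A sketch of the isolated-error cleanup on the recovered binary watermark image; the decision rule (flip a pixel that disagrees with all eight 3×3 neighbours) paraphrases the description above.

```python
import numpy as np

def remove_isolated_errors(wm):
    """Flip binary pixels that differ from every pixel in their 3x3 neighbourhood."""
    wm = np.asarray(wm, dtype=int)
    out = wm.copy()
    rows, cols = wm.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neigh_sum = wm[i - 1:i + 2, j - 1:j + 2].sum() - wm[i, j]
            if wm[i, j] == 1 and neigh_sum == 0:
                out[i, j] = 0                    # lone 1 surrounded by 0s
            elif wm[i, j] == 0 and neigh_sum == 8:
                out[i, j] = 1                    # lone 0 surrounded by 1s
    return out
```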
Table 7.1 Robustness results after attack characterization using time diversity

Attack | SNR | Lowest BER after diversity | BER after removing isolated error pixels
Without attack | 27.5764 | 0 | 0
Downsampled | 26.6478 | 0 | 0
Upsampled | 26.8021 | 0 | 0
Mp3 compressed | 26.8021 | 0 | 0
Requantization | 26.8021 | 0 | 0
Cropping | 23.3522 | 0.0029 | 0.0009
Lpfiltered fc=22050 | 24.8021 | 0 | 0
Time scaling -10% | 20.4113 | 0.0105 | 0.0038
Time scaling +10% | 20.3347 | 0.0052 | 0.0029
Echo addition | 20.6543 | 0.0019 | 0.0009
Equalization | 24.6745 | 0 | 0
7.3. Method-2: Proposed Intelligent encoder and decoder model for
robust and secure audio watermarking based on GOS modification
We also propose a model based on the patchwork algorithm which performs blind detection of the watermark. The proposed intelligent encoder model, shown in Fig. 7.4, modifies a group of samples (GOS) to embed the watermark. The start of the signal is first identified and marked. The host audio signal x is decomposed into smaller segments of user-defined size from the marked point, and the synchronization pattern is added to the host audio signal at the points from which the watermark is embedded. Each segment is transformed into the DWT-DCT domain and is divided into two groups of samples (GOS) of equal or unequal size. The mean of each group is then computed and defined as A and B.
The watermark message to be transmitted is mapped into an added pattern wa of the same type and dimension as the host signal (a one-dimensional pattern). To provide security, the watermark is encoded by error control coding (a (7,4) cyclic encoder) and then embedded into each segment. As explained in chapter 6, one bit of the binary watermark is embedded in one segment of the host signal by modifying each GOS to satisfy the required data embedding condition.
The watermarked signal y is then transferred from source to destination through a transmission channel, where it undergoes various intentional and unintentional signal processing attacks. The proposed decoder model based on GOS modification is shown in Fig. 7.5. The received signal r is decomposed into smaller segments and transformed into the DWT-DCT domain, and the watermark is recovered by comparing the means of each GOS according to the rules.
[Fig. 7.4 block diagram: input x → find start of utterance and add sync pattern → segmentation → DWT followed by DCT transformation → selection of lengths L1 & L2 → computation of mean A & mean B → watermark insertion (watermark W: dimension mapping and cyclic encoding, with scaling parameter K) → inverse DWT-DCT transformation → concatenation of segments → watermarked signal]
Fig 7.4 Block schematic of GOS based intelligent encoder of watermark in DWT-DCT domain.
[Fig. 7.5 block diagram: received signal r → find start of utterance and search sync pattern → segmentation → DWT-DCT transformation → selection of lengths L1 & L2 → computation of mean A′ & mean B′ → watermark bit recovery and cyclic decoding → concatenation and dimension recovery → recovered watermark ŵ]
Fig 7.5 Block schematic of GOS intelligent decoder of watermark in DWT-DCT domain.
Table 7.2 Robustness results after attack characterization using time diversity

Attack | SNR | Lowest BER | BER after removing isolated pixels
Without attack | 36.4356 | 0 | 0
Downsampled | 35.2543 | 0.0009 | 0
Upsampled | 35.2543 | 0.0009 | 0
Mp3 compressed | 35.2543 | 0.0009 | 0
Requantization | 35.2543 | 0.0009 | 0
Cropping | 33.2541 | 0.0029 | 0.0009
Lpfiltered | 30.8021 | 0.0009 | 0
Time scaling -10% | 23.6543 | 0.0185 | 0.0068
Time scaling +10% | 22.4532 | 0.0193 | 0.0078
Echo addition | 20.7645 | 0.0124 | 0.0048
Equalization | 24.8437 | 0.0098 | 0.0019
The robustness results of this scheme implemented through time diversity are provided in Table 7.2. From the results presented in Tables 7.1 and 7.2 it is clear that diversity in time increases the robustness of the system. The system implemented using GOS modification increases the imperceptibility, whereas the spread spectrum based technique is more robust. We therefore propose to use model 1 (SS based) when robustness matters more than imperceptibility, and model 2 when imperceptibility is the main requirement.
7.4. Summary of Chapter
In this chapter we modeled the audio watermarking techniques using data communication principles. The techniques implemented are more robust in the transform domain than in the time domain, so we propose to embed the data in the transform domain. In both models the watermark is embedded adaptively, meaning that the discrimination factor used to embed each bit is varied according to the segment characteristics. Robustness is significantly improved by cyclic encoding and decoding of the watermark, and diversity in time further increases the robustness of the system. We have also added a synchronization pattern to trace the start of the watermark in the watermarked file.
The intelligent models we propose embed the watermark bits adaptively and perform blind extraction of the watermark successfully. Finally, it is observed that the model proposed in method 1 is more robust than method 2: the maximum BER for method 1 is 0.0038 (for time scale modification) and for method 2 it is 0.0078. Method 1 is therefore applicable in areas like copy protection, piracy control and fingerprinting, where robustness is preferred. The method 2 model is more imperceptible than method 1 and is therefore applicable where a high degree of imperceptibility is required. The maximum SNR for method 1 is 27.5764 dB and for method 2 it is 36.4356 dB.
Chapter 8 Discussion and Conclusion
Introduction
Audio watermarking algorithms are studied in this thesis. The main goal of this thesis is to develop an intelligent encoder and decoder model for robust and secure audio watermarking. We have proposed two basic models: one based on the spread spectrum technique and the other based on the patchwork algorithm. The implementation procedures and the results obtained for these methods were presented in chapters 5 and 6, respectively. Chapter 7 proposed models of the watermarking techniques based on digital communication principles. This chapter concludes the thesis and gives suggestions for further research.
8.1. Discussion and conclusion
The main aim of the present work is the development of audio watermarking algorithms with state-of-the-art performance. The performance of the algorithms is validated in the presence of standard watermarking attacks.
In chapter 2 a survey of the key digital audio watermarking algorithms and techniques is presented. The reviewed algorithms are classified according to the domain used for inserting a watermark and the statistical method used for embedding and extracting the watermark bits. The scientific publications cited in the literature survey were chosen to build sufficient background for identifying and solving the research problems.
The main point of the "magic triangle" concept is that if the perceptual transparency parameter is fixed, the design of a watermarking system cannot achieve high robustness and a high watermark data rate at the same time. Therefore, the research problem was divided into three specific sub-problems. These sub-problems are stated in chapter 3.
To solve the sub-problems stated in chapter 3 we implemented various algorithms, as described below. The sub-problems are:
i) What is the highest watermark bit rate obtainable under the perceptual transparency constraint, and how can this limit be approached?
ii) How can the detection performance of a watermarking system be improved using algorithms based on communication models of the system?
iii) How can the overall robustness of a watermarking system to attacks be increased using attack characterization at the embedding side?
These problems are tackled as follows.
1. To obtain a distinctively high watermark data rate, the embedding algorithms were implemented in the transform domain.
2. To improve detection performance, a spread spectrum method and patchwork algorithms are used; the bit error rate is further improved using a cyclic encoding scheme.
3. To increase robustness, attack characterization is employed through diversity.
To achieve a high bit rate we embed the data in the LSBs of the host audio in the wavelet domain. The wavelet domain LSB algorithm is described in chapter 4. The wavelet domain was chosen for data hiding due to its low processing noise and its suitability for frequency analysis, because its multiresolution properties provide access both to the most significant parts and to the details of the signal's spectrum. In comparison with the time domain LSB algorithm, the wavelet domain algorithm produces stego objects that are perceptually hard to distinguish from the original audio clip even when the LSBs of the coefficients are modified.
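As an illustration of the wavelet-domain LSB idea recalled above, the following minimal Python sketch quantizes the level-3 detail coefficients of a DWT with an assumed step size and replaces their least significant bits with the payload bits. The wavelet name, quantization step and payload layout are assumptions made only for this example and are not claimed to be the exact thesis implementation.

# Minimal sketch of wavelet-domain LSB embedding (illustrative only).
# Assumes len(host) is a multiple of 2**level so reconstruction keeps the same length.
import numpy as np
import pywt

def embed_lsb_wavelet(host, bits, wavelet="db4", level=3, step=1e-4):
    coeffs = pywt.wavedec(host, wavelet, level=level)   # [cA3, cD3, cD2, cD1]
    q = np.round(coeffs[1] / step).astype(np.int64)     # quantize level-3 detail coefficients
    for i, b in enumerate(bits[: len(q)]):
        q[i] = (q[i] & ~1) | int(b)                     # overwrite the LSB with a watermark bit
    coeffs[1] = q.astype(float) * step                  # back to the coefficient scale
    return pywt.waverec(coeffs, wavelet)[: len(host)]

def extract_lsb_wavelet(watermarked, n_bits, wavelet="db4", level=3, step=1e-4):
    cD3 = pywt.wavedec(watermarked, wavelet, level=level)[1]
    q = np.round(cD3 / step).astype(np.int64)
    return [int(v) & 1 for v in q[:n_bits]]             # read the LSBs back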
In this scheme an audio watermark is embedded in the host audio signal. We have embedded audio data of 0.5 to 5 sec duration in a host audio of 45 sec duration. To measure the imperceptibility we computed the SNR between the original signal and the watermarked signal. The SNR computed after embedding audio of length 0.5 sec to 5 sec is presented in Table 4.2; the observed SNR in all cases is above 28 dB. The test results provided in Table 4.1 confirmed that the embedded information is inaudible, except under attacks like time warping and low pass filtering.
The BER is calculated to measure the similarity between the original watermark and the extracted watermark. The results listed in Table 4.3 indicate that the watermarking technique is robust to common signal processing attacks such as compression, echo addition and equalization, with poor detection results for low pass filtering. This technique can be used for covert communication. One disadvantage of this technique is that it is non-blind, meaning that it requires the original host audio signal to recover the watermark.
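For reference, the two quality measures used throughout this discussion can be summarized by the following small Python sketch; the function names are only illustrative.

# Illustrative helpers for the two evaluation measures (names are assumptions).
import numpy as np

def snr_db(original, watermarked):
    # SNR between host and watermarked signal, in dB: 10*log10( sum(x^2) / sum((x - xw)^2) )
    noise = original - watermarked
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))

def ber(original_bits, recovered_bits):
    # Fraction of watermark bits that differ after extraction.
    a = np.asarray(original_bits)
    b = np.asarray(recovered_bits)
    return np.mean(a != b)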
To develop a blind watermarking technique and to increase the detection performance of the added copyright information, we embed the data using the spread spectrum method. The spread spectrum based techniques are the focus of chapter 5. From the test results in table 5.2 it is observed that the non-blind technique has better SNR compared to the blind detection scheme, but it requires the original audio signal to recover the watermark. In the non-blind technique the number of modifications made to embed a watermark bit is smaller than in the blind technique. In both the non-blind and the blind techniques only one bit of information is added per segment of host audio: in the non-blind technique we modify only one sample in the host audio segment, whereas in the blind technique we modify every sample of the segment.
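A minimal sketch of the kind of additive spread spectrum embedding and blind correlation detection discussed here is given below. The PN generation, segment handling and scaling value are assumptions made for illustration only, not the exact parameters of the thesis.

# Sketch of additive spread spectrum embedding with blind correlation detection.
import numpy as np

def embed_ss_segment(segment, bit, key, alpha=0.005):
    rng = np.random.default_rng(key)                  # the PN sequence acts as the secret key
    pn = rng.choice([-1.0, 1.0], size=len(segment))
    return segment + alpha * (1 if bit else -1) * pn  # every sample of the segment is modified

def detect_ss_segment(segment, key):
    rng = np.random.default_rng(key)
    pn = rng.choice([-1.0, 1.0], size=len(segment))
    # Blind detection: the sign of the correlation with the PN sequence gives the bit,
    # assuming the host interference is small relative to alpha * segment length.
    return 1 if np.dot(segment, pn) > 0 else 0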
Table 5.2 also presents the results obtained for the proposed blind technique implemented in the DWT/LWT domain. Though the SNR in these cases is below 20 dB, the listening test confirms that there is no perceptual difference between the original and the watermarked signals. The BER test between the original watermark and the recovered watermark indicates that the DWT based blind detection technique is slightly more robust than the LWT based one. The blind watermarking schemes proposed in this chapter embed the watermark adaptively in each segment.
To make the scheme adaptive we compute the scaling parameter α(k) for each segment under an assumption on the expected SNR between the original segment coefficients and the modified segment coefficients. The selection criteria for the value of SNR used in the computation of α(k) and for the segment length are decided based on the results presented in table 5.3. From table 5.3 it is clear that optimized results are obtained for a segment length of 256 and an SNR between 40 and 60 dB. To make the system secure, a PN sequence is used as a secret key to embed the watermark. The embedded watermark cannot be recovered without knowledge of the PN sequence used; hence the key is very important for keeping the method secure.
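Under the assumption that the modification of segment k is α(k) times a ±1 PN sequence, fixing the expected segment SNR directly yields the scaling parameter. The following lines are a sketch of that computation; the target SNR of 40 dB is chosen only as an example within the reported 40-60 dB range.

# Sketch: scaling parameter alpha(k) from a target segment SNR (assuming a +/-1 PN sequence).
import numpy as np

def alpha_for_segment(coeffs, target_snr_db=40.0):
    # target_snr = 10*log10( sum(c^2) / (alpha^2 * N) )
    # =>  alpha = sqrt( sum(c^2) / (N * 10^(target_snr/10)) )
    c = np.asarray(coeffs, dtype=float)
    return np.sqrt(np.sum(c ** 2) / (len(c) * 10.0 ** (target_snr_db / 10.0)))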
It is also observed that the SNR between the original and watermarked signals is improved by embedding the watermark in the DWT-DCT domain (DWT followed by the DCT of the low frequency coefficients). By computing the DCT of the 3-level DWT coefficients we take advantage of the low frequency middle components to embed the information. The scheme proposed in section 5.4 provides better SNR than the other two techniques implemented in chapter 5 and is also robust to various signal processing attacks. Hence, to embed the data using the spread spectrum method we propose to use the DWT-DCT domain, which increases imperceptibility and yields better robustness. To add security and to further improve the robustness results, cyclic coding is used. The watermark bits are encoded using a cyclic coder before embedding into the host audio, and at the receiver side a cyclic decoder is used to decode the watermark. Cyclic encoding and decoding corrects single-bit errors generated during watermark recovery. Because the watermark is encoded, an opponent cannot learn the statistical behaviour of the watermark signal and hence cannot guess the watermark from the statistical behaviour of the audio signal. The results of this scheme are presented in tables 5.6 and 5.7 using the (6,4) and (7,4) cyclic coders respectively. Since our main goal of watermarking is to increase robustness, we preferred the (7,4) cyclic coder because it is more robust than the (6,4) coder.
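As an illustration of the single-error-correcting behaviour described above, the sketch below uses a (7,4) Hamming code, which is equivalent (up to bit ordering) to the cyclic code with generator polynomial x^3 + x + 1. The matrices and helper names are assumptions made for this example and are not claimed to be the exact coder used in the thesis.

# Sketch of a (7,4) single-error-correcting code (Hamming form of the cyclic code with g(x) = x^3 + x + 1).
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],   # systematic generator matrix [I4 | P]
              [0, 1, 0, 0, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 1],
              [0, 0, 0, 1, 1, 0, 1]])
H = np.array([[1, 0, 1, 1, 1, 0, 0],   # parity check matrix [P^T | I3]
              [1, 1, 1, 0, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode4(msg):
    return np.dot(msg, G) % 2                       # 4 watermark bits -> 7 channel bits

def decode7(word):
    word = word.copy()
    syndrome = np.dot(H, word) % 2
    if syndrome.any():                              # non-zero syndrome: locate and flip the erroneous bit
        for pos in range(7):
            if np.array_equal(H[:, pos], syndrome):
                word[pos] ^= 1
                break
    return word[:4]                                 # systematic code: the first 4 bits are the message

# Example: a single bit error introduced during recovery is corrected.
m = np.array([1, 0, 1, 1])
c = encode4(m)
c[5] ^= 1
assert np.array_equal(decode7(c), m)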
Schemes based on the patchwork algorithm are proposed in chapter 6 in different transform domains. It is observed that this scheme is more imperceptible than the spread spectrum technique: the SNR obtained between the original and watermarked signals is higher in this case. The maximum SNR obtained using this scheme is 61.0525 dB for the harmonium signal and 44.5607 dB for the song signal. This is due to the smaller discrimination parameter δ used to embed the information, and because a segment is modified only if the required condition is not already satisfied. Since fewer modifications are made to embed the watermark, this scheme provides better imperceptibility than the spread spectrum based method. The results provided in tables 6.1, 6.2 and 6.4 again confirm that the DWT-DCT domain is the most suitable domain for embedding the information. The schemes implemented in this chapter calculate an adaptive scaling parameter for each subsection/GOS, preserving the imperceptibility of the watermarked signal while providing good robustness results. The robustness results for the schemes proposed in chapters 5 and 6 indicate that the scheme proposed in chapter 5 is more robust than the scheme proposed in chapter 6.
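The following sketch illustrates patchwork-style embedding of one bit by separating the means of two coefficient groups taken from the DWT-DCT domain, in the spirit of the GOS scheme summarized above; the wavelet, group split and δ value are assumptions for illustration only.

# Sketch: patchwork-style embedding of one bit via the means of two coefficient groups
# in the DWT-DCT domain (illustrative parameters only).
import numpy as np
import pywt
from scipy.fft import dct, idct

def dwt_dct_forward(segment, wavelet="db4", level=3):
    coeffs = pywt.wavedec(segment, wavelet, level=level)
    coeffs[0] = dct(coeffs[0], norm="ortho")        # DCT of the 3-level approximation coefficients
    return coeffs

def dwt_dct_inverse(coeffs, wavelet="db4"):
    coeffs = list(coeffs)
    coeffs[0] = idct(coeffs[0], norm="ortho")
    return pywt.waverec(coeffs, wavelet)

def embed_patchwork_bit(coeff_block, bit, delta=0.01):
    block = np.asarray(coeff_block, dtype=float).copy()
    half = len(block) // 2
    diff = np.mean(block[:half]) - np.mean(block[half:2 * half])
    target = delta if bit else -delta
    # Modify the block only if the required mean-difference condition is not already satisfied.
    if (bit and diff < delta) or (not bit and diff > -delta):
        shift = (target - diff) / 2.0               # push the two group means towards the target gap
        block[:half] += shift
        block[half:2 * half] -= shift
    return block

def detect_patchwork_bit(coeff_block):
    half = len(coeff_block) // 2
    return 1 if np.mean(coeff_block[:half]) > np.mean(coeff_block[half:2 * half]) else 0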
Chapter 7 proposes intelligent encoder-decoder models of the schemes implemented in chapters 5 and 6. To make the models intelligent, robust and secure, we added the following features: i) a synchronization pattern to indicate the start of the watermark in the audio; ii) time diversity to improve robustness; iii) cyclic encoding of the watermark. In both models the watermark is embedded adaptively, meaning that the discrimination factor used to embed each bit is varied according to the segment characteristics. The observed test results are significant even under time scale modification and low pass filtering. The models we propose are able to embed 1024 bits in a signal of 6.5 sec duration while preserving its imperceptibility.
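As a sketch of how a synchronization pattern can be located before extraction (the decoder of Fig 7.5 searches for the sync pattern after finding the start of the utterance), the lines below slide a known ±1 pattern over the received signal and pick the offset of maximum normalized correlation; the pattern, search range and threshold are assumptions for this example.

# Sketch: locating a +/-1 synchronization pattern by normalized correlation (illustrative values).
import numpy as np

def find_sync(received, sync_pattern, threshold=0.6):
    n = len(sync_pattern)
    best_offset, best_corr = -1, 0.0
    for offset in range(len(received) - n + 1):
        window = received[offset:offset + n]
        denom = np.linalg.norm(window) * np.linalg.norm(sync_pattern)
        if denom == 0:
            continue
        corr = np.dot(window, sync_pattern) / denom  # normalized correlation at this offset
        if corr > best_corr:
            best_offset, best_corr = offset, corr
    return best_offset if best_corr >= threshold else -1   # -1: sync pattern not found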
The watermark is repeatedly embedded in a longer audio signal and its recovery is observed. It is observed that, after the various attacks, at least one out of the five embedded watermark copies provides the minimum BER result. In addition, it is clear that the introduction of the attack characterization module further improved the extraction reliability of both algorithms, decreasing the bit error rate most noticeably in the presence of time scale modification, low pass filtering, echo addition and equalization. The results presented in this chapter again confirm that the SS based method is more robust than the method based on the patchwork algorithm; the SS based algorithm obtained high detection robustness while increasing the perceptual transparency of the watermarked signal. The time required to embed and recover the watermark with the SS based scheme is 21.131 sec, and with the GOS based scheme it is 24.401 sec.
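A simple way to exploit the time diversity described above, suggested here only as an illustrative sketch, is to recover every repeated copy of the watermark and combine them by a per-bit majority vote (alternatively, only the best copy could be kept); the code below shows the majority-vote variant with the five copies mentioned in the text.

# Sketch: combining repeated watermark copies recovered from different time positions
# by a per-bit majority vote (illustrative).
import numpy as np

def combine_by_majority(copies):
    # copies: list of equal-length 0/1 arrays, one per embedded repetition (e.g. five copies).
    stack = np.asarray(copies)
    votes = np.sum(stack, axis=0)
    return (votes * 2 > stack.shape[0]).astype(int)   # a bit is 1 if more than half of the copies say 1

# Usage: combined = combine_by_majority([w1, w2, w3, w4, w5]),
# where each wi is the watermark recovered from one repetition.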
8.2. Main contributions of the present research
1. An audio watermarking scheme is proposed to embed an audio watermark in an audio signal.
2. Adaptive blind watermarking algorithms based on SNR are developed using the spread spectrum technique.
3. The spread spectrum based techniques are implemented in different transform domains.
4. A cyclic coder and attack characterization by diversity are used to increase the robustness of the scheme.
5. A synchronization pattern is added to track the watermark.
6. A new intelligent encoder and decoder model is proposed for an audio watermarking system using the spread spectrum method.
7. Blind watermarking algorithms based on GOS modification are developed in different transform domains.
8. A cyclic coder and attack characterization by diversity are employed to increase the robustness.
9. A synchronization pattern is added to track the watermark.
10. A new intelligent encoder and decoder model is proposed for an audio watermarking system using the GOS method.
8.3. Future scope
Research in watermarking is progressing along two paths: while new watermarking algorithms are being developed, researchers are also working on attacks against watermarked signals. Proposing new attacks and suggesting countermeasures is one of the active areas of watermarking research. One can work on devising new attacks against the existing algorithms and suggest the countermeasures needed to survive such attacks. Researchers can also work on establishing a buyer-seller protocol for watermarking techniques. Algorithms can be developed in the DWT-DCT domain to meet the requirements of imperceptibility and robustness at the same time. Cryptographic methods can be employed to provide security to the developed watermarking techniques. Proposing an asymmetric watermarking method would also be a valuable contribution to the field.