INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING

A Thesis submitted to SHIVAJI UNIVERSITY, KOLHAPUR for the Degree of DOCTOR OF PHILOSOPHY in ELECTRONICS ENGINEERING (Digital Signal Processing) under the Faculty of Engineering

By MRS. MEENAKSHI RAVINDRA PATIL

Under the Guidance of DR. MRS. S. D. APTE, Department of Electronics Engineering, Walchand College of Engineering, Sangli

YEAR 2009

DECLARATION BY GUIDE
This is to certify that the thesis entitled INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING, which is being submitted herewith for the award of the DEGREE OF DOCTOR OF PHILOSOPHY in Electronics Engineering under the Faculty of Engineering of Shivaji University, Kolhapur, is the result of the original research work completed by MRS. MEENAKSHI RAVINDRA PATIL under my supervision and guidance, and to the best of my knowledge and belief the work embodied in this thesis has not earlier formed the basis for the award of any degree or similar title of this or any other university or examining body.
Place: Kolhapur    Date:    Research Guide (Dr. Mrs. S. D. Apte)

DECLARATION BY STUDENT
I hereby declare that the thesis entitled INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING, completed and written by me, has not previously formed the basis for the award of any Degree or Diploma or other similar title of this or any other university or examining body.
Place: Kolhapur    Date:    Research Student (Mrs. Meenakshi Ravindra Patil)

CERTIFICATE
This is to certify that the thesis entitled INTELLIGENT ENCODER AND DECODER MODEL FOR ROBUST AND SECURE AUDIO WATERMARKING, which is being submitted herewith for the award of the Degree of Doctor of Philosophy in Electronics Engineering under the Faculty of Engineering of Shivaji University, Kolhapur, is the result of the original research work completed by MRS. MEENAKSHI RAVINDRA PATIL, and to the best of our knowledge and belief the work embodied in this thesis has not earlier formed the basis for the award of any degree or similar title of this or any other university or examining body.
Place: Kolhapur    Date:    Research Guide (Dr. Mrs. S. D. Apte)    Examiners: 1) 2)

ACKNOWLEDGMENT
I consider myself fortunate to have got an opportunity to work under the valuable guidance of Dr. Mrs. S. D. Apte. I wish to take this opportunity to convey my deep gratitude to her for her valuable advice, constant encouragement, keen interest, painstaking corrections, constructive criticism and scholarly guidance, right from the suggestion of the topic up to the completion of the manuscript. It is hard to find words to express my gratitude to my husband, Mr. Ravindra P. Patil, for everything he has done for me.
I thank him for his continuous encouragement, valuable discussions during the entire research right from beginning till the completion of this thesis. Without his positive, kind and silent support this work would neither have been possible nor could it have been completed. I have no words to express my gratitude’s to His Holiness Swastishree Charukirti Bhattarak Mahsamiji, Chairman SSDJJP Sangha, Shravanbelgola, Dr. J. J. Magdum, honorable Chairman Dr. J. J. Magdum Trust, Jaysingpur, and Shri. Vijayraj Magdum, honorable Executive Director, Dr. J. J. Magdum College of Engineering, Jaysingpur for their kind support. I am grateful to the Prof. R. P.Pungin, Former Principal Bahubali College of Engg. Shravanbelgola, for his guidance and encouragement especially during the early stages of this research. I am also grateful to Prof Mrs. R.S. Patil, Former H.O.D. Electronics department, Walchand College of Engineering Sangli for her guidance and encouragement. Due to her support only I could able to complete my post graduate and later registered for PhD. I extend my sincere thanks to Prof. G. M. Ravannavar, Vice Principal Bahubali college of Engineering, Shravanbelgola, Dr. D. S. Bormane, Principal RSCOE Pune, Dr. P. R. Mutagi, Principal Dr. J. J. Magdum College of Engineering, Jaysingpur and Prof. A.K. Gupta, Vice Principal Dr. J. J. Magdum College of Engineering Jaysingpur for providing the facilities to carry out the research at their institutes while my working in the respective institutes during the research. My Special thanks to Dr. Mrs. A. A. Agashe and Dr. A. S. Yadav for their valuable suggestions and help during the thesis writing. I am thankful to my colleagues and staff members from Bahubali college of Engineering, Shravanbelgola for their support and good wishes during the early stages of my research. I am also thankful to my colleagues and staff members from v Electronics, Electronics and communication Dept. and Electrical department of Dr. J. J. Magdum College of Engineering Jaysingpur for their good wishes. Last but not the least; I acknowledge with affection my indebtedness to my kids Yukta and Abhishek, my parents, parents in law, brother, sister, sister in laws, brother in laws and all my family members who have provided me the prolonged encouragement and all possible comfort besides showing active interest in the work. Mrs. Meenakshi Ravindra Patil vi List of Contributions This thesis is based on the eleven original papers (Appendices I–XI) which are referred in the text by Roman numerals. All analysis and simulation results presented in publications or this thesis have been produced by the author. Professor Mrs. S. D. Apte gave guidance and needed expertise in signal processing methods. She had an important role in the development of the initial ideas and shaping of the final outline of the publications. Conference Publications I) “DWT based Image Watermarking” at international conference ICWCN 2003 organized by S.S.N.C.O.E., Kalavakkam, Chennai during 27-28 June 2003. II) “DWT based Audio Watermarking” at international conference ICWCN 2003 organized by S.S.N.C.O.E., Kalavakkam, Chennai during 27-28 June 2003. III) “Audio Watermarking for covert communication” in 12th annual Symposium on mobile computing and applications organized by IEEE Bangalore section November 2004. 
IV) “Insight on Audio Watermarking” at national conference NC2006, organized by Padre Conceicao College of Engineering, Verna, Goa, during 14-15 September 2006.
V) “Performance analysis of Audio Watermarking in Wavelet Domain” at international conference RACE, organized by Engineering College Bikaner, during 24-25 March 2007.
VI) “SNR based audio watermarking in Wavelet domain” at international conference ICCR-08, April 2008, Mysore.

International Journals/Digital library publications
VII) “SNR based audio watermarking Scheme for blind detection” at international conference ICETET-08, July 2008, Nagpur, published in the IEEE Computer Digital Library.
VIII) “SNR based Spread Spectrum Watermarking using DWT and Lifting Wavelet Transform” in International Journal IJCRYPT, October 2008.
IX) “Adaptive Spread Spectrum Audio Watermarking for Indian Musical Signals by Low Frequency Modification” in proc. IEEE international conference ASID 2009, held on 22-25 August 2009 at Hong Kong.
X) “Adaptive Spread spectrum audio watermarking” in ICFAI Journal on Information Technology, September 2009.
XI) “Adaptive Audio Watermarking for Indian Musical Signals by GOS Modification” selected for IEEE International conference TENCON 2009, held on 23-26 November 2009 at Singapore.

Publications VIII and X are international journal publications. Publications VII, IX and XI are published in the IEEE digital library and are available online at www.ieeexplore.org.

ABSTRACT
Recent advances in digital technology for broadband communication and multimedia data distribution in digital format have opened many challenges and opportunities for researchers. Simple-to-use software and the decreasing prices of digital devices have made it possible for consumers from all around the world to create and exchange multimedia data. Broadband Internet connections and error-free transmission of data allow people to distribute large multimedia files and to make identical digital copies of them. Perfect reproduction in the digital domain has made the protection of intellectual ownership and the prevention of unauthorized tampering of multimedia data an important technological and research issue. Digital watermarking has been proposed as a new, alternative method to enforce intellectual property rights and protect digital media from tampering. Digital watermarking is a technique that embeds data directly into a host signal and later extracts it from that signal. The main challenge in digital audio watermarking is that, if the perceptual transparency parameter is fixed, the design of a watermark system cannot obtain high robustness and a high watermark data rate at the same time. In this thesis, we address the research problems of audio watermarking. First, what is the imperceptibility of the watermarked data and how can it be measured? Second, how can the detection performance of a watermarking system be improved for blind detection? Third, is the system robust to different signal processing attacks, and is it possible to increase the robustness through attack characterization? An approach that combines theoretical considerations and experimental validation, based on digital signal processing, is used in developing the algorithms for audio watermarking. The results of this study are the development and implementation of audio watermarking algorithms. The algorithms' performance is validated in the presence of the standard watermarking attacks. The thesis also includes a thorough review of the literature on digital audio watermarking.
ix List of abbreviation and Symbols 1-D One dimension A/D Analog to digital AOAA Average of absolute amplitudes AWGN Additive white Gaussian noise BEP Bit error probability BER Bit error rate CD Compact Disk CMF Conjugate Mirror Filters CPU Central processing unit CWT Continuous wavelet transform D/A Digital to analog dB Decibels db2 Daubechies wavelet DCT Discrete Cosine transform DSSS Direct sequence spread spectrum DWT Discrete wavelet transforms FFT Fast Fourier transform GOS Group of samples HAS Human auditory system HRT Hough-Radon transform HVS Human Visible system IFPI International Federation of the Phonographic Industry IID Independent, identically distributed IMF Instantaneous mean frequency ISS Improved Spread Spectrum JND Just noticeable difference LP Low pass LSB Least Significant bits x LWT Lifting wavelet transform MASK Modified Signal Keying Mp3 MPEG -1 Compression, Layer III MPEG Moving Picture Experts group MSE Mean-squared error NMR Noise to mask ratio PDF Probability density function PN Pseudo Noise PRN Pseudo random Number SDMI Secure Digital Music Initiative SMR Signal to mask ratio SNR Signal to Noise ratio SPL Sound pressure level SS Spread Spectrum ST-DM Spread-transform dither modulation SVM Support vector machines SVR Support vector regression TF Time frequency THD1 Threshold 1 TSM Time scale modification WER Word error rate WMSE Weighted mean-squared error List of symbols m Input message w ′′ Recovered watermark D Lk (i) Lth level detail coefficient ′ D Lk (i) Modified Lth level detail coefficients by the pseudorandom signal r(i) AkL (i ) DCT coefficient of ca3 coefficient of Lth level DWT of xk (i ) xi xk (i ) kth segment of the host audio. Ak′ L (i ) Modified Lth level DCT coefficient by pseudorandom signal r(i) E in Absolute average of nth section Emax Max. AOAA E min Min AOAA Emid Middle value of AOAA wr Recovered watermark Pb Bit error probability Pr Error probability that b n ≠ b̂ n ŵ Recovered watermark ω Highest frequency α Scaling parameter δ Discrimination parameter used to modify mean σn Variance of noise introduced σr Variance of r σu Variance of signal u σx Variance of host audio a(n) Coarse approximation A,B Difference mean A´, B´ Means of received signal b Bits to be transmitted by watermarking process ca3 Third level coarse approximation coefficients of xk (i ) Co Host audio signal Cw Watermarked signal Cwn The received watermarked signal after noise addition d[n] Detail information d3 Third level detailed wavelet coefficients f(x) Signal in which watermark to be embedded xii G0 Low-pass filter G1, H1 Synthesis filter/ Hypothesis 1 H0 High-pass filter/Hypothesis 0 k Secret key k Segment number K Imperceptibility parameter L Level of decomposition of signal using wavelets L1, L2 Section lengths M Total no. of watermark bits M1× M2 Size of the watermark, row pixel no. × Column pixel no. mr Mean of r N Length of the segment, length of PN sequence p Error probability r Sufficient statistic required to detect the mark r(i) PN- sequence used to embed the watermark in x using SS method s Signal after embedding the watermark bit u Zero mean PN – sequence to be added in to signal x w Original watermark w’ Scaled watermark Wa Encoded watermark wk Watermark key x Original host audio signal y Watermarked signal, received vector Y(i) Watermarked signal xiii List of figures and tables Figures Figure Title of Figure No. Page No. 
1.1 Block diagram of encoder 3 3.1 A watermarking system & an equivalent communications model 31 4.1 Magic triangle, three contradictory requirements of watermarking 35 4.2 Frequency masking in the human auditory system 38 4.3 Signal-to-mask ratio & Signal to Noise ratio values 39 4.4 Temporal masking in the human auditory system 40 4.5 Mallat-tree decomposition 41 4.6 Reconstruction of original signal from wavelet coefficient 42 4.7 Watermark embedding 45 4.8 Watermark extraction 46 4.9 Original Host audio 47 4.10 Watermarked Signal 47 4.11 Original Watermark 47 4.12 Recovered watermark 48 5.1 General model of SS-based Watermarking 54 5.2 Error probability as a function of the SNR 56 5.3 Watermark embedding Process 59 5.4 Watermark extraction 59 5.5 a) Original host audio b) watermarked audio 60 5.6 a) Original watermark b) recovered Watermark 61 5.7 Results for SNR based scheme for non blind detection 62 5.8 Watermark embedding process for proposed adaptive SNR based 66 blind tech. in DWT/LWT domain 5.9 Watermark extraction for adaptive SNR based blind technique in 67 DWT/LWT domain. 5.10 Original watermark 68 xiv 5.11 Results for SNR based scheme for blind detection in DWT domain 68 5.12 Results for SNR based scheme for blind detection in LWT domain 68 5.13 Watermark embedding process for proposed adaptive SNR based 73 blind technique in DWT-DCT domain 5.14 Watermark extraction for adaptive SNR based blind technique in 74 DWT_DCT domain. 5.15 Results for SNR based scheme for blind detection in DWT-DCT 75 domain 5.16 PDF of SNR for various keys 76 5.17 Improved encoder and decoder for blind watermarking using 78 cyclic coding 6.1 Block schematic of GOS based watermark embedding in DWT 86 domain 6.2 Block schematic of GOS based watermark extraction in DWT 87 domain 6.3 a) SNR Vs K without attack b) BER Vs K 88 6.4 Results of robustness test recovered watermark DWT domain 89 6.5 Block schematic of GOS based watermark embedding in DCT 91 domain 6.6 Block schematic of GOS based watermark extraction in DCT 91 domain 6.7 Results of robustness test recovered watermark DCT domain 92 6.8 Block schematic of GOS based watermark embedding in DWT- 94 DCT domain 6.9 Block schematic of GOS based watermark embedding in DWT- 95 DCT domain 6.10 Results of robustness test recovered watermark signal 95 6.11 Improved encoder decoder for GOS based blind watermarking 97 using cyclic codes 7.1 A watermarking system & an equivalent communication model 104 xv 7.2 7.3 Intelligent encoder model for proposed adaptive SNR based blind technique in DWT-DCT domain Decoder model for adaptive SNR based blind technique in DWT- 108 109 DCT domain. 7.4 Block schematic of GOS based intelligent encoder of watermark in 112 DWT-DCT domain. 7.5 Block schematic of GOS intelligent decoder of watermark in 112 DWT-DCT domain TABLES Table Table details No. Page No. 1.1 Four categories of Information hiding technique 4 4.1 Subjective listening test for MP3 song 48 4.2 SNR of watermarked signal & BER of extracted watermark signal 50 4.3 Robustness test results for MP3 song 50 5.1 Experimental results against signal processing attack for non blind 62 tech. 
5.2 Results after signal processing attacks a)SNR between original signal and watermarked signal after 69 attack for three schemes b)Correlation coefficient between original watermark and recovered 69 watermark for three schemes 5.3 Relation between segment length, parameter α(k), observed SNR 70 and correlation coefficient 5.4 SNR between original audio signal and watermarked signal BER at 75 recovered watermark 5.5 Comparison chart of spread audio watermarking technique 77 implemented in this chapter 5.6 Results obtained for (6,4) cyclic codes 78 5.7 Results obtained for (7,4) cyclic codes 78 xvi 6.1 SNR between original audio signal and BER of recovered 89 watermark for various musical signal in DWT domain 6.2 SNR between original audio signal and BER of recovered 92 watermark for various musical signal in DCT domain 6.3 Relation between segment length variation and No. of coeff. 93 Modified with SNR and BER 6.4 SNR between original signal and watermarked signal, BER of 95 recovered watermark results in DWT-DCT domain 6.5 Comparison between the proposed GOS based scheme and proposed 96 SS based scheme 6.6 Results obtained for (7,4) cyclic codes 97 6.7 Comparison chart of various watermarking algorithms 99 6.8 Comparison chart for various blind audio watermarking techniques 100 7.1 Robustness results after attack characterization (SS method) 111 7 .2 Robustness test result after attack characterization(GOS method) 113 xvii Contents Declaration by Guide…………………………………………………………... ii Declaration by Student………………………………………………………… iii Certificate……………………………………………………………………….. iv Acknowledgement……………………………………………………………… v List of contributions…………………………………………………………..... vii Abstract…………………………………………………………………………. ix List of symbols and abbreviation…………………………………………….. x List of Figures and tables……………………………………………………... xiv Contents………………………………………………………………………... xviii 1. Introduction………………………………………………………………… 1 1.1 Basic functionalities of watermarking schemes……………………… 4 1.1.1 Perceptibility…………………………………………………….. 4 1.1.2 Level of reliability……………………………………………….. 5 1.1.3 Capacity………………………………………………………….. 6 1.1.4 Speed……………………………………………………………... 6 1.1.5. Statistical undetectability………………………………………. 6 1.1.6 Asymmetry………………………………………………………. 6 1.2 Evaluation of Schemes………………………………………………….. 7 1.2.1 Perceptibility…………………………………………………….. 8 1.2.2 Robustness……………………………………………………….. 8 1.2.3 Capacity………………………………………………………….. 10 1.2.4 Speed……………………………………………………………... 10 1.2.5 Statistical undetectability……………………………………….. 11 1.3 organization of thesis…………………………………………………… 11 2. Literature Survey…………………………………………………………... 13 2.1 Spread spectrum audio watermarking………………………………... 13 2.2 Methods using patch work algorithm…………………………………. 16 2.3 Methods implemented in time domain………………………………... 17 xviii 2.4 Methods implemented in transform domain………………………….. 18 2.5 Other recently developed algorithms………………………………….. 21 2.6 Audio watermarking techniques against time scale modifications….. 23 2.7 Papers studied on performance analysis and evaluation of 25 watermarking Systems…………………………………………………... 2.8 Watermark attacks……………………………………………………... 27 2.9 Research problems identified………………………………………….. 28 2.10 Concluding remarks…………………………………………………... 29 3. Research problem…………………………………………………………... 31 3.1 Summary of chapter…………………………………………………… 33 4. High capacity covert communication for Audio………………………….. 35 4.1 Overview of the properties of HAS …………………………………... 36 4.2 Discrete wavelet transform……………………………………………. 
40 4.2.1 Conditions for perfect reconstruction………………………….. 42 4.2.2 Classification of wavelets……………………………………….. 43 4.2.2.1 Features of orthogonal wavelet filter banks…………... 43 4.2.2.2 Features of biorthogonal wavelet filter banks………... 43 4.3 Audio watermarking for Covert Communication…………………… 44 4.4 Results of high capacity covert communication technique………….. 46 4.4.1 Subjective Listening test………………………………………... 48 4.4.2 Robustness test…………………………………………………... 48 4.5 Summary of chapter……………………………………………………. 51 5. Spread Spectrum Audio watermarking algorithms……………………… 53 5.1 Conventional spread spectrum method of watermarking…………... 54 5.2 Adaptive SNR based non blind watermarking technique in wavelet domain …………………………………………………………………. 5.3 Proposed adaptive SNR based blind watermarking using DWT/ Lifting wavelet transform……............................................................... 5.3.1 Watermark embedding………………………………………….. 56 64 5.3.2 Watermark extraction…………………………………………... 66 5.3.3 Experimental results ……………………………………………. 67 63 xix 5.3.4 Selection criteria for value of SNR in computing α(k) and selection criteria for segment length N ………………………… 5.4 Proposed adaptive SNR based spread spectrum scheme in DWTDCT …………………………………………………………………….. 5.4.1 Watermark embedding………………………………………….. 72 5.4.2 Watermark extraction…………………………………………... 73 5.4.3 Experimental results……………………………………………... 74 70 71 5.5 Proposed SNR based blind technique using cyclic coding …………... 77 5.6 Summary of chapter…………………………………………………… 79 6. Adaptive watermarking by GOS modification………………………….. 81 6.1 Introduction to audio watermarking technique based on GOS 82 modification in time domain…………………………………………. 6.1.1 Rules of watermark embedding………………………………... 83 6.1.2 Watermark extraction…………………………………………... 83 6.2 Proposed adaptive watermarking using GOS modifications in 84 transform domain…………………………………………………………………... 6.2.1 Proposed blind watermarking using GOS modification in DWT domain 6.2.2 Proposed blind watermarking using GOS modification in DCT domain 6.2.3 Proposed blind watermarking using GOS modification in DWT-DCT domain 6.3 Proposed GOS based blind technique using cyclic coding 85 90 94 96 6.4 Comparison of proposed method with well known watermarking algorithms 6.5 summary of chapter……………………………………………………. 101 7. Intelligent encoder and decoder modeling ……………………………….. 103 7.1. Basic model of watermarking………………………………………... 103 7.2 Method-1: Proposed Intelligent encoder and decoder model for robust and secure audio watermarking based on Spread Spectrum 7.3 Method-2: Proposed Intelligent encoder and decoder model for robust and secure audio watermarking based on GOS modification 7.4 Summary of chapter………………………………………………. 107 8. Discussion and Conclusion…………………………………………. 98 111 113 115 xx 8.1 Discussion and conclusion……………………………………………… 115 8.2 Main contribution of the present research……………………………. 120 8.3 Future scope…………………………………………………………….. 120 9. References…………………………………………………………………… 121 Appendix (Details of Published papers)…..………………………………….. 129 xxi Chapter 1 Introduction Chapter 1 Introduction Introduction to Watermarking In 1954, Emil Hembrooke of the Muzac Corporation filed a patent entitled “Identification of sound and like signals” which described a method for imperceptibly embedding an identification code into music for the purpose of proving ownership. 
The patent states: “The present invention makes possible the positive identification of the origin of a musical presentation and thereby constitutes an effective means of preventing piracy.” Electronic watermarking had been invented.

Presently there is increased usage of the Internet and of multimedia information. The availability of different multimedia editing software and the ease with which multimedia signals are manipulated have opened many challenges and opportunities for researchers. The possibility of unlimited copying without a loss of fidelity causes considerable financial loss for copyright holders. The ease of content modification and perfect reproduction in the digital domain have made the protection of intellectual ownership and the prevention of unauthorized tampering of multimedia data an important technological and research issue. The wide use of multimedia data, combined with fast delivery of multimedia to users having different devices with a fixed quality of service, is becoming a challenging and important topic. Traditional methods for copyright protection of multimedia data are not sufficient. Hardware-based copy protection systems have already been easily circumvented for analogue media. Hacking of digital media systems is even easier due to the availability of general multimedia processing platforms, e.g. a personal computer. Simple protection mechanisms based on information embedded into the header bits of the digital file are not useful because header information can easily be removed by a simple change of data format, which does not affect the fidelity of the media. Encryption of digital multimedia prevents access to the multimedia content by an individual without a proper decryption key. Therefore, content providers get paid for the delivery of perceivable multimedia, and each client that has paid the royalties must be able to decrypt a received file properly. Once the multimedia has been decrypted, it can be repeatedly copied and distributed without any obstacles. Modern software and broadband Internet provide the tools to do this quickly, without much effort or deep technical knowledge. It is clear that existing security protocols for electronic commerce serve to secure only the communication channel between the content provider and the user, and are useless if the commodity in the transaction is digitally represented.

From a business perspective, the question is whether watermarking can provide economic solutions to real problems. Current business interest is focused on a number of applications that broadly fall into the categories of security and device control. From a security perspective, there has been criticism that many proposed watermark security solutions are “weak”, i.e. it is relatively straightforward to circumvent the security system. While this is true, there are many business applications where “weak” security is preferable to no security. Therefore it is expected that businesses will deploy a number of security applications based on watermarking [1]. In addition, many device control applications have no security requirement, since there is no motivation to remove the watermark. Device control, particularly as it pertains to the linking of traditional media to the Web, is receiving increased attention from businesses and this interest will increase. From an academic perspective, the question is whether watermarking introduces new and interesting problems for basic and applied research.
Watermarking is an interdisciplinary study that draws experts from communications, cryptography and audio and image processing. Interesting new problems have been posed in each of these disciplines based on the unique requirements of watermarking applications. Commercial implementations of watermarking must meet difficult and often conflicting economic and engineering constraints.

Digital watermarking has been proposed as a new, alternative method to enforce intellectual property rights and protect digital media from tampering. It involves a process of embedding a digital signature into a host signal in order to "mark" the ownership. The digital signature is called the digital watermark. The digital watermark contains data that can be used in various applications, including digital rights management, broadcast monitoring and tamper proofing. The existence of the watermark is indicated when watermarked media is passed through an appropriate watermark detector.

Fig. 1.1 Block diagram of the encoder: the watermark and the host audio enter the watermark embedder together with a secret key; the watermarked audio is later passed to the watermark detector, which uses the secret key to produce the detected watermark.

Figure 1.1 gives an overview of the general watermarking system. A watermark, which usually consists of a binary data sequence, is inserted into the host signal in the watermark embedder. Thus, a watermark embedder has two inputs: one is the watermark message accompanied by a secret key, and the other is the host signal (e.g. image, video clip, audio sequence etc.). The output of the watermark embedder is the watermarked signal, which cannot be perceptually discriminated from the host signal. The watermarked signal is then usually recorded or broadcast and later presented to the watermark detector. The detector determines whether the watermark is present in the tested multimedia signal and, if so, decodes the message. (A minimal code sketch of this embedder and detector structure is given below, after Table 1.1.)

The research area of watermarking is closely related to the fields of information hiding and steganography. The three fields have a considerable overlap and many common technical solutions. However, there are some fundamental philosophical differences that influence the requirements and therefore the design of a particular technical solution. Information hiding (or data hiding) is a more general area, encompassing a wider range of problems than watermarking. The term hiding refers to the process of making the information imperceptible or keeping the existence of the information secret. Steganography is a word derived from the ancient Greek words steganos, meaning covered, and graphia, meaning writing; it is the art of concealed communication. Therefore, we can define watermarking systems as systems in which the hidden message is related to the host signal, and non-watermarking systems as systems in which the message is unrelated to the host signal. On the other hand, systems for embedding messages into host signals can be divided into steganographic systems, in which the existence of the message is kept secret, and non-steganographic systems, in which the presence of the embedded message does not have to be secret. The division of information hiding systems into four categories is given in Table 1.1.

Table 1.1 Four categories of information hiding techniques

                    Host Signal Dependent Message        Host Signal Independent Message
Message Hidden      Steganographic Watermarking          Covert Communication
Message Known       Non-steganographic Watermarking      Overt Embedded Communications

The primary focus of this thesis is the watermarking of digital audio (i.e., audio watermarking).
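To make the block structure of Figure 1.1 concrete, the following is a minimal, illustrative sketch of a key-based embedder and blind detector. It is only a toy example using least-significant-bit substitution at key-selected sample positions; the function names, parameters and the LSB technique itself are assumptions made for illustration and are not the algorithms developed in this thesis.

```python
import numpy as np

def embed(host, message_bits, key):
    """Watermark embedder of Fig. 1.1: hides message bits in the LSBs of key-chosen samples."""
    rng = np.random.default_rng(key)                 # the secret key selects the sample positions
    pos = rng.choice(len(host), size=len(message_bits), replace=False)
    marked = host.copy()
    for p, b in zip(pos, message_bits):
        marked[p] = (marked[p] & ~1) | b             # overwrite the least significant bit
    return marked

def detect(signal, n_bits, key):
    """Watermark detector of Fig. 1.1: the same key regenerates the positions and reads the bits."""
    rng = np.random.default_rng(key)
    pos = rng.choice(len(signal), size=n_bits, replace=False)
    return [int(signal[p] & 1) for p in pos]

host = np.random.default_rng(7).integers(-2000, 2000, size=44100).astype(np.int16)
msg = [1, 0, 1, 1, 0, 1, 0, 0]
watermarked = embed(host, msg, key=2024)
print(detect(watermarked, len(msg), key=2024))       # recovers [1, 0, 1, 1, 0, 1, 0, 0]
```

In the actual schemes developed later in this thesis, the embedder operates on transform-domain coefficients and the detector uses correlation or statistical tests rather than direct bit reads, but the two-block, secret-key structure of Figure 1.1 is the same.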
The watermarking algorithms were primarily developed for digital images and video sequences [1-11]; interest and research in audio watermarking started slightly later. In the past few years, several algorithms[12-33] for the embedding and extraction of watermarks in audio sequences have been presented. All of the developed algorithms take advantage of the perceptual properties of the human auditory system (HAS) in order to add a watermark into a host signal in a perceptually transparent manner. Embedding additional information into audio sequences is a more tedious task than that of images, due to dynamic supremacy of the HAS over human visual system. In addition, the amount of data that can be embedded transparently into an audio sequence is considerably lower than the amount of data that can be hidden in video sequences as an audio signal has a dimension less than two-dimensional video files. On the other hand, many attacks that are malicious against image watermarking algorithms (e.g. geometrical distortions, spatial scaling, etc.) cannot be implemented against audio watermarking schemes. 1.1. Basic Functionalities of Watermarking Scheme The objectives of the scheme and its operational environment dictate several immediate constraints (a set of minimal requirements) on the algorithm. In the case of automated radio monitoring, for example, the watermark should clearly withstand distortions introduced by the radio channel. Similarly, in the case of MPEG video broadcast, the watermark detector must be fast (to allow real-time detection) and simple in terms of number gates required for hardware implementation. One or more of the following general functionalities can be used. 1.1.1. Perceptibility The perceived quality of the medium after watermark embedding should be imperceptible. In the case of digital watermarking in images there should not be any 4 visual difference between the original image and the watermarked image. For audio watermarking schemes the embedded watermark should be inaudible. 1.1.2. Level of Reliability There are two main aspects to reliability: Robustness and false negatives occur when the content was previously marked but the mark could not be detected. The threats centered on signal modification are robustness issues. Robustness can range from no modification at all to destruction of the signal. (Complete destruction may be too stringent a requirement. Actually, it is not clear what it means. Instead one could agree on a particular quality measure and a maximum quality loss value.) This requirement separates watermarking from other forms of data hiding (typically steganography). Without robustness, the information could simply be stored as a separate attribute. Robustness remains a very general functionality as it may have different meanings depending on the purpose of the scheme. If the purpose is image integrity (tamper evidence), the watermark extractor should have a different output after small changes have been made to the image while the same changes should not affect a copyright mark. In fact, one may distinguish at least the following main categories of robustness: 1. The threats centered on modifying the signal to disable the watermark (typically a copyright mark), willfully or not, remain the focus of many research articles that propose new attacks. By “disabling a watermark” means making it useless or removing it. 2. The threats centered on tampering of the signal by unauthorized parties to change the semantic of the signal are an integrity issue. 
Modification can range from the modification of court evidences to the modification of photos used in newspapers or clinical images. 3. The threats centered on anonymously distributing illegal copies of marked work are a traitor-tracing issue and are mainly addressed by cryptographic solutions. 4. Watermark cascading—that is, the ability to embed a watermark into an audiovisual signal that has been already marked—requires a special kind of robustness. The order in which the marks are embedded is important because different types of marks may be embedded in the same signal. For example, one may embed a public and a private watermark (to simulate asymmetric watermarking) or a strong public watermark together with a tamper evidence watermark. As a consequence, the 5 evaluation procedure must take into account the second watermarking scheme when testing the first one. At last, false positives occur whenever the detected watermark differs from the mark that was actually embedded. The detector could find a mark A in a signal where no mark was previously hidden, in a signal where a mark B was actually hidden with the same scheme, where a mark B was hidden with another scheme. 1.1.3. Capacity Knowing how much information can reliably be hidden in the signal is very important to users, especially when the scheme gives them the ability to change this amount. Knowing the watermarking access unit (or granularity) is also very important; indeed, spreading the mark over a full sound track prevents audio streaming, for instance. (A “watermark access unit” is the smallest part of a cover signal in which a watermark can be reliably detected and the payload extracted.) 1.1.4. Speed Some applications require real-time embedding and/or real time detection. Ultimately for such application the implemented scheme should be able to embed and detect the embedded watermark with high speed. 1.1.5. Statistical Undetectability For some private watermarking systems—that is, a scheme requiring the original signal—one may wish to have a perfectly hidden watermark. In this case it should not be possible for an attacker to find any significant statistical differences between an unmarked signal and a marked signal. As a consequence an attacker could never know whether an attack succeeded or not. Note that this option is mandatory for steganographic systems. 1.1.6. Asymmetry Private-key watermarking algorithms require the same secret key both for embedding and extraction. They may not be good enough if the secret key has to be embedded in every watermark detector (that may be found in any consumer electronic or multimedia player software); then malicious attackers may extract it and post it to the Internet allowing anyone to remove the mark. In these cases the party that embeds a mark may wish to allow another party to check its presence without revealing its embedding key. This can be achieved using asymmetric techniques. Unfortunately, 6 robust asymmetric systems are currently unknown, and the current solution (which does not fully solve the problem) is to embed two marks: a private one and a public one. Other functionality classes may be defined but the ones listed above seem to include most requirements used in the recent literature. The first three functionalities are strongly linked together, and the choice of any two of them imposes the third one. In fact, when considering the three-parameter watermarking model (perceptibility, capacity, and reliability), the most important parameter to keep is the imperceptibility. 
(“Capacity” is the bit size of a payload that a watermark access unit can carry.) Then two approaches can be considered: emphasize capacity over robustness or favor robustness at the expense of low capacity. This clearly depends on the purpose of the marking scheme, and this should be reflected in the way the system is evaluated. 1.2. Evaluation of Watermarking Scheme A full scheme is defined as a collection of functionality services to which a level of assurance is globally applied and for each of which a specific level of strength is selected. So a proper evaluation has to ensure that all the selected requirements are met to a certain level of assurance. The number of levels of assurance cannot be justified precisely. On the one hand, it should be clear that a large number of them make the evaluation very complicated and unusable for particular purposes. On the other hand, too few levels prevent scheme providers from finding an evaluation close enough to their needs. We are also limited by the accuracy of the methods available for rating. Information technology security evaluation has been using six or seven levels for the reasons it is just mentioned above but also for historical reasons. This seems to be a reasonable number for robustness evaluation. For perceptibility we preferred to use fewer levels and, hence, follow more or less the market segmentation for electronic equipment. Moreover, given the roughness of existing quality metrics it is hard to see how one could reasonably increase the number of assurance levels. Following section discuss possible methods to evaluate the functionalities listed above. 7 1.2.1. Perceptibility Perceptibility can be assessed to different levels of assurance. The problem here is very similar to the evaluation of compression algorithms. The watermark could be just slightly perceptible but not perceptible under domestic/ consumer viewing/listening conditions. Another level is nonperceptibility in comparison with the original under studio conditions. Finally, the best assurance is obtained when the watermarked media are assessed by a panel of individuals who are asked to look or listen carefully at the media under the above conditions. As it is stated, however, this cannot be automated, and one may wish to use less stringent levels. In fact, various levels of assurance can also be achieved by using various quality measures based on human perceptual models. Since there are various models and metrics available, an average of them could be used. Current metrics do not really take into account geometric distortions, which remain a challenging attack against many watermarking schemes. 1.2.2. Robustness The robustness can be assessed by measuring the detection probability of the mark and the bit error rate for a set of criteria that are relevant for the application that is considered. The levels of robustness range from no robustness to provable robustness. For level zero, no special robustness features have been added to the scheme apart from the one needed to fulfill the basic constraints imposed by the purpose and operational environment of the scheme. So if we go back to the radiomonitoring example, the minimal robustness feature should make sure that the mark survives the distortions of the radio link in normal conditions. The low level corresponds to some extra robustness features added but which can be circumvented using simple and cheap tools publicly available. 
These features are provided to prevent “honest” people from disabling the mark during normal use of the work. In the case of watermarks used to identify owners of photographs, the end users should be able to save and compress the photo, resize it, and crop it without removing the mark. Moderate robustness is achieved when more expensive tools are required, as well as some basic knowledge on watermarking. So if we use the previous example, the end user would need tools such as Adobe Photoshop and apply more processing to the image to disable the mark. 8 Moderately high means tools are available but special skills and knowledge are required and attempts may be unsuccessful. Several attempts and operations may be required and one may have to work on the approach. High robustness means all known attempts have been unsuccessful. Some research by a team of specialists is necessary. The cost of the attempt may be much higher than what it is worth and its success is uncertain. Provable robustness means it should be computationally (or even more stringent: theoretically) infeasible for a willful opponent to disable the mark. This is similar to what we found for cryptography where some algorithms are based on some difficult mathematical problem. The first levels of robustness can be assessed automatically by applying a simple benchmark algorithm: For each medium in a determined set: 1. Embed a random payload with the greatest strength that does not introduce annoying effects. In other words, embed the mark such that the quality of the output for a given quality metric is greater than a given minima. 2. Apply a set of given transformations to the marked medium. For each distorted medium try to extract the watermark and measure the certainty of extraction. Simple methods may just use a success/failure approach, that is, to consider the extraction successful if and only if the payload is fully recovered without error. The measure for the robustness is the certainty of detection or the bit error rate after extraction. This procedure must be repeated several times since the hidden information is random and a test may be successful by chance. Levels of robustness differ by the number and strength of attacks applied and the number of media on which they are measured. The set of test and media will also depend on the purpose of the watermarking scheme and are defined in evaluation profiles. For example, schemes used in medical systems need only to be tested on medical images while watermarking algorithms for owner identification have to be tested on a large panel of images. The first levels of robustness can be defined using a finite and precise set of robustness criteria (e.g., S.D.M.I., IFPI, or E.B.U. requirements) and one just need to check them. 9 False Positives False positives are difficult to measure, and current solutions use a model to estimate their rate. This has two major problems: first, “real world” watermarking schemes are difficult to model accurately, and second, modeling the scheme requires access to details of the algorithm. Despite the fact that not publishing algorithms breaches Kerckhoffs’ principles, details of algorithms are still considered trade secrets, and getting access to them is not always possible. (In 1883, Auguste Kerckhoffs enunciated the first principles of cryptographic engineering, in which he advises that we assume the method used to encipher data is known to the opponent, so security must lie only in the choice of key. 
The history of cryptology since then has repeatedly shown the folly of “security- by-obscurity”—the assumption that the enemy will remain ignorant of the system in use.) So one way to estimate the false-alarm rate is to count the number of false alarms using large sample of data. This may turn out to be another very difficult problem, as some applications require one error in 108 or even 1012. 1.2.3. Capacity In most applications the capacity will be a fixed constraint of the system so robustness testing will be done with a random payload of a given size. While developing a watermarking scheme, however, knowing the tradeoff between the basic requirements is very useful and graphing with two varying requirements—the others being fixed—is a simple way to achieve this. In the basic three-parameter watermarking model, for example, one can study the relationship between robustness and strength of the attack when the quality of the watermarked medium is fixed between the strength of the attack and the visual quality or between the robustness and the visual quality [6]. This is useful from a user’s point of view: the performance is fixed (we want only 5% of the bits to be corrupted so we can use error correction codes to recover all the information we wanted to hide) and so it helps to define what kind of attacks the scheme will survive if the user accepts such or such quality degradation. 1.2.4. Speed Speed is dependent on the type of implementation: software or hardware. The complexity is an important criteria and some applications impose a limitation on the maximum number of gates that can be used, the amount of required memory, etc. For 10 a software implementation, success also depends very much on the hardware used to run it but comparing performance results obtained on the same platform (usually the typical platform of end users) provides a reliable measure. 1.2.5. Statistical Undetectability All methods of steganography and watermarking substitute part of the cover signal, which has some particular statistical properties, with another signal with different statistical properties; in fact, embedding processes usually do not pay attention to the difference in statistical properties between the original cover signal and the stegosignal. This leads to possible detection attacks. As for false positives, evaluating such functionality is not trivial but fortunately very few watermarking schemes require it, so we will not consider it in the next section. 1.3. Organization of Thesis Robust digital audio watermarking algorithms and high capacity watermarking methods for audio are studied in this thesis. The purpose of the thesis is to develop novel audio watermarking algorithms providing a performance enhancement over the other state-of the- art algorithms with an acceptable increase in complexity and to validate their performance in the presence of the standard watermarking attacks. Presented as a collection of eleven original publications enclosed as appendices I-XI, the thesis is organized as follows. Following this introductory chapter, thesis is organized as follows. Chapter 2 Literature Review provides the sufficient background that would help out in solving the research problems. The research work is continued with review of literature published in various international journals and conferences to study latest development in the field of watermarking. The focus is given on research related to audio watermarking. Chapter 3 states the research problem. 
It also provides the research hypothesis used to solve the problem and the assumptions made while solving it. A general background and the requirements for high capacity covert communication for audio are presented in Chapter 4. In addition, the results for the wavelet domain LSB watermarking algorithm, which are in part documented in Papers II and III, are presented. In Chapter 5, the contents of which are in part included in Papers VI, VII, VIII, IX and X, spread spectrum audio watermarking algorithms in the wavelet domain are presented. A general model for spread spectrum-based watermarking is described in order to place the developed algorithms in context. These developed algorithms are able to perform blind detection and are perceptually transparent. The perceptual transparency is measured by computing the SNR between the host audio and the watermarked audio, and also through subjective listening tests. Chapter 6, the contents of which are presented in Paper XI, focuses on increasing the robustness of the embedded watermark. The scheme is based on the patchwork algorithm and is implemented in the time, DCT, DWT and DWT-DCT domains. The results are presented together with the explanation of each implementation. These algorithms increase the perceptual transparency significantly and perform blind detection of the watermark. Chapter 7 of the thesis concentrates on modeling the developed algorithms based on the principles of communication. To make the system intelligent and secure, a method to embed the watermark using cyclic coding is suggested. A principle for increasing the robustness by attack characterization through diversity is also discussed in the subsequent section of that chapter. Chapter 8 concludes the thesis, discussing its main results and contributions. Directions for further development and open problems for future research are also mentioned.

Chapter 2 Literature Survey

Introduction
Several digital watermarking techniques have been proposed, including watermarking for images, audio and video. Watermarking was primarily developed for images [1-11]; research on audio started later. Fewer watermarking techniques have been proposed for audio than for images and video. Embedding data in audio is more difficult than in images because the human auditory system (HAS) is more sensitive than the human visual system (HVS). In the last ten years there has been a lot of advancement in audio watermarking; a few of these developments are discussed here. This chapter reviews the literature on information hiding in audio sequences. The scientific publications included in the literature survey have been chosen in order to build a sufficient background that would help in identifying and solving the research problems. During the last decade audio watermarking schemes [12-50] have been applied widely, and these schemes have become very sophisticated in terms of robustness and imperceptibility. Robustness and imperceptibility are important requirements of watermarking, but they conflict with each other. Non-blind watermarking schemes are theoretically interesting, but not so useful in practice, since they require double the storage capacity and communication bandwidth for watermark detection. Of course, non-blind schemes may be useful as a copyright verification mechanism in a copyright dispute. On the other hand, blind watermarking schemes can detect and extract watermarks without use of the unwatermarked audio.
They therefore require only half the storage capacity and half the bandwidth compared with non-blind watermarking schemes. Hence, only blind audio watermarking methods are considered in this chapter.

2.1 Spread Spectrum Audio Watermarking
Most of the existing audio watermarking techniques embed the watermark in the time domain or the frequency domain, whereas there are a few techniques which embed the data in the cepstrum or compressed domain. The spread spectrum (SS) technique is the most popular technique and is used by many researchers in their implementations [12-18]. An amplitude-scaled spread spectrum sequence is embedded in the host audio signal and can be detected via a correlation technique. Generally, embedding is based on a psychoacoustic model (which provides the inaudibility limits of the watermark). The watermark is spread over a large number of coefficients, and the distortion is kept below the just noticeable difference (JND) level by exploiting the masking effects of the human auditory system (HAS). The change in each coefficient can be small enough to be imperceptible because the correlation detector output still has a high signal-to-noise ratio (SNR), since it despreads the energy present in a large number of coefficients.

Boney et al [12] generated watermarks by filtering a pseudo noise sequence with a filter that approximates the frequency masking characteristics of the HAS. Thus different watermarks are created for different audio signals. Their study and results show that their scheme is robust in the presence of additive noise, lossy coding/decoding, resampling and time scaling. They also state that, using their scheme, it is easy for the author to detect the watermark. However, they use the original signal to detect the watermark. The scheme is also robust in the presence of other watermarks.

J. Seok et al [13] proposed a novel audio watermarking algorithm which is based on a direct sequence spread spectrum method. The information that is to be embedded is modulated by a pseudo noise (PN) sequence. The spread spectrum signal is then shaped in the frequency domain and inserted into the original audio signal. To detect the watermark they use a linear predictive coding method. Their experimental results show that their scheme is robust against different signal processing attacks.

D. Kirovski et al [14] developed techniques which effectively encode and decode the direct sequence spread spectrum watermark in an audio signal. They have used the modulated complex lapped transform to embed the watermark. To prevent desynchronization attacks they developed a technique based on block repetition coding. Though they have shown that they can perform the correlation test in perfect synchronization, the wow and flutter induced in the watermarked signal may cause false positive/false negative detection of the watermark. To improve the reliability of watermark detection they proposed a technique which uses cepstrum filtering and chess watermarks. They have also shown that psychoacoustic frequency masking creates an imbalance in the number of positive and negative watermark chips in the part of the SS sequence that is used for correlation detection, which corresponds to the audible part of the frequency spectrum. To compensate for this problem they propose a modified covariance test.
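The additive embedding and correlation detection common to the SS schemes above can be summarized in a minimal sketch. This is not the exact algorithm of any of the cited papers; the segment length, the scaling factor alpha and the key value are assumed, illustrative parameters, and no psychoacoustic shaping is applied.

```python
import numpy as np

def ss_embed(host, bits, key, alpha=0.01, seg_len=4096):
    """Additive SS: add a +/- scaled pseudo-noise sequence per segment, one bit per segment."""
    rng = np.random.default_rng(key)               # the secret key seeds the PN generator
    pn = rng.choice([-1.0, 1.0], size=seg_len)     # bipolar PN (chip) sequence
    marked = host.astype(float).copy()
    for i, b in enumerate(bits):
        seg = marked[i * seg_len:(i + 1) * seg_len]
        seg += alpha * (1.0 if b else -1.0) * pn[:len(seg)]
    return marked

def ss_detect(signal, n_bits, key, seg_len=4096):
    """Blind detection: the sign of the correlation with the PN sequence despreads each bit."""
    rng = np.random.default_rng(key)
    pn = rng.choice([-1.0, 1.0], size=seg_len)
    bits = []
    for i in range(n_bits):
        seg = signal[i * seg_len:(i + 1) * seg_len]
        bits.append(1 if np.dot(seg, pn[:len(seg)]) >= 0 else 0)
    return bits

host = 0.1 * np.random.default_rng(0).standard_normal(8 * 4096)   # stand-in for a host audio excerpt
bits = [1, 0, 1, 1, 0, 0, 1, 0]
received = ss_embed(host, bits, key=1234) + 0.001 * np.random.default_rng(1).standard_normal(8 * 4096)
print(ss_detect(received, len(bits), key=1234))                    # typically recovers the embedded bits
```

The host signal itself acts as interference at the correlator, which is why the improved spread spectrum approach discussed next introduces parameters to remove part of that host interference from the detection statistic.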
Malvar et al [15] introduce a new watermarking modulation technique called Improved Spread Spectrum (ISS). This scheme proposes a new embedding approach obtained by slightly modifying traditional SS embedding. In this scheme they introduce two parameters, one to control the distortion level and one to control the removal of carrier (host) distortion from the detection statistics. For certain values of these control parameters, traditional SS can be obtained from this scheme.

S. Esmaili et al [16] presented a novel audio watermarking scheme based on spread spectrum techniques that embeds a digital watermark within an audio signal using the instantaneous mean frequency (IMF) of the signal. This content-based audio watermarking algorithm was implemented to satisfy and maximize both imperceptibility and robustness of the watermark. They used the short-time Fourier transform of the original audio signal to estimate a weighted IMF of the signal. Based on the masking properties of the psychoacoustic model, the sound pressure level of the watermark was derived. Based on these results, modulation is then performed to produce a signal-dependent watermark that is imperceptible. This method allows 25 bits to be embedded and recovered within a 5 second sample of an audio signal. Their experimental results show that the scheme is robust to common signal processing attacks including filtering and noise addition, whereas the bit error rate (BER) increased to 0.08 for mp3 compression, i.e. 2 out of 25 bits were not identified.

D. Kirovski et al [17] devised a scheme for robust covert communication over a public audio channel using spread spectrum, by imposing particular structures on the watermark patterns and applying a nonlinear filter to reduce carrier noise. This technique is capable of reliably detecting the watermark, even in audio clips modified using composition attacks that degrade the content well beyond the acceptable limit.

Hafiz Malik et al [18] proposed an audio watermarking method based on frequency selective direct sequence spread spectrum. The method improves the detection capability, watermarking capacity and robustness to desynchronization attacks. In this scheme the process of generating a watermark and embedding it into an audio signal is treated in the framework of spread spectrum theory. The original signal is treated as noise, whereas the message information used to generate the watermark sequence is considered as data. The spreading sequence, also called the PN sequence, is treated as a key. The technique introduces lower mean-square as well as perceptual distortion due to the fact that the watermark is embedded in a small frequency band of the complete audible frequency range.

2.2 Methods using patchwork algorithm
The patchwork technique was first presented in 1996 by Bender et al [19] for embedding watermarks in images. It is a statistical method based on hypothesis testing and relying on large data sets. As a second of CD quality stereo audio contains 88200 samples, a patchwork approach is applicable to the watermarking of audio sequences as well. The watermark embedding process uses a pseudorandom process to insert a certain statistic into a host audio data set, which is extracted with the help of numerical indexes (like the mean value) describing the specific distribution. The method is usually applied in a transform domain (Fourier, wavelet, etc.) in order to spread the watermark in the time domain and to increase robustness against signal processing modifications [19].
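As a rough illustration of the patchwork idea described above (a sketch only, not the algorithm of [19] or [20]; the patch size n, the offset d and the detection threshold are assumed values chosen for clarity), one bit can be hidden by shifting two pseudorandomly chosen sets of samples in opposite directions and detected from the difference of their sample means.

```python
import numpy as np

def patchwork_embed(x, key, d=0.01, n=2000):
    """Add +d to patch A and -d to patch B, both chosen pseudorandomly with the secret key."""
    rng = np.random.default_rng(key)
    idx = rng.permutation(len(x))[:2 * n]           # key-dependent, non-overlapping positions
    a_idx, b_idx = idx[:n], idx[n:]
    y = x.astype(float).copy()
    y[a_idx] += d                                   # shift the mean of patch A up
    y[b_idx] -= d                                   # shift the mean of patch B down
    return y

def patchwork_detect(y, key, d=0.01, n=2000):
    """Hypothesis test on the mean difference: about 2d if the mark is present, about 0 otherwise."""
    rng = np.random.default_rng(key)
    idx = rng.permutation(len(y))[:2 * n]
    a_idx, b_idx = idx[:n], idx[n:]
    stat = y[a_idx].mean() - y[b_idx].mean()
    return stat > d                                 # threshold halfway between 0 and 2d

audio = 0.1 * np.random.default_rng(0).standard_normal(88200)    # one second of CD-quality-length noise
marked = patchwork_embed(audio, key=42)
print(patchwork_detect(marked, key=42), patchwork_detect(audio, key=42))   # typically: True False
```

Multiplicative patchwork, discussed next, changes the variance of the chosen samples instead of their means, so its detector looks at second-order statistics rather than the mean difference.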
Additive schemes shift average, while multiplicative schemes changes variance. Thus detection scheme exploits such facts. In this scheme first mean and variance of the sample values are computed in order to detect the watermarks, second they assume that distribution of the sample values is normal, third they try to decide the value of detection threshold adaptively. Cvejic et al [21] presented a robust audio watermarking method implemented in wavelet domain which uses the frequency hopping and patchwork method. Their scheme embeds the watermark to a mapped sub band in a predefined time period similar to frequency hopping approach in digital communication and detection method is modified patchwork algorithm. Their results show that the algorithm is robust against the mp3 compression, noise addition, requantization and resampling. For this system to be robust against the resampling attack it is required to find out the proper scaling parameter. R. Wang et al [22] proposed an audio watermarking scheme which embedded robust and fragile watermark at the same time in lifting wavelet domain. Robust watermark is embedded in the low frequency range using mean quantization. It had great robustness and imperceptibility. Fragile watermark is embedded in the high frequency range by quantizing single coefficient. When the audio signal is tampered, the watermark information will change synchronously. So it can be used for audio 16 content integrity verification. The watermark can be extracted without the original digital audio signal. Their experimental results show that robust watermark is robust to many attacks, such as mp3 compression, low pass filtering, noise addition, requantization, resampling and so on whereas fragile watermark is very sensitive to these attacks. 2.3 Methods implemented in Time Domain There are few algorithms implemented in time domain [23-26]. These algorithms embed the watermarks in the host signal in time domain by modifying the selected samples. W. Lie et al [23] proposed a method of embedding digital watermarks into audio signals in the time domain. Their algorithm exploits differential average-ofabsolute-amplitude relations within each group of audio samples to represent one-bit information. The principle of low-frequency amplitude modification is employed to scale amplitudes in a group manner (unlike the sample-by-sample manner as used in pseudo noise or spread-spectrum techniques) in selected sections of samples so that the time-domain waveform envelope can be almost preserved. Besides, when the frequency-domain characteristics of the watermark signal are controlled by applying absolute hearing thresholds in the psychoacoustic model, the distortion associated with watermarking is hardly perceivable by human ears. The watermark can be blindly extracted without knowledge of the original signal. Subjective and objective tests reveal that the proposed watermarking scheme maintains high audio quality and is simultaneously highly robust to pirate attacks, including mp3 compression, lowpass filtering, amplitude scaling, time scaling, digital-to-analog/analog-to-digital reacquisition, cropping, sampling rate change, and bit resolution transformation. Security of embedded watermarks is enhanced by adopting unequal section lengths determined by a secret key. In a method suggested by Bassia et al [24] watermark embedding depends on the audio signals amplitude and frequency in a way that minimizes the audibility of the watermark signal. 
The result is a slight amplitude modification of each audio sample in a way that does not produce any perceived effect. The audio signal is divided in Ns segments of N samples each. Each of these segments is watermarked with the bipolar sequence Wi ∈ {− 1,1}, i = 0,1,2......N − 1 , which is generated by 17 thresholding a chaotic map. The seed (starting point) of the chaotic sequence generator is the watermark key. By using generators of a strongly chaotic nature ensures that the system is cryptographically secure, i.e., the sequence generation mechanism cannot be inverse engineered even if an attacker can manage to obtain a part of the binary sequence. The watermark signal is embedded in each audio segment using the following three-stage procedure. The signal-dependent, low-pass-shaped watermark signal is embedded in the original signal segment to produce the watermarked signal segment. This scheme is statistically imperceivable and resists MPEG2 audio compression plus other common forms of signal manipulation, such as cropping, time shifting, filtering, resampling and requantization. A. N. Lemma et al [25] investigated an audio watermarking system is referred to as modified audio signal keying (MASK). In MASK, the short-time envelope of the audio signal is modified in such a way that the change is imperceptible to the human listener. In MASK, a watermark is embedded by modifying the envelope of the audio with an appropriately conditioned and scaled version of a predefined random sequence carrying some information (a payload). On the detector side, the watermark symbols are extracted by estimating the short-time envelope energy. To this end, first, the incoming audio is subdivided into frames, and then, the energy of the envelope is estimated. The watermark is extracted from this energy function. The MASK system can easily be tailored for a wide range of applications. Moreover, informal experimental results show that it has a good robustness and audibility behavior. 2.4 Methods implemented in Transform domain Synchronization attack is one of the key issues of digital audio watermarking. In this correspondence, a blind digital audio watermarking scheme against synchronization attack using adaptive quantization is proposed by X.Y. Wang et al [26]. The features of the their scheme are as follows: 1) a kind of more steady synchronization code and a new embedded strategy are adopted to resist the synchronization attack more effectively; 2) the multiresolution characteristics of discrete wavelet transform (DWT) and the energy-compression characteristics of discrete cosine transform (DCT) are combined to improve the transparency of digital watermark; 3) the watermark is embedded into the low frequency components by adaptive quantization according to human auditory masking; and 4) the scheme can extract the watermark without the help of the original digital audio signal. Experiment 18 results shows that the proposed watermarking scheme is inaudible and robust against various signal processing attacks such as noise adding, resampling, requantization, random cropping, and MPEG-1 Layer III (mp3) compression. Barker code has better self-relativity, so Huang et al. [27] chooses it as synchronization mark and embeds it into temporal domain and embeds the watermark information into DCT domain. It can resist synchronization attack effectively. 
But it has such defects as follows: 1) it chooses a 12-bit Barker code which is so short that it is easy to cause false synchronization; 2) it only embeds the synchronization code by modifying individual sample value, which reduces the resisting ability greatly (especially against resampling and mp3 compression); 3) it does not make full use of human auditory masking effect. S. Wu, J. Hang. et al [28] proposed a self-synchronization algorithm for audio watermarking to facilitate assured audio data transmission. The synchronization codes are embedded into audio with the informative data, thus the embedded data have the self-synchronization ability. To achieve robustness, they embed the synchronization codes and the hidden informative data into the low frequency coefficients in DWT (discrete wavelet transform) domain. By exploiting the time-frequency localization characteristics of DWT, the computational load in searching synchronization codes has been dramatically reduced, thus resolving the contending requirements between robustness of hidden data and efficiency of synchronization codes searching. The performance of the scheme is analyzed in terms of SNR (signal to noise ratio) and BER (bit error rate). An estimation formula that connects SNR with embedding strength has been provided to ensure the transparency of embedded data. BER under Gaussian noise corruption has been estimated to evaluate the performance of the proposed scheme. The experimental results are presented to demonstrate that the embedded data are robust against most common signal processing and attacks, such as Gaussian noise corruption, resampling, requantization, cropping, and mp3 compression. Li et al [29] developed the watermarking technique in wavelet domain based on SNR to determine the scaling parameter required to scale the watermark. The intensity of embedded watermark can be modified by adaptively adjusting the scaling parameter. The authors have proved that the scheme is robust against different signal processing attacks and provide better embedding degree. This scheme requires the original signal to recover the watermark. This motivates us to develop the SNR based 19 scheme to detect and extract the watermark without using the original signal. The watermark embedding procedure adaptively selects the watermark scaling parameter α for each of the section of audio segment selected for embedding. A new watermarking technique capable of embedding multiple watermarks based on phase modulation is devised by A. Takahashi et al [30]. The idea utilizes the insensitivity of the human auditory system to phase changes with relatively long transition period. In this technique the phase modulation of the original signal is realized by means of a time-varying all-pass filter. To accomplish the blind detection which is required in detecting the copy control information, this watermark is assigned to the inter-channel phase difference between a stereo audio signal by using frequency shift keying. Meanwhile, the copyright management information and fingerprint are embeds in to both channels by using phase shift keying of different frequency components. Consequently these three kinds of information are simultaneously embedded into a single time frame. The imperceptibility of the scheme is confirmed through a subjective listening test. The technique is robust against several kinds of signal processing attacks evaluated by computer simulations. 
Author found that their method has good performance in both subjective and objective tests. H. H. Tsai et al [31] proposed a new intelligent audio watermarking method based on the characteristics of the HAS and the techniques of neural networks in the DCT domain. The method makes the watermark imperceptible by using the audio masking characteristics of the HAS. Moreover the method exploits a neural network for memorizing the relationships between the original audio signals and the watermarked audio signals. Therefore the method is capable of extracting watermarks without original audio signals. Their experimental results show that the method significantly possesses robustness to be immune against common attacks for the copyright protection of digital audio. C. Xu et al [32] implemented a method to embed and extract the digital compressed audio. The watermark is embedded in partially uncompressed domain and the embedding scheme is high related to audio content. The watermark content contains owner and user identifications and the watermark embedding and detection can be done very fast to ensure on-line transactions and distributions. X. Li et al [33] developed a data hiding scheme for audio signals in cepstrum domain. Cepstrum representation of audio can be shown to be very robust to a wide 20 range of attacks including most challenging time-scaling and pitch shifting warping. The authors have embedded the data by manipulating statistical mean of selected cepstrum coefficients. Psychoacoustic model is employed to control the audibility of introduced distortion. S. K. Lee et al [34] suggested a watermarking algorithm in cepstrum domain. They insert a digital watermark into the cepstral components of the audio signal using a technique analogous to spread spectrum communications, hiding a narrow band signal in a wideband channel. The pseudorandom sequence used as watermark is weighted in the cepstrum domain according to the distribution of cepstral coefficients and the frequency masking characteristics of human auditory system. Watermark embedding minimizes the audibility of the watermark signal. The technique is robust against multiple watermark, MPEG coding and noise addition. There are various techniques implemented in wavelet domain [35-41]. In these papers it is proved that the wavelet domain is the more suitable domain compare to the other transform domains. As the wavelet coefficients contain the multiple spectrums of multiple band frequencies, this transform is more suitable than other transform domains to select the perceptible band of frequencies for data embedding. 2.5 Other recently developed algorithms: Audio watermarking is usually used as a multimedia copyright protection tool or as a system that embed metadata in audio signals. In the method suggested by S. D. Larbi et al [42] watermarking is viewed as a preprocessing step for further audio processing systems: the watermark signal conveys no information, rather it is used to modify the nonstationarity. The embedded watermark is then added in order to stationnarize the host signal. Indeed the embedded watermark is piecewise stationary, thus it modifies the stationarity of the original audio signal. In some audio processing this can be used to improve the performances that are very sensitive to time variant signal statistics. The authors have presented the analysis of perceptual watermarking impact on the stationarity of audio signals. 
Their study was based on stationarity indices, which represented a measure of variations in spectral characteristics of signals, using time-frequency representations. They had presented their simulation results with two kinds of signals, artificial signals and audio signals. They had observed the significant enhancement in stationarity indices of watermarked signal, especially for transient attacks. 21 T. Furon et al [43] investigated an asymmetric watermarking method as an alternative to direct sequence spread spectrum technique (DSSS) of watermarking. This method is developed to provide higher security level against malicious attacks threatening watermarking techniques used for a copy protection purpose. This application, which is quite different from the classical copyright enforcement issue is extremely challenging as no public algorithm is yet known to be secure enough and some proposed proprietary techniques are already hacked. The asymmetric detectors need more complexity and more money and they accumulate bigger amount of content in order to take decision. Conventional watermarking techniques based on echo hiding provide many benefits, but also have several disadvantages, for example, lenient decoding process, weakness against multiple encoding attacks etc. B.S. Ko et al [44] improve the weak points of conventional echo hiding by time-spread echo method. Spreading an echo in the time domain is achieved by using pseudo-noise (PN) sequences. By spreading the echo the amplitude of each echo can be reduced, i.e. the energy of each echo becomes small, so that the distortion induced by watermarking is imperceptible to humans while the decoding performance of the embedded watermarks is better maintained as compared with the case of conventional echo hiding method. Authors have proved this by computer simulations, in which several parameters, such as the amplitude and length of PN sequences and analysis window length, were varied. Robustness against typical signal processing was also evaluated in their simulations and showed fair performance. Results of listening test using some pieces of music showed good imperceptibility. S. Eerüçük et al [45] introduced a novel watermark representation for audio watermarking, where they embed linear chirps as watermark signals. Different chirp rates, i.e. slopes on time-frequency plane, represent watermark messages such that each slope corresponds to a unique message. These watermark signals, i.e. linear chirps, are embedded and extracted using an existing watermarking algorithm. The extracted chirps are then post processed at the receiver using a line detection algorithm based on the Hough-Radon transform (HRT). The HRT is an optimal line detection algorithm, which detects directional components that satisfy a parametric constraint equation in the image of a TF plane, even at discontinuities corresponding to bit errors. The simulations carried by authors showed that HRT correctly detects the embedded watermark message after signal processing operations for bit error rates 22 up to 20%. The new watermark representation and the post processing stage based on HRT can be combined with existing embedding/extraction algorithms for increased robustness. A new adaptive blind digital audio watermarking algorithm is proposed by X. Wang et al [46] on the basis of support vector regression (SVR). 
This algorithm embeds the template information and watermark signal into the original audio by adaptive quantization according to the local audio correlation and human auditory masking. During the watermark extraction the corresponding features of template and watermark are first extracted from the watermarked signal. Then, the corresponding feature template is selected as training sample to train SVR and an SVR model is returned. Finally the actual outputs are predicted according to the corresponding feature of watermark, and the digital watermark is recovered from the watermarked audio by using the well-trained SVR. The algorithm is not only robust against various signal processing attacks but also has high perceptibility. The performance of the algorithm is better than other SVM audio watermarking schemes. 2.6 Audio watermarking techniques against time scale modification Synchronization attacks are a serious problem to any watermarking schemes. Audio processing such as random cropping and time scale modification (TSM) causes displacement between embedding and detection in time domain and is difficult for watermark to survive. TSM is a serious attack to audio watermarking, very few algorithms can effectively resist this kind of synchronization attack. According to the Secure Digital Music Initiative (SDMI) Phase-II robustness test requirement, practical audio watermarking schemes should be able to withstand pitch-invariant TSM up to ±4%. Mansour and Twefik [47] proposed to embed watermark by changing the relative length of the middle segment between two successive maximum and minimum of the smoothed waveform, the performance highly depends on the selection of the threshold and it is a delicate work to find an appropriate threshold. Mansour and Twefik [48] proposed another algorithm for embedding data into audio signals by changing the interval lengths between salient points in the signal. The extreme point of the wavelet coefficients from the selected envelop is adopted as salient points. 23 W. Li et al [49] have suggested a novel content – dependent localized scheme to combat synchronization attacks like random cropping and time-scale modification. The basic idea is to first select steady high-energy local regions that represent music edges like note attacks, transitions or drum sounds by using different methods, then embed the watermark in these regions. Such regions are of great importance to the understanding of music and will not be changed much for maintaining high auditory quality. In this way the embedded watermark will have the potential to escape all kinds of distortions. Experimental results carried out by authors show that the method is highly robust against some common signal processing attack and synchronization attack. This method has its inherent limitations. Although it is suitable for most modern music with obvious rhythm, it does not work with some classical music without apparent rhythm. S. Xiang et al [50] presented a multibit robust audio watermarking solution by using the insensitivity of the audio histogram shape and the modified mean to TSM and cropping operations. Authors have addressed the insensitivity property in both mathematical analysis and experimental testing by representing the histogram shape as the relative relations in the number of samples among groups of three neighboring bins. By reassigning the number of samples in groups of three neighboring bins, the watermark sequence is successfully embedded. 
In the embedding process, the histogram is extracted from a selected amplitude range by referring to the mean in such a way that the watermark will be able to be resistant to amplitude scaling and avoid exhaustive search in the extraction process. They observed that the watermarked audio signal is perceptibly similar to the original one. Experimental results demonstrated by authors prove the robustness of the scheme against TSM and random cropping attacks and has a satisfactory robustness for those common signal processing attacks. A blind digital audio watermarking scheme against synchronization attack using adaptive mean quantization is developed by X-Y. Wang et al [51]. The features of the scheme are as follows 1) a kind of more steady synchronization code and a new embedded strategy are adopted to resist the synchronization attack more effectively; 2) the multiresolution characteristics of DWT and energy-compression characteristics of discrete cosine transform are combined to improve the transparency of digital watermark 3) the watermark is embedded into the low frequency components by adaptive quantization according to human auditory masking; and 4)the scheme can 24 extract the watermark without the help of original audio signal. The experimental result added in the paper show that the technique can resist the various signal processing attacks. 2.7 Papers studied on performance analysis and evaluation of watermarking systems Powerful and low cost computers allow people to easily create and copy multimedia content, and the Internet has made it possible to distribute this information at very low cost. However, these enabling technologies also make it easy to illegally copy, modify, and redistribute multimedia data without regard for copyright ownership. Many techniques have been proposed for watermarking audio, image, and video, and comprehensive surveys of these technologies is presented in previous sections. However, it is required to consider an effective means of comparing the different approaches. J. D. Gordy et al [52] have presented an algorithm independent set of criteria for quantitatively comparing the performance of digital watermarking algorithms. Four criterions were selected by authors as a part of the evaluation framework. They were chosen to reflect the fact that watermarking is effectively a communications system. In addition, the criteria are simple to test, and may be applied to any type of watermarking system (audio, image, or video). 1) Bit rate refers to the amount of watermark data that may be reliably embedded within a host signal per unit of time or space, such as bits per second or bits per pixel. A higher bit rate may be desirable in some applications in order to embed more copyright information. Reliability was measured as the bit error rate (BER) of extracted watermark data. 2)Perceptual quality refers to the imperceptibility of embedded watermark data within the host signal. In most applications, it is important that the watermark is undetectable to a listener or viewer. This ensures that the quality of the host signal is not perceivably distorted, and does not indicate the presence or location of a watermark. The signal-to-noise ratio (SNR) of the watermarked signal versus the host signal was used as a quality measure. 3) Computational complexity refers to the processing required to embed watermark data into a host signal, and / or to extract the data from the signal. Actual CPU timings (in seconds) of algorithm implementations were collected. 
4) Watermarked digital signals may undergo common signal 25 processing operations such as linear filtering, sample requantization, D/A and A/D conversion, and lossy compression. Although these operations may not affect the perceived quality of the host signal, they may corrupt the watermark data embedded within the signal. It is important to know, for a given level of host signal distortion, which watermarking algorithm will produce a more reliable embedding. Robustness was measured by the bit error rate (BER) of extracted watermark data as a function of the amount of distortion introduced by a given operation. The performance of spread-transform dither modulation watermarking in the presence of two important classes of non additive attacks, such as the gain attack plus noise addition and the quantization attack are evaluated by F. Bartolini et al [53]. The authors developed the analysis under the assumption that the host features are independent and identically distributed Gaussian random variables, and a minimum distance criterion is used to decode the hidden information. The theoretical bit-error probabilities are derived in closed form, thus permitting to evaluate the impact of the considered attacks on the watermark at a theoretical level. The analysis is validated by means of extensive Monte-Carlo simulations. In addition to the validation of the theoretical analysis, Monte-Carlo simulations permitted to abandon the hypothesis of normally distributed host features, in favor of more realistic models adopting a Laplacian or a generalized Gaussian probability density function. The general result of the analysis carried out by authors is that the excellent performance of ST-DM is confirmed in all cases with only noticeable exception of the gain attack. Hidden copyright marks have been proposed as a solution for solving the illegal copying and proof of ownership problems in the context of multimedia objects. Many systems have been proposed by different authors but it was difficult to have idea of their performance and hence to compare them. Then F.A.P. Petitcolas et al [54] propose a benchmark based on a set of attacks that any system ought to survive. G.C. Rodriguez et al [55] presented a survey report on audio watermarking in which watermarking techniques are briefly summarized and analyzed. They have made the following observations: • The patchwork scheme and cepstrum domain scheme are robust to several signal manipulations, but for real applications authors suggest to use patchwork scheme because the cepstrum domain scheme needs the original 26 signal to determine that the host signal is marked as a consequence it needs the double storage capacity. • The echo hiding scheme only fulfill with the inaudibility condition and is not robust to several attacks such as mp33 compression, filtering, resampling, etc. In early September 2000 Secure Digital Music Initiative (SDMI) announced a three-week open challenge for its phase II screening, inviting the public to evaluate the attach resistance for four watermark techniques. The challenge emphasized on testing the effectiveness of robust watermarks, which is crucial in ensuring the proper functioning of the entire system. M. Wu et al [56] points out some weaknesses in these watermark techniques and suggest directions for further improvement. Authors have provided the general framework for analyzing the robustness and security of audio watermark systems. 2.8 Watermark Attacks Research in digital watermarking has progressed along two paths. 
While new watermarking technologies are being developed, some researchers are also investigating different ways of attacking digital watermarks. Some of the attacks that have been proposed in the literature are reviewed here. Frank Hartung et al [57] have shown that the spread spectrum watermarks and watermark detectors are vulnerable to a variety of attacks. However with appropriate modifications to the embedding and extraction methods, methods can be made much more resistant to variety of such attacks. Frank et al classified the attack in four groups: a) Simple attack attempt to impair the embedded watermark by manipulation of the whole watermarked data. b) Detection disabling attacks – attempt to break the connection and to make the recovery of watermark infusible or infeasible for a watermark detector. c) Ambiguity attacks – attempt to analyze the watermarked data, estimate the watermark or host data, separate the watermarked data into host data and watermark, and discard only the watermark. Frank – Huntung et al [57] also suggested the counter attack to those attacks. Martin Kutter et al [58] suggested the watermark copy attack, which is based on an estimation of the embedded watermark in the spatial domain through a filtering process. The estimate of the watermark is then adapted and inserted into the target image. To illustrate the performance of the proposed attack they applied it to 27 commercial and non-commercial watermarking schemes. The experiments showed that the attack is very effective in copying a watermark from one image to a different image. Alexander et al [59] suggested the watermark template attack. This attack estimates the corresponding template points in the FFT domain and then removes them using local interpolation. The approach is not limited to FFT domain; other transformation domains may also exploit similar variants at this attack. J. K. Su et al [60] suggested a Channel Model for a Watermark Attack. Authors have analyzed this attach for images and stated that the attack can be applied to audio/video watermarking schemes. D. Kirovski et al [61] analyzed the security of multimedia copyright protection systems that use watermarks by proposing a new breed of attacks on generic watermarking systems. A typical blind pattern matching attack relies on the observation that multimedia content is often highly repetitive. Thus the attack procedure identifies subsets of signal blocks that are similar and permutes these blocks. Assuming that permuted blocks are marked with distinct secrets, it can be shown that any watermark detector is facing a task of exponential complexity to reverse the permutations as a preprocessing step for watermark detection. Authors have described the implementation of attack against a spread-spectrum and a quantization index modulation data hiding technology for audio signals. 2.9 Research problems identified: The problems identified from the literature survey carried out in this chapter include: 1) Construction of the method that would identify perceptually significant components from an analysis of image/audio and the Human visual system/ Human auditory system. 2) The system must be tested against lossy operations such as mp3 and data conversion. The experiments must be expanded to validate the results. 3) There is a need to explore novel mechanisms for effective encoding and decoding of watermark using DSSS in audio. The technique may aim at improving detection convergence and robustness, improving watermark imperceptiveness. 
Preventing attacks such as the desynchronization attack, and establishing the possibility of covert communication over a public audio channel, are also of interest. 4) A possible asymmetric watermarking method may be an alternative to classical DSSS watermarking and may provide a higher level of security against malicious attacks. 5) Possible generation of a framework for blind watermark detection. 6) Possibility of suggesting new malicious attacks and counter-attacks for available watermarking techniques. 7) Possibility of embedding an audio watermark in audio and designing an adaptive system to overcome a number of non-intentional attacks. Concluding remarks: Chapter 2 reviews the literature and describes the concept of information hiding in audio sequences. The scientific publications included in the literature survey have been chosen to build a sufficient background for a better understanding of the research topic. The key digital audio watermarking algorithms and techniques surveyed are classified by the signal domain in which the watermark is inserted and by the statistical method used for embedding and extraction of the watermark bits. Audio watermarking initially started as a sub-discipline of digital signal processing, focusing mainly on convenient signal processing techniques to embed additional information into audio sequences. This included the investigation of a suitable transform domain for watermark embedding and schemes for imperceptible modification of the host audio. Only recently has watermarking been placed on a stronger theoretical foundation, becoming a more mature discipline with a proper base in both communication modeling and information theory. My research concentrates on developing an audio watermarking technique that improves detection convergence and robustness while improving watermark imperceptibility. An attempt is also made to embed audio data in an audio signal during this research. Chapter 3 Research Problem Introduction The fundamental process in each watermarking system can be modeled as a form of communication where a message is transmitted from the watermark embedder to the watermark receiver [2]. The process of watermarking is viewed as a transmission channel through which the watermark message is sent, with the host signal being a part of that channel. In Figure 3.1, a general mapping of a watermarking system into a communications model is given. After the watermark is embedded, the watermarked signal is usually distorted by watermark attacks. The distortions of the watermarked signal are, similarly to the data communications model, modeled as additive noise. Fig. 3.1. A watermarking system and an equivalent communications model. When the research plan for this study was set down, digital audio watermarking was in its early development stage; the first algorithms dealing specifically with audio were presented in 1996 [12]. Although only a few papers had been published at the time, the basic theoretical foundations were laid down and the concept of the "magic triangle" introduced [62]. Therefore, it is natural to place watermarking into the framework of the traditional communications system.
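To make this communications view concrete, the short sketch below (an illustrative toy example, not the scheme developed in this thesis; the function names, the key value and the strength parameter alpha are invented for illustration) embeds one message bit as an amplitude-scaled pseudo-random carrier, treats the host audio as channel noise, and recovers the bit at the detector by correlating the received signal with the same key-generated carrier.

```python
import numpy as np

def embed_bit(host, bit, key, alpha=0.01):
    """Additively embed one bit (+1/-1) using a key-generated PN carrier."""
    rng = np.random.default_rng(key)              # the secret key seeds the carrier
    carrier = rng.choice([-1.0, 1.0], size=host.size)
    return host + alpha * bit * carrier            # watermarked signal = channel input

def detect_bit(received, key):
    """Blind correlation detector: the host signal acts as channel noise."""
    rng = np.random.default_rng(key)
    carrier = rng.choice([-1.0, 1.0], size=received.size)
    return 1 if np.dot(received, carrier) > 0 else -1

# toy example: the "channel" adds attack noise on top of the host interference
host = np.random.randn(44100) * 0.1
y = embed_bit(host, bit=-1, key=1234)
attacked = y + np.random.randn(y.size) * 0.005
print(detect_bit(attacked, key=1234))              # expected output: -1
```

The same picture underlies the spread spectrum schemes reviewed in Chapter 2: the secret key plays the role of the spreading sequence, and attacks enter the model as additional channel noise.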
The main line of reasoning of the "magic triangle" concept [62] is that if the perceptual transparency parameter is fixed, the design of a watermark system cannot obtain high robustness and a high watermark data rate at the same time. Thus, the research problem can be divided into three specific subproblems. They are: SP1: What is the highest watermark bit rate obtainable under the perceptual transparency constraint, and how can that limit be approached? SP2: How can the detection performance of a watermarking system be improved using algorithms based on communications models for that system? SP3: How can the overall robustness to attacks of a watermark system be increased using an attack characterization at the embedding side? The division of the research problem into the three subproblems above defines the following three research hypotheses: RH1: To obtain a distinctively high watermark data rate, the embedding algorithm can be implemented in a transform domain. RH2: To improve detection performance, a spread spectrum method can be used. RH3: To achieve robustness of the watermarking algorithms, an attack characterization can be introduced at the embedder. The general research assumption is that the process of embedding and extraction of watermarks can be modeled as a communication system, where the watermark embedding is modeled as a transmitter, the distortion of the watermarked signal as communications channel noise, and the watermark extraction as a communications detector. It is also assumed that modeling of the human auditory system and the determination of perceptual thresholds can be done accurately using models from audio coding, namely the HAS model used in MPEG compression. The perceptual transparency (inaudibility) of a proposed audio watermarking scheme can be confirmed through subjective listening tests in a predefined laboratory environment with the participation of a predefined number of people with different musical education and backgrounds. The imperceptibility can also be measured by computing the signal to noise ratio between the original host signal and the watermarked audio signal. A central assumption in the security analysis of the proposed algorithms is that an adversary who attempts to disrupt the communication of watermark bits or remove the watermark does not have access to the original host audio signal. The adversary should not be able to extract the watermark with any statistical analysis. The embedded watermark should withstand all kinds of signal processing attacks, proving the robustness of the scheme. In this thesis we concentrate on developing a scheme which withstands signal processing attacks and provides better robustness while maintaining imperceptibility. Summary In this thesis, a multidisciplinary approach is applied to solving the research subproblems. Signal processing methods are used for the watermark embedding and extraction processes, the derivation of perceptual thresholds, and the transforms of signals to different signal domains (e.g. the DCT domain, the wavelet domain). Communication principles and models are used for channel noise modeling, different ways of signaling the watermark (e.g. a direct sequence spread spectrum method, a frequency hopping method), and evaluation of the overall detection performance of the algorithm (bit error rate, normalized correlation value at detection). The research methods also include algorithm simulations with real data (music sequences) and subjective listening tests.
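For reference, the two detection-performance figures mentioned above, bit error rate and normalized correlation, can be computed as in the following sketch. These are standard definitions written here in Python purely for illustration; they are not code taken from the thesis.

```python
import numpy as np

def bit_error_rate(w_embedded, w_extracted):
    """Percentage of watermark bits that differ between the embedded
    and the extracted binary watermark sequences."""
    w_embedded = np.asarray(w_embedded)
    w_extracted = np.asarray(w_extracted)
    return 100.0 * np.mean(w_embedded != w_extracted)

def normalized_correlation(w_embedded, w_extracted):
    """Normalized correlation at detection; values near 1 indicate that
    the extracted watermark closely matches the embedded one."""
    w_embedded = np.asarray(w_embedded, dtype=float)
    w_extracted = np.asarray(w_extracted, dtype=float)
    return (np.dot(w_embedded, w_extracted)
            / (np.linalg.norm(w_embedded) * np.linalg.norm(w_extracted)))
```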
Chapter 4 High Capacity Covert Communication for Audio Introduction The simplest visualization of the requirements of information hiding in digital audio is the so-called magic triangle [62], given in Figure 4.1. Inaudibility, robustness to attacks, and the watermark data rate are at the corners of the magic triangle. This model is convenient for a visual representation of the required trade-offs between the capacity of the watermark data and the robustness to certain watermark attacks, while keeping the perceptual quality of the watermarked audio at an acceptable level. It is not possible to attain high robustness to signal modifications and a high data rate of the embedded watermark at the same time. Therefore, if high robustness is required from the watermarking algorithm, the bit rate of the embedded watermark will be low, and vice versa; high bit rate watermarks are usually very fragile in the presence of signal modifications. However, there are some applications that do not require the embedded watermark to have a high robustness against signal modifications. In these applications, the embedded data is expected to have a high data rate and to be detected and decoded using a blind detection algorithm. While robustness against intentional attacks is usually not required, signal processing modifications, like noise addition, should not affect the covert communications [17]. Fig 4.1 Magic triangle: the three contradictory requirements of watermarking (inaudibility, robustness, data rate). One interesting application of high capacity covert communications is a public watermark embedded into the host multimedia that is used as a link to external databases containing additional information about the multimedia file itself, e.g. copyright information and licensing conditions [17]. Another application with similar requirements is the transmission of meta-data along with multimedia. Meta-data embedded in, e.g., an audio clip may carry information about the composer, soloist, genre of music, etc. [17]. Usually embedding is performed in high amplitude portions of the signal, either in the time or the frequency domain. Unlike the techniques explained in [12-16], where binary data is used as the watermark, the proposed technique uses audio information as the watermark. The objective of the method implemented here is to provide an audio watermarking technique that can be used for covert communication of audio signals. In the proposed technique the input audio is decomposed using the discrete wavelet transform (DWT). The covert message (audio watermark) is embedded into the detail coefficients of the three-level decomposition of the original audio signal. The results of the technique described in this chapter are presented partly in paper I and paper II. Section 4.1 gives an overview of the properties of the HAS. Section 4.2 concentrates on understanding the 1-D wavelet decomposition. The idea of the proposed technique is described in section 4.3. Section 4.4 discusses the experimental results obtained for this method. 4.1. Overview of the properties of HAS Watermarking of audio signals is more challenging compared to the watermarking of images or video sequences, due to the wider dynamic range of the HAS in comparison with the human visual system (HVS). The HAS perceives sounds over a range of power greater than 10^9:1 and a range of frequencies greater than 10^3:1.
The sensitivity of the HAS to additive white Gaussian noise (AWGN) is high as well; this noise in a sound file can be detected as low as 70 dB below ambient level. On the other hand, in contrast to its large dynamic range, the HAS has a fairly small differential range, i.e. loud sounds generally tend to mask out weaker sounds. Additionally, the HAS is insensitive to a constant relative phase shift in a stationary audio signal, and it interprets some spectral distortions as natural, perceptually non-annoying ones. Auditory perception is based on the critical band analysis in the inner ear, where a frequency-to-location transformation takes place along the basilar membrane. The power spectra of the received sounds are not represented on a linear frequency scale but on limited frequency bands called critical bands. The auditory system is usually modeled as a band-pass filter bank, consisting of strongly overlapping band-pass filters with bandwidths of around 100 Hz for bands with a central frequency below 500 Hz, and up to 5000 Hz for bands placed at high frequencies. If the highest frequency is limited to 24000 Hz, 26 critical bands have to be taken into account. Two properties of the HAS dominantly used in watermarking algorithms are frequency (simultaneous) masking and temporal masking [62]. The concept of using the perceptual holes of the HAS is taken from wideband audio coding (e.g. MPEG-1 Layer 3 compression, usually called mp3) [66]. In the compression algorithms, the holes are used in order to decrease the number of bits needed to encode the audio signal, without causing perceptual distortion to the coded audio. On the other hand, in information hiding scenarios, masking properties are used to embed additional bits into an existing bit stream, again without generating audible noise in the audio sequence used for data hiding. Frequency (simultaneous) masking is a frequency domain phenomenon where a low level signal, e.g. a pure tone (the maskee), can be made inaudible (masked) by a simultaneously appearing stronger signal (the masker), e.g. a narrow band noise, if the masker and maskee are close enough to each other in frequency [67]. A masking threshold can be derived below which any signal will not be audible. The masking threshold depends on the masker level and on the characteristics of the masker and maskee (narrowband noise or pure tone). For example, with a masker at around 1 kHz and a masking threshold at a sound pressure level (SPL) of 60 dB, as in Figure 4.2, the SPL of the maskee can be surprisingly high; it will be masked as long as its SPL is below the masking threshold. The slope of the masking threshold is steeper toward lower frequencies; in other words, higher frequencies tend to be more easily masked than lower frequencies. It should be pointed out that the distance between the masker level and the masking threshold is smaller in noise-masks-tone experiments than in tone-masks-noise experiments, due to the HAS's sensitivity toward additive noise. Noise and low-level signal components are masked inside and outside the particular critical band if their SPL is below the masking threshold. Noise contributions can be coding noise, an inserted watermark sequence, aliasing distortions, etc. Without a masker, a signal is inaudible if its SPL is below the threshold in quiet, which depends on frequency and covers a dynamic range of more than 70 dB, as depicted in the lower curve of Figure 4.2.
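As an illustration of the threshold in quiet mentioned above, the absolute hearing threshold is often approximated in the psychoacoustic-modeling literature by Terhardt's formula. The sketch below implements that commonly quoted approximation; it is an assumption of this example, not a formula defined in this thesis.

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Commonly quoted approximation (Terhardt) of the absolute hearing
    threshold in dB SPL as a function of frequency in Hz."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

# hearing is most sensitive (threshold lowest) around 3-4 kHz and the
# threshold rises steeply toward very low and very high frequencies
print(threshold_in_quiet_db([100, 1000, 3500, 16000]))
```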
Fig 4.2 Frequency masking in the human auditory system (HAS). The qualitative sketch of Figure 4.3 gives more details about the masking threshold. The distance between the level of the masker (given as a tone in Figure 4.3) and the masking threshold is called the signal-to-mask ratio (SMR) [66]. Its maximum value is at the left border of the critical band. Within a critical band, noise caused by watermark embedding will be audible as long as the signal-to-noise ratio (SNR) for the critical band [16] is higher than its SMR. Let SNR(m) be the signal-to-noise ratio resulting from watermark insertion in the critical band m; the perceivable distortion in a given sub-band is then measured by the noise-to-mask ratio: NMR(m) = SMR − SNR(m). The noise-to-mask ratio NMR(m) expresses the difference between the watermark noise in a given critical band and the level where a distortion may just become audible; its value in dB should be negative. Fig. 4.3. Signal-to-mask ratio and signal-to-noise ratio values. This description covers the case of masking by only one masker. If the source signal consists of many simultaneous maskers, a global masking threshold can be computed that describes the threshold of just noticeable distortion (JND) as a function of frequency. The calculation of the global masking threshold is based on the high resolution short-term amplitude spectrum of the audio signal, sufficient for critical band-based analysis, and is usually performed using 1024 samples in the FFT domain. In a first step, all the individual masking thresholds are determined, depending on the signal level, the type of masker (tone or noise) and the frequency range. After that, the global masking threshold is determined by adding all individual masking thresholds and the threshold in quiet. The effects of masking reaching over the limits of a critical band must be included in the calculation as well. Finally, the global signal-to-noise ratio is determined as the ratio of the maximum of the signal power and the global masking threshold [66], as depicted in Figure 4.2. Temporal masking refers to sounds that are heard at different time instances. Temporal masking can be either premasking (backward) or post-masking (forward). When the masker affects a previous sound, the masking is called premasking, whereas when the masker affects a subsequent sound, the masking is called post-masking. In general, premasking is not as intense as post-masking. Pre-masking occurs for a duration of 5–20 ms before the masker is turned on. Post-masking occurs for a duration of 50–200 ms after the masker is turned off. The temporal masking effects appear before and after a masking signal has been switched on and off, respectively (Figure 4.4). The duration of premasking is significantly less than one-tenth that of post-masking, which is in the interval of 50 to 200 milliseconds. Both pre- and post-masking have been exploited in the MPEG audio compression algorithm and in several audio watermarking methods. Fig 4.4 Temporal masking in the human auditory system (HAS). 4.2. Discrete Wavelet Transform: The wavelet series is just a sampled version of the continuous wavelet transform (CWT) and its computation may consume a significant amount of time and resources, depending on the resolution required. The discrete wavelet transform (DWT) is equivalent to a hierarchical sub-band system in which the sub-bands are logarithmically spaced in the frequency domain.
In the DWT, a time-scale representation of the digital signal is obtained using digital filtering techniques. The signal to be analyzed is passed through filters with different cutoff frequencies at different scales. Filters are one of the most widely used signal processing functions. Wavelets can be realized by iteration of filters with rescaling. The resolution of the signal, which is a measure of the amount of detail information in the signal, is determined by the filtering operations, and the scale is determined by upsampling and downsampling (subsampling) operations. The DWT is computed by successive lowpass and highpass filtering of the discrete time-domain signal, as shown in Figure 4.5. This is called the Mallat algorithm or Mallat-tree decomposition. Its significance is in the manner in which it connects the continuous-time multiresolution analysis to discrete-time filters. In the figure, the signal is denoted by the sequence x[n], where n is an integer. The low pass filter is denoted by G0 while the high pass filter is denoted by H0. At each level, the high pass filter produces detail information, d[n], while the low pass filter associated with the scaling function produces coarse approximations, a[n]. At each decomposition level, the half band filters produce signals spanning only half the frequency band. This doubles the frequency resolution, as the uncertainty in frequency is reduced by half. In accordance with Nyquist's rule, if the original signal has a highest frequency of ω, which requires a sampling frequency of 2ω radians, then after lowpass filtering it has a highest frequency of ω/2 radians. It can now be sampled at a frequency of ω radians, thus discarding half the samples with no loss of information. This decimation by 2 halves the time resolution, as the entire signal is now represented by only half the number of samples. Thus, while the half band low pass filtering removes half of the frequencies and thus halves the resolution, the decimation by 2 doubles the scale. Fig 4.5 Mallat-tree decomposition. With this approach, the time resolution becomes arbitrarily good at high frequencies, while the frequency resolution becomes arbitrarily good at low frequencies. The time-frequency plane is thus resolved. The filtering and decimation process is continued until the desired level is reached. The maximum number of levels depends on the length of the signal. The DWT of the original signal is then obtained by concatenating all the coefficients, a[n] and d[n], starting from the last level of decomposition. Fig 4.6 Reconstruction of the original signal from wavelet coefficients. Figure 4.6 shows the reconstruction of the original signal from the wavelet coefficients. Basically, the reconstruction is the reverse process of decomposition. The approximation and detail coefficients at every level are upsampled by two, passed through the low pass and high pass synthesis filters and then added. This process is continued through the same number of levels as in the decomposition process to obtain the original signal. The Mallat algorithm works equally well if the analysis filters, G0 and H0, are exchanged with the synthesis filters, G1 and H1.
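The Mallat decomposition and reconstruction described above can be tried out in a few lines of code. The thesis implementation uses the wavelet functions in Matlab; the sketch below is an equivalent illustration in Python with the PyWavelets package, and the toy signal, its length and the choice of three levels are assumptions of the example.

```python
import numpy as np
import pywt  # PyWavelets

# toy input signal; the thesis works with audio sampled at 44.1 kHz
x = np.random.randn(1024)

# three-level Mallat decomposition with the Daubechies 'db2' mother wavelet;
# wavedec returns [a3, d3, d2, d1]: the coarse approximation plus the detail bands
coeffs = pywt.wavedec(x, 'db2', level=3)
a3, d3, d2, d1 = coeffs

# reconstruction is the reverse process: upsample, filter with the synthesis
# bank and add; waverec performs the full inverse transform
x_rec = pywt.waverec(coeffs, 'db2')[: len(x)]
print(np.max(np.abs(x - x_rec)))  # close to 0 (perfect reconstruction)
```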
4.2.1. Conditions for Perfect Reconstruction In most wavelet transform applications, it is required that the original signal be synthesized from the wavelet coefficients. To achieve perfect reconstruction the analysis and synthesis filters have to satisfy certain conditions. Let G0(z) and G1(z) be the low pass analysis and synthesis filters, respectively, and H0(z) and H1(z) the high pass analysis and synthesis filters, respectively. Then the filters have to satisfy the following two conditions, as given in [4]: G0(−z) G1(z) + H0(−z) H1(z) = 0 (4.1) and G0(z) G1(z) + H0(z) H1(z) = 2 z^(−d) (4.2). The first condition implies that the reconstruction is aliasing-free and the second condition implies that there is no amplitude distortion (the overall system is a pure delay). It can be observed that the perfect reconstruction condition does not change if we switch the analysis and synthesis filters. There are a number of filters which satisfy these conditions, but not all of them give accurate wavelet transforms, especially when the filter coefficients are quantized. The accuracy of the wavelet transform can be determined after reconstruction by calculating the signal to noise ratio (SNR) of the signal. Some applications, like pattern recognition, do not need reconstruction, and in such applications the above conditions need not apply. 4.2.2. Classification of wavelets We can classify wavelets into two classes: (a) orthogonal and (b) biorthogonal. Based on the application, either of them can be used. 4.2.2.1. Features of orthogonal wavelet filter banks The coefficients of orthogonal filters are real numbers. The filters are of the same length and are not symmetric. The low pass filter G0 and the high pass filter H0 are related to each other by H0(z) = z^(−N) G0(−z^(−1)) (4.3). The two filters are alternating flips of each other. The alternating flip automatically gives double-shift orthogonality between the lowpass and highpass filters [1], i.e., the scalar product of the filters for a shift by two is zero: ∑k G0[k] H0[k − 2l] = 0, where l ∈ Z [4]. Filters that satisfy equation 4.3 are known as Conjugate Mirror Filters (CMF). Perfect reconstruction is possible with the alternating flip. Also, for perfect reconstruction, the synthesis filters are identical to the analysis filters except for a time reversal. Orthogonal filters offer a high number of vanishing moments. This property is useful in many signal and image processing applications. They have a regular structure, which leads to easy implementation and a scalable architecture. 4.2.2.2. Features of biorthogonal wavelet filter banks In the case of biorthogonal wavelet filters, the low pass and the high pass filters do not have the same length. The low pass filter is always symmetric, while the high pass filter can be either symmetric or anti-symmetric. The coefficients of the filters are either real numbers or integers. For perfect reconstruction, a biorthogonal filter bank has all odd length or all even length filters. The two analysis filters can be symmetric with odd length, or one symmetric and the other antisymmetric with even length. Also, the two sets of analysis and synthesis filters must be dual. Linear phase biorthogonal filters are the most popular filters for data compression applications. 4.3. Audio watermarking for Covert Communication: The purpose of the high capacity audio watermarking scheme implemented for covert communication is to hide a covert message in the host multimedia signal for secure data transmission. The covert message can be a binary message, a text message or audio data required to be transferred over a public channel. Here, an attempt is made to hide audio information in the host audio signal. The method implemented here embeds the audio watermark in the host audio signal.
The watermark is embedded in the LSB of the host audio signal. Since audio information is used as the watermark in this method, the capacity of watermarking is increased compared to other techniques, which embed one bit of binary information in one sample of the audio signal. The basic block diagram of watermark embedding is shown in Fig. 4.7. A three-level wavelet decomposition of the original signal x is first computed, as explained in section 4.2. To embed the data in the low-middle frequency components, the d3 coefficients are selected. Then the detail coefficients d3 of the original signal are modified to embed the watermark. The audio watermark (covert signal) is scaled with the secret key of the author and embedded in d3. The selection criterion for this key is based on the psychoacoustic model of the host signal. The numerical value of the key should be selected in such a way that the embedded watermark is imperceptible. To reconstruct the watermark from the watermarked signal, knowledge of the secret key is necessary. Without the knowledge of the secret key it is impossible to extract the watermark. The scaled watermark is then added to the selected d3 coefficients. Then the inverse DWT is computed to obtain the watermarked signal. The algorithm to embed the watermark in the original signal is as follows:
• The original signal X is decomposed to get the approximation and detail coefficients using the DWT function available in Matlab. The Daubechies wavelet (db2) is used as the mother wavelet to decompose the signal.
• The audio watermark w is scaled with the scaling factor α that is used as the secret key of the author. Without knowledge of the parameter α, extraction of w is impossible.
• The scaled watermark w' is computed from w as w' = w * α. The purpose of the scaling parameter is to reduce the amplitude of the original watermark signal.
• The value of α is selected in such a way that α < 1 and the imperceptibility of the watermarked signal y is not disturbed. The scaled watermark w' is embedded in d3 as d3' = d3 + w'.
• The inverse DWT is computed using the idwt function in Matlab.
Fig. 4.7 Watermark embedding. The signal to noise ratio (SNR) [52] between x and y is computed to measure the imperceptibility of the proposed technique: SNR = 10 log10 [ Σ_i x^2(i) / Σ_i (x(i) − y(i))^2 ] (4.4). The watermark is extracted from the watermarked audio signal as shown in Fig. 4.8. The three-level decomposition of the original audio signal and of the watermarked audio signal is performed. The d3 coefficients of both signals are selected; then, with the help of the secret key, the watermark is extracted from the watermarked audio signal y as w'' = (d3' − d3) / α (4.5), where d3' is the detail coefficient vector of the watermarked audio signal and d3 is the detail coefficient vector of the original audio signal. α is the scaling parameter (secret key) used while embedding the watermark. The similarity between the original audio watermark and the extracted audio watermark is measured by computing the bit error rate (BER) [12]. Fig 4.8 Watermark extraction. BER = (100 / M) Σ (w'' ⊕ w) (4.6), where M is the total number of bits in the watermark signal and ⊕ is the XOR operator.
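The embedding and extraction steps around equations (4.4)–(4.6) can be summarized in a short sketch. The thesis implementation uses Matlab's wavelet functions; the version below is an illustrative re-implementation in Python with PyWavelets, and the helper names, the toy signals and the value of α are assumptions of the example (a periodized DWT is used here simply so that the coefficient round trip is exact).

```python
import numpy as np
import pywt

WAVELET = 'db2'          # Daubechies mother wavelet, as in the thesis
LEVEL = 3
MODE = 'periodization'   # non-redundant DWT: re-decomposition returns the modified d3 exactly

def embed(host, watermark, alpha):
    """d3' = d3 + alpha * w : add the scaled audio watermark to the level-3 detail band."""
    a3, d3, d2, d1 = pywt.wavedec(host, WAVELET, mode=MODE, level=LEVEL)
    d3_marked = d3.copy()
    d3_marked[: len(watermark)] += alpha * watermark
    return pywt.waverec([a3, d3_marked, d2, d1], WAVELET, mode=MODE)[: len(host)]

def extract(host, watermarked, alpha):
    """Non-blind extraction, eq. (4.5): w'' = (d3' - d3) / alpha."""
    d3 = pywt.wavedec(host, WAVELET, mode=MODE, level=LEVEL)[1]
    d3_marked = pywt.wavedec(watermarked, WAVELET, mode=MODE, level=LEVEL)[1]
    return (d3_marked - d3) / alpha

def snr_db(x, y):
    """Imperceptibility measure, eq. (4.4)."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2))

# toy demonstration with random data standing in for real audio
host = np.random.randn(8192)
wm = 0.5 * np.random.randn(256)
alpha = 0.05                       # secret key; alpha < 1 keeps the embedded mark weak
y = embed(host, wm, alpha)
w_rec = extract(host, y, alpha)[: len(wm)]
print(snr_db(host, y), np.max(np.abs(w_rec - wm)))   # high SNR, near-zero extraction error
```

In the actual system the watermark is itself an audio clip, and α is chosen with reference to the psychoacoustic model of the host so that the watermarked signal remains imperceptible.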
4.4. Results of the high capacity covert communication technique:
The algorithm is applied to a set of audio signals. The signals are mono with a sampling rate of 44.1 kHz. The original host audio signal to be watermarked is shown in Fig. 4.9; its length is 45 seconds. The watermarked signal is shown in Fig. 4.10. The audio signal to be embedded as a watermark is shown in Fig. 4.11; its length is 1 second. The extracted watermark signal is shown in Fig. 4.12. To evaluate the watermarked audio quality a listening test is performed. Table 4.2 shows the extraction results. The SNR is 28 dB when the watermark signal length is equal to the length of the detail coefficient vector cd3; the SNR observed in [1] is 22 dB. The table shows that the SNR increases as the length of the watermark signal decreases, and the BER is 0 in all cases. To evaluate the performance of the proposed watermarking technique, its robustness is also tested.

Fig. 4.9 Original host audio signal
Fig. 4.10 Watermarked audio signal
Fig. 4.11 Original audio used as watermark
Fig. 4.12 Recovered watermark

4.4.1. Subjective listening test:
To further evaluate the watermarked audio quality, we also performed an informal subjective listening test according to a double blind, triple stimulus with hidden reference listening test [13]. The subjective listening test results are summarized in Table 4.1 under DiffGrade. The DiffGrade is equal to the subjective rating given to the watermarked test item minus the rating given to the hidden reference: a DiffGrade near 0.00 indicates a high level of quality. The DiffGrade can even be positive, which indicates an incorrect identification of the watermarked item.

Table 4.1 Subjective listening test for an mp3 song
Sr. No.   Type of attack                   DiffGrade
1         MPEG-1 Layer-3 128 kbps/mono     0.00
2         MPEG-1 Layer-3 64 kbps/mono      0.00
3         Band pass filtering              0.00
4         Echo addition                    0.00
5         Equalization                     0.00
6         Cropping                         0.00
7         Requantization                   0.00
8         Down sampling                    0.00
9         Up sampling                      -1
10        Time warping                     -2
11        Low pass filtering               -3

The DiffGrade scale is partitioned into five ranges: imperceptible (> 0.00), not annoying (0.00 to -1.00), slightly annoying (-1.00 to -2.00), annoying (-2.00 to -3.00), and very annoying (-3.00 to -4.00). The number of transparent items represents the number of incorrectly identified items. Fifteen listeners participated in the listening test. The quality of the watermarked audio signal was acceptable for all the test signals.

4.4.2. Robustness test
To test the robustness of the scheme, the watermarked signal is passed through different signal processing attacks and the watermark is then recovered. Table 4.3 shows the robustness test results for various attacks. The detailed robustness test procedure is as follows:
Band pass filtering: The watermarked signal is applied to a band pass filter with 100 Hz and 6 kHz cutoff frequencies.
Echo addition: An echo with a delay of 100 ms and a volume of 50% is added to the watermarked audio signal.
MPEG compression: To evaluate the robustness against data compression, MPEG-1 Audio Layer 3 coding at bit rates of 64 kbps/mono and 128 kbps/mono is applied.
Equalization: A 10-band equalizer with the characteristics listed below is used.
Frequency [Hz]: 31, 62, 125, 250, 500, 1000, 2000, 4000, 8000, 16000.
Gain [dB]: -6, +6, -6, +6, -6, +6, -6, +6, -6, +6.
Requantization: The 8-bit watermarked signal is requantized to 16 bit/sample and back to 8 bit/sample.
The correlation coefficient after requantization is 0.9910 and the SNR is 39 dB.
Resampling: The watermarked audio signal with an original sampling rate of 44100 Hz is down sampled to 22050 Hz and upsampled back to 44100 Hz. The watermarked signal is then upsampled to 88200 Hz and down sampled back to 44100 Hz.
Cropping: 10% of each segment of the watermarked signal is cropped and the watermark is recovered from it. The obtained correlation coefficient is 0.9884 and the SNR is 37.29 dB.
Noise addition: White noise with 15% of the power of the audio signal is added to the watermarked audio signal. The correlation coefficient of the recovered watermark after noise addition is 0.98797 and the SNR is 34.26 dB.
Time warping: The signal is time scaled by 10/9 (the 6.5 s signal is compressed to a 6 s duration).

Table 4.2 SNR of the watermarked signal and BER of the extracted watermark for various watermark lengths
Sr. No.   Length of watermark signal [s]   SNR [dB]   BER [%]
1         5                                28.1       0.00
2         4                                28.9       0.00
3         3                                30.4       0.00
4         2                                31.7       0.00
5         1                                35         0.00
6         0.5                              37.9       0.00

Table 4.3 Robustness test for an mp3 song
Sr. No.   Type of attack                   BER [%] for this scheme
1         MPEG-1 Layer-3 128 kbps/mono     0.0017493
2         MPEG-1 Layer-3 64 kbps/mono      0.0017877
3         Band pass filtering              0.0017578
4         Echo addition                    0.0017591
5         Equalization                     0.0017493
6         Cropping                         0.003564
7         Requantization                   0.001728
8         Down sampling                    0.003564
9         Up sampling                      0.042567
10        Time warping                     0.037863
11        Low pass filtering               2.1

4.5. Summary of chapter:
The results listed in Table 4.1 indicate that with the present technique it is possible to hide audio information in the host audio signal. The listening test confirms that the embedded information is inaudible and that the extracted watermark signal is indistinguishable from the original watermark. The BER measures the similarity between the original watermark and the extracted watermark. The experimental results listed in Table 4.3 indicate that the present watermarking technique is robust to common signal processing attacks such as compression, echo addition and equalization. During the robustness tests it was also observed that the watermark does not withstand low pass filtering with cutoff frequencies of 11 kHz or 22 kHz.

Chapter 5
Spread Spectrum Audio Watermarking Algorithms

Introduction
Most of the existing audio watermarking techniques embed the watermark in the time domain or the frequency domain, whereas only a few techniques embed the data in the cepstrum or compressed domain. The spread spectrum technique is the most popular technique and is used by many researchers in their implementations [13, 14, 15, 16, 17, 18, 19, 29]. An amplitude-scaled spread spectrum sequence is embedded in the host audio signal and can be detected via a correlation technique. Embedding is generally based on a psychoacoustic model (which provides the inaudibility limits of the watermark). Spread spectrum techniques can be divided into two categories: blind and non-blind. Blind watermarking techniques detect the watermark without using the original host signal, whereas non-blind techniques use the original host signal to detect the watermark. Most of the blind watermarking techniques studied in Chapter 2 only detect the presence of a valid watermark and do not concentrate on recovering (extracting) the embedded watermark. Non-blind techniques recover the watermark, but they require the original signal to do so.
Section 5.1 provides a brief overview of the conventional spread spectrum method. Section 5.2 highlights the non-blind technique suggested by Li et al [29], implemented in the initial stage of the research to test a watermarking scheme on our database and to compare its results with our proposed schemes; the results of this implementation appear in paper VI. In section 5.3 we propose an adaptive blind watermarking technique based on SNR using the DWT and the lifting wavelet transform; the results of these implementations are published in part in papers VII and VIII. To improve the imperceptibility between the original audio signal and the watermarked audio signal, an adaptive SNR based scheme using DWT-DCT is implemented; section 5.4 describes this scheme, the results of which are published in papers IX and X. To make the system intelligent, we propose to embed the watermark using cyclic coding; the scheme which encodes the watermark using cyclic codes before embedding is proposed in section 5.5.

5.1. Spread Spectrum watermarking: Theoretical background
A general model considered in [14, 15] for SS-based watermarking is shown in Figure 5.1. The vector x is the original host signal transformed into an appropriate transform domain, and y is the received vector, in the transform domain, after channel noise. A secret key K is used by a pseudo random number (PRN) generator to produce a chip sequence u with zero mean and with elements equal to +σu or -σu. The use of the secret key is essential to provide security to the watermarking system. The sequence u is then added to or subtracted from the signal x according to the variable b, where b takes the value +1 or -1 according to the bit (or bits) to be transmitted by the watermarking process (in multiplicative algorithms a multiplication is performed instead of the addition [25]). The signal s is the watermarked audio signal.

Fig. 5.1 General model of SS-based watermarking (the PRN generator driven by key k produces u; s = x + b·u; the channel adds noise n; the correlation detector recovers b from y using u)

A simple analysis of SS-based watermarking leads to a simple expression for the probability of error. The inner product and norm are defined as [15]

⟨x, u⟩ = Σ_{i=0}^{N-1} x_i u_i,   ‖x‖ = √⟨x, x⟩

where N is the length of the vectors x, s, u, n and y in Figure 5.1. Without loss of generality, we assume that one bit of information is embedded in a vector s of N transform coefficients; the bit rate is then 1/N bits/sample. That bit is represented by the variable b, whose value is either +1 or -1. Embedding is performed by

s = x + b·u        (5.1)

The distortion introduced in the embedded signal is defined by ‖s − x‖. For the embedding equation (5.1) we have

D = ‖b·u‖ = ‖u‖ = √N σu        (5.2)

The channel is modeled as an additive noise channel y = s + n, and watermark extraction is usually performed by calculating the normalized sufficient statistic r:

r = ⟨y, u⟩ / ⟨u, u⟩ = ⟨b·u + x + n, u⟩ / ⟨u, u⟩ = b + c_x + c_n        (5.3)

and estimating the embedded bit as b̂ = sign(r), where c_x = ⟨x, u⟩ / ⟨u, u⟩ and c_n = ⟨n, u⟩ / ⟨u, u⟩. Simple statistical models are assumed for the host audio x and the attack noise n: both sequences are modeled as uncorrelated white Gaussian random processes, x_i ~ N(0, σx²) and n_i ~ N(0, σn²). It is then easy to show that the sufficient statistic r is also a Gaussian variable, i.e.

r ~ N(m_r, σr²),   m_r = E[r] = b,   σr² = (σx² + σn²) / (N σu²)        (5.4)
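A compact Python sketch of this model (a minimal illustration with assumed parameter values, not the thesis implementation) embeds one bit by (5.1) and recovers it with the correlation detector of (5.3):

import numpy as np

rng = np.random.default_rng(seed=42)          # the seed plays the role of the secret key K
N, sigma_u, sigma_x, sigma_n = 1024, 0.1, 1.0, 0.5

x = rng.normal(0, sigma_x, N)                 # host coefficients
u = sigma_u * rng.choice([-1.0, 1.0], N)      # chip sequence with elements +/- sigma_u
b = 1                                         # bit to embed (+1 or -1)

s = x + b * u                                 # embedding, eq. (5.1)
y = s + rng.normal(0, sigma_n, N)             # additive channel noise

r = np.dot(y, u) / np.dot(u, u)               # sufficient statistic, eq. (5.3)
b_hat = np.sign(r)                            # detected bit
print(r, b_hat)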
Let us consider the case when b is equal to 1. In this case an error occurs when r < 0, and therefore the error probability p is given by

p = Pr{ b̂ < 0 | b = 1 } = (1/2) erfc( m_r / (√2 σ_r) ) = (1/2) erfc( √( N σu² / (2(σx² + σn²)) ) )        (5.5)

where erfc(·) is the complementary error function. The same error probability is obtained under the assumption that b = -1. A plot of this probability as a function of the SNR (defined here as m_r/σ_r) is given in Figure 5.2. For example, it can be seen from Figure 5.2 that if an error probability lower than 10⁻³ is needed, the requirement becomes

Fig. 5.2 Error probability as a function of the SNR

m_r/σ_r > 3  ⇒  N σu² > 9 (σx² + σn²)        (5.6)

or, more generally, to achieve an error probability p we need

N σu² > 2 (erfc⁻¹(2p))² (σx² + σn²)        (5.7)

Malvar et al [15] show that one can trade off the length N of the chip sequence against the energy σu² of the sequence: given the other variables involved, either N or σu² can be computed directly.

5.2. Adaptive SNR based non-blind watermarking technique in the wavelet domain
The SNR based watermarking technique suggested by Li et al [29] is implemented in the wavelet domain. This non-blind technique is implemented here in order to compare the results of our proposed schemes against the scheme of [29]. Non-blind means that the original host signal is required to recover the watermark. This section describes the adaptive watermarking technique based on SNR. The goal of a watermarking technique is to embed (add) the cover signal (watermark) into the host multimedia signal. The embedding process modifies the host signal with a mixing function; in most watermarking techniques the mixing function is the addition of the original host signal and the scaled cover signal. Mathematically the function is defined as

y(i) = x(i) + α w(i)        (5.6)

where y(i) is the watermarked signal, x(i) the original host signal, w(i) the cover signal and α a scaling parameter used as the secret key. The value of α plays an important role in the embedding as well as in the detection process. In the embedding process α is selected in such a way that the watermark remains imperceptible (inaudible). The watermark detection process is exactly the reverse process; without knowledge of the parameter α it is not possible to detect the watermark, and no statistical analysis should leave any possibility of detecting the cover signal or the parameter α. The imperceptibility of the watermarking procedure is measured by computing the SNR between the original host signal and the watermarked signal:

SNR = 10 log10 [ Σ_i x²(i) / Σ_i (x(i) − y(i))² ]        (5.7)

The method proposed in [29] is implemented in the wavelet domain. To embed the watermark, the host audio signal is divided into smaller segments of size N, and one bit of the binary watermark is added to each segment:

B3k(i) = A3k(i) + α(k)·w(k)   if A3k(i) = max_i A3k(i)
B3k(i) = A3k(i)               otherwise        (5.8)

where A3k(i) is the ith cd3 coefficient of the 3rd level DWT of xk(i), xk(i) is the ith sample of the kth segment of the host audio signal, and B3k(i) is the watermarked cd3 coefficient. α(k) is the scaling parameter used to scale the watermark bit w(k) so that the added watermark is inaudible in the audio.
The SNR between the original coefficients A3k(i) and the modified coefficients B3k(i) can be represented by

SNR = 10 log10 [ Σ_i A3k(i)² / Σ_i (A3k(i) − B3k(i))² ]        (5.9)

From formulas (5.8) and (5.9) the scaling parameter α(k) can be computed as

α(k) = [ Σ_i A3k(i)² · 10^(−SNR/10) ] / w²(k)        (5.10)

The value of w²(k) is 1 because w(k) is either +1 or -1. For a threshold value of SNR, the scaling parameter α(k) required in formula (5.8) can be computed from this equation and used to embed the watermark. According to the IFPI (International Federation of the Phonographic Industry) [28], an imperceptible audio watermarking scheme requires at least a 20 dB SNR between the watermarked signal and the host signal, so the threshold value of SNR used in formula (5.10) should be greater than 20.

The watermark embedding process is shown in Fig. 5.3. The host audio signal x(n) is divided into subsections of size N = 2^i, and each subsection is decomposed by a 3-level discrete wavelet transform using the Haar wavelet. The Haar wavelet is used in this implementation because it gives perfect reconstructability, which is an essential feature for this application. The authors in [29] used the db4 wavelet; however, for Daubechies wavelets of increasing length the compact support and reconstructability properties required here are not assured, so the Haar wavelet is used in our implementation. The detail coefficients (cd3) of the host audio signal are selected so that the watermark is embedded in the low frequency part of the audio signal, where the energy is highest. An M1 × M2 binary image is used as the watermark. Before embedding, this 2-D watermark is converted into its 1-D bipolar equivalent w(k) ∈ {1, -1}. Then w(k) is scaled by the parameter α(k), and one bit of w(k) is embedded into a single subsection of the host audio signal. To assess the imperceptibility of the system, the SNR between the host audio signal and the watermarked signal is computed by formula (5.7); the SNR obtained using this technique is 43.77 dB. The watermarked signal and the host audio signal were played to five listeners with a knowledge of music, and all five observed no perceptual difference between the host audio signal and the watermarked audio signal.

To detect the watermark, knowledge of the scaling parameter α(k) is essential; without it the watermark cannot be detected. The scaling parameter α(k) is computed during the watermark embedding process using formula (5.10), and the original signal x(n) is required to compute α(k) in order to recover the watermark properly. The detection formula is exactly the reverse of formula (5.8). To detect the watermark, x(n) and y(n) are divided into subsections of size N = 2^i, where the value of i is the same as that used during embedding. Each subsection is decomposed using a 3-level wavelet transform, and the watermark is then recovered as shown in Fig. 5.4.

Fig. 5.3 Watermark embedding process (input host audio x(n) → find start of utterance → divide into segments of size N → third-level DWT of each segment → computation of the scaling parameter α(k) for each segment, with the binary watermark image dimension-mapped and converted to bipolar form → add the scaled watermark → IDWT of each segment → concatenate segments → watermarked audio y(n))
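A short Python sketch of the per-segment embedding rule (5.8) with α(k) computed from (5.10) might look as follows; this is a minimal illustration assuming PyWavelets and the Haar wavelet, with the SNR threshold and the "modify only the largest cd3 coefficient" rule taken from the description above.

import numpy as np
import pywt

def embed_segment(segment, w_k, snr_db=30.0, wavelet='haar'):
    # 3-level DWT of one host segment: [a3, d3, d2, d1]
    coeffs = pywt.wavedec(segment, wavelet, level=3)
    d3 = coeffs[1].copy()
    # scaling parameter from eq. (5.10), with w^2(k) = 1
    alpha_k = np.sum(d3**2) * 10 ** (-snr_db / 10.0)
    # eq. (5.8): only the largest cd3 coefficient of the segment is modified
    j = np.argmax(d3)
    d3[j] += alpha_k * w_k
    coeffs[1] = d3
    return pywt.waverec(coeffs, wavelet)[:len(segment)], alpha_k

# usage: embed one bipolar watermark bit into a 256-sample segment
seg = np.random.randn(256)
y_seg, alpha = embed_segment(seg, w_k=+1)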
Fig. 5.4 Watermark extraction (x(n) and y(n) are divided into subsections of size 2^i → DWT of each subsection → computation of α(k) using the original watermark → watermark detection → recovery of the bipolar watermark by threshold comparison → 1-D to 2-D conversion → binary image)

After detecting the presence of the watermark, each recovered value is compared against a threshold T to recover the bipolar watermark: if the value of the sample is greater than the threshold, the watermark bit is recovered as 1; otherwise it is recovered as 0. The recovered watermark is a 1-D signal, so it is converted back into the required 2-D form of size M1 × M2 to recover the binary image used as the watermark. To test the similarity between the recovered watermark and the original watermark, the correlation coefficient between them is computed.

The original host audio signal and the watermarked audio signal are shown in Fig. 5.5: Fig. 5.5(a) shows the original host audio signal and Fig. 5.5(b) the watermarked signal. It can be observed that there is no perceptual difference between the two. The two audio signals were played to five listeners to test for any audible difference, and no audible difference between the original host audio signal and the watermarked signal was found. The SNR computed using formula (5.7) is 43.77 dB. To assess the similarity between the original watermark and the recovered watermark, the correlation coefficient is computed and is 0.9917.

Fig. 5.5 (a) Original host audio signal (b) Watermarked audio signal
Fig. 5.6 (a) Original watermark (b) Recovered watermark

To test the robustness of the technique, the watermarked audio signal is passed through different signal processing operations. The GoldWave software is used to edit the watermarked signal with the signal processing tools available in it. The experimental results show that the SNR between the original signal and the processed watermarked signal is above 25 dB in all cases. The watermark is successfully recovered after all processing, and the correlation coefficient is above 0.88, as shown in Table 5.1. The error obtained in recovering the watermark is also computed and presented in Table 5.1. The attacks are as follows:
Low pass filtering: The watermarked signal is passed through a low pass filter with a cutoff frequency of 11025 Hz, and the watermark is recovered from the filtered signal. The results of the signal processing attacks are shown in Table 5.1.
Resampling: The watermarked audio signal with an original sampling rate of 44100 Hz is down sampled to 22050 Hz and upsampled back to 44100 Hz. The watermarked signal is then upsampled to 88200 Hz and downsampled back to 44100 Hz.
MP3 compression: The watermarked audio signal is mp3 compressed at 64 kbps; it is observed that the watermark resists the mp3 compression.
Requantization: The 8-bit watermarked signal is requantized to 16 bit/sample and back to 8 bit/sample.
Cropping: 10% of each segment of the watermarked signal is cropped and the watermark is recovered from it.
Noise addition: White noise with 15% of the power of the audio signal is added to the watermarked audio signal.
Echo addition: An echo with a delay of 100 ms and a volume of 50% is added to the watermarked audio signal.
Equalization: A 10-band equalizer with the characteristics listed below is used.
Frequency [Hz]: 31, 62, 125, 250, 500, 1000, 2000, 4000, 8000, 16000.
Gain [dB]: -6, +6, -6, +6, -6, +6, -6, +6, -6, +6.

Fig. 5.7 Results for the SNR based scheme with non-blind detection: a) without attack, b) down sampling, c) up sampling, d) MP3 compression, e) requantization, f) cropping, g) low pass filtered with fc = 11025 Hz, h) low pass filtered with fc = 22050 Hz, i) time warping

Table 5.1 Experimental results against signal processing attacks for the non-blind technique (MP3 song)
Sr. No.   Attack            SNR [dB]   Correlation coefficient   BER
1         Without attack    43.77      0.9917                    0.0027
2         Down sampling     28.56      0.9889                    0.0039
3         Up sampling       34.26      0.9917                    0.0027
4         LP-filtering      19.73      0.9917                    0.0027
5         Requantization    39         0.9917                    0.0027
6         Cropping          37.29      0.9889                    0.0039
7         Mp3 compression   41.25      0.9917                    0.0027
8         Noise addition    34.26      0.9889                    0.0039
9         Echo addition     37.4420    0.9889                    0.0039
10        Equalization      38.518     0.9889                    0.0039

The experimental results of the developed scheme are shown in Table 5.1. The SNR between the watermarked signal and the original host audio signal is given in the second column, the correlation coefficient between the original and the recovered watermark in the third column, and the BER between the original and the recovered watermark in the fourth column. The experimental results show that the developed technique recovers the watermark successfully after all the signal processing attacks mentioned in the table. It is observed that the SNR between the original signal and the distorted signal is above the acceptable limit of 20 dB except for LP-filtering and down sampling. The major drawback of this scheme is that it requires the original host signal and the original watermark signal in order to recover the watermark and to provide proof of ownership. To overcome these drawbacks we modified this scheme and propose an SNR based blind watermarking scheme.

5.3. Proposed adaptive SNR based blind watermarking using DWT / lifting wavelet transform:
This section describes the proposed adaptive SNR based blind watermarking scheme implemented in the DWT and LWT domains. In this scheme an attempt is made to recover the watermark signal without the help of the original audio signal; the watermark is also recovered without using the scaling parameter. The scaling parameter used to embed the watermark is varied adaptively for each segment in order to achieve imperceptibility and to take advantage of the insensitivity of the HAS to small variations in the transform domain. To provide security, a secret key is used: a PN sequence generated by a cryptographic method. Without knowledge of this secret key it is not possible to recover the watermark, and the sequence is therefore generated from a secret initial seed. This idea of generating a pseudo random sequence from an initial seed is borrowed from communications. Our aim in developing this scheme is to devise a system which embeds the watermark using the spread spectrum method and does not require the original audio signal to recover the embedded watermark. The embedding process modifies the host signal x by a mixing function f(·): f(x, w) mixes x (the host multimedia signal) and w (the watermark signal) with the help of the secret key k. In most watermarking techniques [12, 19] the mixing function is the addition of the original host signal and the scaled watermark signal, mathematically represented by formula (5.1).
The blind watermarking technique implemented here embeds the watermark in the wavelet domain using an additive formula similar to formula (5.1):

B3k(i) = A3k(i) + α(k)·w(k)·r(i)        (5.11)

where r(i) is a permuted pseudorandom binary signal with zero mean which is the secret key of the owner, and α(k) and w(k) are the scaling parameter and the watermark bit to be embedded in the kth segment, respectively. A3k(i) is the ith cd3 coefficient of the 3rd level DWT/LWT of xk(i), where xk(i) is the ith sample of the kth segment of the host audio signal, and B3k(i) is the watermarked cd3 coefficient. Solving equation (5.11) together with equation (5.9) for α(k) gives

α(k) = Σ_i [ A3k(i)² · 10^(−SNR/10) ] / Σ_i [ r²(i) w²(k) ]        (5.12)

w(k) is the bipolar watermark, which is either 1 or -1, so w²(k) = 1. r(i) is a pseudorandom binary signal with zero mean and values 1 or -1, so Σ_i r²(i) = N. Equation (5.12) therefore reduces to

α(k) = Σ_i [ A3k(i)² · 10^(−SNR/10) ] / N        (5.13)

The audio watermarking scheme developed here computes the value of α(k) for every subsection of the host audio signal using formula (5.13); this value is later used to embed the watermark bit in each segment of the host audio using formula (5.11).

5.3.1. Watermark embedding:
To embed the watermark signal into the host audio signal, the present scheme uses the additive watermarking method. The host audio signal is divided into segments of size N = 256, 512, 1024, ... :

xk(i) = x(k·N + i),   i = 0, 1, 2, ..., N-1,   k = 0, 1, 2, ...        (5.14)

where x(i) represents the original host audio signal and xk(i) represents the kth segment of the host audio. Each xk(i) is then decomposed by an L-level wavelet transform. The scheme is implemented using the Discrete Wavelet Transform (DWT) and the Lifting Wavelet Transform (LWT). To embed the watermark in the low frequency part of the audio signal, where the energy is highest, and to take advantage of the frequency masking effect of the HAS [29], the 3rd level detail coefficients are selected. The selected coefficients are then modified as

B3k(i) = A3k(i) + α(k)·w(k)·r(i)        (5.15)

where A3k(i) is the 3rd level detail coefficient, r(i) is the permuted pseudorandom binary signal with zero mean which is the secret key of the owner, and α(k) and w(k) are the scaling parameter and watermark bit for the kth segment, respectively. The block schematic of the proposed scheme is shown in Fig. 5.8. The SNR between the original signal and the watermarked signal is computed using formula (5.7) to measure the imperceptibility of the watermarked signal. In this method the scaling parameter α(k) is computed for every segment of the host audio signal and the watermark bit is then embedded according to the rules for bit 1 or bit 0. This per-segment variation of α(k) takes the features of the host audio in that segment into account, which is similar to choosing the scaling parameter with regard to the perceptual transparency of the host audio.

Fig. 5.8 Watermark embedding process for the proposed adaptive SNR based blind technique in the DWT/LWT domain (input host audio x(n) → find start of utterance → divide into segments of size N → third-level DWT/LWT of each segment → add the scaled watermark, with α(k) computed per segment and the bipolar-converted, dimension-mapped binary watermark image spread by the PN sequence generated from the initial seed → IDWT/ILWT of each segment → concatenate segments → watermarked audio y(n))
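A minimal Python sketch of this embedding step could look as follows; it assumes PyWavelets, the Haar wavelet and a seed-driven PN sequence, all of which are illustrative choices rather than the thesis implementation.

import numpy as np
import pywt

def embed_bit_blind(segment, w_k, r, snr_db=60.0, wavelet='haar'):
    # segment: one host segment; w_k in {+1, -1}; r: +/-1 PN sequence (secret key)
    coeffs = pywt.wavedec(segment, wavelet, level=3)
    d3 = coeffs[1].copy()
    rr = r[:len(d3)]
    # per-segment scaling parameter in the spirit of eq. (5.13)
    alpha_k = np.sum(d3**2) * 10 ** (-snr_db / 10.0) / len(d3)
    # eq. (5.15): spread the bit over the cd3 coefficients
    coeffs[1] = d3 + alpha_k * w_k * rr
    return pywt.waverec(coeffs, wavelet)[:len(segment)], alpha_k

rng = np.random.default_rng(1234)                 # secret initial seed
seg = np.random.randn(256)
r = rng.choice([-1.0, 1.0], size=256)             # PN sequence
y_seg, alpha = embed_bit_blind(seg, +1, r)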
5.3.2. Watermark extraction
The block schematic of watermark extraction for the proposed scheme is shown in Fig. 5.9. To detect the watermark from the embedded signal, the 3rd level DWT/LWT of the watermarked signal is computed. The coefficients D′3k(i) are then modulated by the same pseudorandom signal r(i) used while embedding the watermark:

s(i) = D′3k(i)·r(i)        (5.16)

where D′3k(i) = D3k(i) + α(k)·w(k)·r(i) and s(i) are the wavelet coefficients modulated by r(i). Therefore

s(i) = (D3k(i) + α(k)·w(k)·r(i))·r(i)        (5.17)

Σ_i s(i) = Σ_i D3k(i)·r(i) + Σ_i α(k)·w(k)·r²(i)        (5.18)

The expected value of the first term in equation (5.18), Σ_i D3k(i)·r(i), is approximately equal to zero, and Σ_i r²(i) = N (α(k) and w(k) are independent of the summation variable i), so the sum is approximately equal to N·α(k)·w(k), where N is the size of the segment. If the value of Σ_i s(i) is greater than the threshold, the watermark bit 1 is recovered; if it is less than the threshold, the watermark bit 0 is recovered.

Fig. 5.9 Watermark extraction for the adaptive SNR based blind technique in the DWT/LWT domain (watermarked audio y(n) → find start of utterance → divide into segments of size N → third-level DWT/LWT of each segment → threshold comparison and bit recovery from each segment, using the PN sequence generated from the initial seed → dimension mapping → recovered watermark)

5.3.3. Experimental results
This part of the section reports the experimental results obtained during the implementation of this scheme. The watermark signal is embedded in the discrete wavelet domain and in the lifting wavelet domain. The host audio signals used in section 5.2 are used for the experimentation. A three-level wavelet decomposition of each segment of the host audio is performed. The subjective analysis test is conducted in the same manner as explained in the previous section of this chapter; all five listeners observed that there is no audible difference between the original host audio signal and the watermarked signal. The imperceptibility of the watermarked signal is measured by computing the signal to noise ratio (SNR) between the original signal and the watermarked signal; the observed SNR is shown in Table 5.2(a). To measure the similarity between the original watermark and the recovered watermark the correlation coefficient is computed and presented in Table 5.2(b), together with the BER between the original watermark and the recovered watermark. To check the robustness of the scheme, the watermarked signal is passed through the different signal processing attacks described in section 5.1 and the watermark is then recovered. The results presented in Table 5.2 are obtained for the blind technique in the DWT and LWT domains, with SNR = 30 in equation (5.13) and a segment length of 256. The original watermark embedded in the signal is shown in Fig. 5.10, whereas the recovered watermark after different signal processing attacks is shown in Fig. 5.11 (DWT based blind technique) and Fig. 5.12 (LWT based blind detection technique).
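A corresponding detection sketch in Python (minimal, with the same assumptions as the embedding sketch above) sums the PN-modulated cd3 coefficients and thresholds the result, following (5.16)-(5.18):

import numpy as np
import pywt

def detect_bit_blind(wm_segment, r, wavelet='haar', threshold=0.0):
    # correlate the cd3 coefficients of the watermarked segment with the PN sequence
    d3_wm = pywt.wavedec(wm_segment, wavelet, level=3)[1]
    s = d3_wm * r[:len(d3_wm)]
    total = np.sum(s)                      # approximately N * alpha(k) * w(k)
    return 1 if total > threshold else 0   # bit decision by threshold comparison

# usage with the segment watermarked in the previous sketch:
# bit = detect_bit_blind(y_seg, r)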
Fig. 5.10 Original watermark

Fig. 5.11 Results for the SNR based scheme with blind detection in the DWT domain: a) without attack, b) down sampling, c) up sampling, d) mp3 compression, e) requantization, f) cropping, g) LP filtered, h) time warping

Fig. 5.12 Results for the SNR based scheme with blind detection in the LWT domain: a) without attack, b) down sampling, c) up sampling, d) mp3 compression, e) requantization, f) cropping, g) LP filtered, h) time warping

Table 5.2 Results after signal processing attacks
a) SNR between the original audio signal and the watermarked audio signal after various attacks

Attack            SNR [dB], blind detection in DWT   SNR [dB], blind detection in LWT
Without attack    18.6194                            18.3601
Downsampled       18.3863                            18.2978
Upsampled         18.3863                            18.2978
Mp3 compressed    18.3863                            18.2978
Requantization    18.3863                            18.2978
Cropping          18.4208                            18.3316
LP filtered       16.2305                            16.2178
Time warping      18.3863                            18.2978
Echo addition     17.4780                            17.1564
Equalization      18.3863                            18.2978

b) Correlation coefficient and BER between the original watermark and the recovered watermark

                  Blind detection in DWT             Blind detection in LWT
Attack            Corr. coeff.   BER                 Corr. coeff.   BER
Without attack    0.9972         0.0027              0.9862         0.0039
Downsampled       0.9972         0.0027              0.9862         0.0039
Upsampled         0.9972         0.0027              0.9862         0.0039
Mp3 compressed    0.9972         0.0027              0.9862         0.0039
Requantization    0.9972         0.0027              0.9862         0.0039
Cropping          0.9972         0.0027              0.9862         0.0039
LP filtered       0.9972         0.0027              0.9531         0.0089
Time warping      0.9972         0.0027              0.9862         0.0039
Echo addition     0.9972         0.0027              0.9862         0.0039
Equalization      0.9972         0.0027              0.9862         0.0039

From the results presented in Table 5.2 it is clear that the blind techniques implemented in the discrete wavelet domain and in the lifting wavelet domain are robust against various signal processing attacks such as resampling, requantization, MP3 compression, LP filtering and cropping. Comparing the results in Table 5.2 with those in Table 5.1, the non-blind technique has a better SNR than the blind detection scheme, but it requires the original audio signal to recover the watermark. The correlation coefficient test between the original and recovered watermarks indicates that the DWT and LWT based blind detection techniques are robust, with only a small difference in recovery. Henceforth we concentrate on the DWT domain only and do not consider the LWT further, as it produces similar results. The SNR observed between the original host signal and the watermarked signal in Table 5.2, even after signal processing attacks, is below 20 dB; this is because the parameter α(k) was calculated from (5.13) with SNR = 30, the lowest threshold requirement. As explained in the next subsection, these imperceptibility results can be improved by increasing the SNR value used in computing α(k).

5.3.4. Selection criteria for the value of SNR used in computing α(k) and for the segment length N
To increase the imperceptibility of the watermarked audio signal, we varied the SNR requirement used to compute the parameter α(k) from 30 to 80 and observed the results. We also varied the segment length and observed the performance of the system. The results of these variations are shown in Table 5.3.

Table 5.3 Relation between segment length, the parameter α(k), the observed SNR and the correlation coefficient
Sr. No.   Assumed SNR between wavelet coefficients in (5.11) for calculation of α(k)   Segment length   Observed SNR between host audio and watermarked audio   Corr. coeff.
1         30        512       16.3843      1
                    256       18.6826      1
                    128       21.5655      1
2         40        512       18.1608      1
                    256       19.6560      1
                    128       22.5764      1
3         50        512       19.0204      1
                    256       20.6227      1
                    128       24.5634      1
4         60        512       19.3452      1
                    256       21.6064      1
                    128       28.9675      1
5         70        512       19.6852      1
                    256       22.5887      0.9917
                    128       25.1256      0.9863
6         80        512       20.7438      0.9917
                    256       23.0733      0.9863
                    128       27.7826      0.9684

From Table 5.3 it is clear that the SNR can be increased in two ways: i) by increasing the SNR value in expression (5.13), and ii) by reducing the segment length. In the first case the increase in SNR improves the imperceptibility of the watermarked signal but reduces the robustness. In the second case the reduction in segment length also improves the SNR but reduces the robustness. Considering all the contradictory requirements of watermarking, optimized results are obtained by selecting a segment length of 256 and an SNR in the range 40-60. Therefore, for the further implementations of spread spectrum watermarking, the SNR is selected as 60 and the segment length as 256.

5.4. Proposed adaptive SNR based spread spectrum scheme in the DWT-DCT domain:
This section describes the proposed SNR based spread spectrum audio watermarking technique which adaptively selects the embedding strength. The scheme embeds the watermark by first taking the 3rd level DWT of the host audio signal and then computing the DCT of the low pass (approximation) DWT coefficients. The DWT-DCT transform is used to embed the watermark in the low-middle frequency components of the host audio in order to increase the imperceptibility of the watermarking scheme compared with the scheme implemented in the previous section. Using the DCT of the ca3 coefficients, we can track exactly the frequency band required to embed the watermark for better imperceptibility. The spread spectrum watermarking techniques [2, 3, 9] modify the host multimedia signal using the mathematical function defined by (5.1). The scheme proposed here first divides the host audio signal into smaller segments of size N and computes the 3rd level DWT of each segment. The ca3 (low pass) coefficients are then selected for embedding the watermark, their DCT is computed, and the watermark bit is added using formula (5.19) below:

A′3k(i) = A3k(i) + α(k)·r(i)·w(k)        (5.19)

where r(i) is a permuted pseudorandom binary signal of length L with zero mean which is the secret key of the owner; the length L is equal to the length of the 3rd level DWT-DCT coefficient vector. α(k) is the adaptive scaling parameter computed for the kth segment in which the watermark is to be added, and w(k) is the watermark bit to be embedded in the kth segment. A′3k(i) is the DWT-DCT coefficient of the watermarked signal for the kth segment and A3k(i) is the DWT-DCT coefficient of the original host signal for the kth segment. Solving equation (5.19) together with equation (5.9) for α(k) gives

α(k) = Σ_i [ A3k(i)² ] · 10^(−SNR/10) / Σ_i [ r²(i) w²(k) ]        (5.20)

w(k) is the bipolar watermark, which is either 1 or -1, so w²(k) = 1. r(i) is a pseudorandom binary signal with zero mean and values 1 or -1, so Σ_i r²(i) = L.
Equation (5.20) therefore reduces to

α(k) = Σ_i [ A3k(i)² ] · 10^(−SNR/10) / L        (5.21)

The audio watermarking scheme developed here computes the value of α(k) for every subsection of the host audio signal using formula (5.21); this value is later used to embed the watermark bit in each segment of the host audio using formula (5.19). The value of SNR used in computing α(k) is selected by the user and should be in the range 40-60 dB, as shown in section 5.3.4. Even though the selected value of SNR is fixed and the same for all segments, Σ_i A3k(i)² changes from segment to segment and hence α(k) changes for each segment.

5.4.1. Watermark embedding
The proposed watermark embedding process is shown in the block schematic of Fig. 5.13. To embed the watermark signal into the host audio signal, the present scheme uses the additive watermarking method. The host audio signal is divided into segments of size N = 256, 512, 1024, ... :

xk(i) = x(k·N + i),   i = 0, 1, 2, ..., N-1,   k = 0, 1, 2, ...        (5.22)

where x(i) represents the original host audio signal and xk(i) represents the kth segment of the host audio. Each xk(i) is then decomposed by a 3rd level wavelet transform. To embed the watermark in the low frequency part of the audio signal, where the energy is highest, the 3rd level approximation coefficients are selected and their DCT is computed. The watermark is then embedded by modifying the low frequency DCT coefficients as

A′3k(i) = A3k(i) + α(k)·w(k)·r(i)        (5.23)

where A3k(i) is the DCT coefficient of the ca3 coefficients of the 3rd level DWT of xk(i), r(i) is the permuted pseudorandom binary signal with zero mean which is the secret key of the owner, α(k) and w(k) are the scaling parameter and watermark bit for the kth segment respectively, and A′3k(i) are the watermarked coefficients. The SNR between the original audio signal and the watermarked audio signal is computed using formula (5.7) to measure the imperceptibility of the watermarked signal.

Fig. 5.13 Watermark embedding process for the proposed adaptive SNR based blind technique in the DWT-DCT domain (input host audio x(n) → find start of utterance → divide into segments of size N → third-level DWT of each segment → DCT of the approximation coefficients → add the scaled watermark, with α(k) computed per segment and the bipolar-converted, dimension-mapped binary watermark image spread by the PN sequence generated from the initial seed → IDCT and then IDWT of each segment → concatenate segments → watermarked audio y(n))

5.4.2. Watermark extraction
The watermark extraction procedure of the proposed scheme is depicted in Fig. 5.14. To extract the watermark from the embedded signal, the 3rd level DWT of the watermarked signal is computed. The DCT coefficients A′3k(i) are then modulated by the same pseudorandom signal r(i) used while embedding the watermark:

s(i) = A′3k(i)·r(i)        (5.24)

where A′3k(i) = A3k(i) + α(k)·w(k)·r(i) and s(i) are the low frequency DCT coefficients modulated by r(i). Therefore

s(i) = (A3k(i) + α(k)·w(k)·r(i))·r(i)        (5.25)

Σ_i s(i) = Σ_i A3k(i)·r(i) + Σ_i α(k)·w(k)·r²(i)        (5.26)
The expected value of the first term in equation (5.26), Σ_i A3k(i)·r(i), is approximately equal to zero, and Σ_i r²(i) = N (α(k)·w(k) is independent of the summation variable i), so the sum is approximately equal to N·α(k)·w(k), where N is the size of the segment. If the value of Σ_i s(i) is greater than the threshold, the watermark bit 1 is recovered; if it is less than the threshold, the watermark bit 0 is recovered.

Fig. 5.14 Watermark extraction for the adaptive SNR based blind technique in the DWT-DCT domain (watermarked audio y(n) → find start of utterance → divide into segments of size N → third-level DWT-DCT of each segment → threshold comparison and bit recovery from each segment, using the PN sequence generated from the initial seed → dimension mapping → recovered watermark)

5.4.3. Experimental results
This section reports the experimental results obtained during the implementation of the scheme. The results are obtained with the same set of audio signals used in the previous sections. A three-level wavelet decomposition of each segment of the host audio is performed, the low frequency coefficients (ca3) are selected, and the watermark is embedded by modifying the low frequency DCT coefficients of the ca3 coefficients as per equation (5.19). The α(k) used in equation (5.19) to add the watermark bit w(k) is computed using equation (5.21), and the value of SNR considered in equation (5.21) is 30 dB. The subjective analysis test showed that there is no perceptible difference between the host audio and the watermarked audio. The signal to noise ratio (SNR) between the host audio signal and the watermarked audio signal is computed using equation (5.7) and is shown in Table 5.4, together with the BER between the original and recovered watermarks. To check the robustness of the scheme, the watermarked signal is passed through different signal processing attacks and the watermark is then recovered; the results are presented in Table 5.4. The original watermark embedded in the signal is shown in Fig. 5.15(a), and the recovered watermarks after the different signal processing attacks are shown in Fig. 5.15(b)-(h).

Table 5.4 SNR between the original and watermarked audio signals, and BER of the recovered watermark
Attack                       SNR [dB]    BER
Without attack               31.3456     0
Downsampled                  18.4872     0.4531
Upsampled                    18.5820     0.0303
Mp3 compressed               31.3456     0
Requantization               31.3456     0
Cropping                     25.4691     0.0162
LP filtered, fc = 22050 Hz   17.8538     0
Time warping -10%            16.6834     0.0137
Echo addition                31.3456     0
Equalization                 31.3456     0

Fig. 5.15 Results for the SNR based scheme with blind detection in the DWT-DCT domain: a) without attack, b) down sampling, c) up sampling, d) mp3 compression, e) requantization, f) cropping, g) time warping, h) LP filtered at 22050 Hz

From the results presented in Table 5.4 it is clear that the technique implemented here is robust against various signal processing attacks such as resampling, requantization, mp3 compression and cropping. It is also observed that the technique is robust against LP filtering with a cutoff frequency of 22 kHz. The SNR between the original audio and the watermarked audio after the different signal processing operations satisfies the audio watermarking requirements provided by the IFPI [28]. The BER (bit error rate) between the original watermark and the recovered watermark indicates that the technique is able to recover the watermark after the different signal processing attacks.
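A minimal Python sketch of the DWT-DCT embedding step of (5.19) and (5.21) and of the corresponding detection (assuming PyWavelets and SciPy; function names and parameter values are illustrative) could look like this:

import numpy as np
import pywt
from scipy.fft import dct, idct

def embed_bit_dwt_dct(segment, w_k, r, snr_db=30.0, wavelet='haar'):
    coeffs = pywt.wavedec(segment, wavelet, level=3)
    a3 = coeffs[0].copy()
    A = dct(a3, norm='ortho')                              # DCT of the ca3 coefficients
    L = len(A)
    alpha_k = np.sum(A**2) * 10 ** (-snr_db / 10.0) / L    # eq. (5.21)
    A_wm = A + alpha_k * w_k * r[:L]                       # eq. (5.19)
    coeffs[0] = idct(A_wm, norm='ortho')
    return pywt.waverec(coeffs, wavelet)[:len(segment)]

def detect_bit_dwt_dct(wm_segment, r, wavelet='haar'):
    a3 = pywt.wavedec(wm_segment, wavelet, level=3)[0]
    A = dct(a3, norm='ortho')
    s = np.sum(A * r[:len(A)])                             # eqs. (5.24)-(5.26)
    return 1 if s > 0 else 0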
The algorithm is tested with 1000 different keys and the PDF (probability density function) of the SNR is shown in Fig. 5.16. The PDF of the SNR computed for the 1000 different keys indicates that the SNR between the original signal and the watermarked signal is always greater than 28.67 dB, i.e. the performance of the proposed algorithm is very good: it always provides good imperceptibility between the original signal and the watermarked signal, and the imperceptibility does not depend on the key used to embed the watermark. The watermark cannot be recovered without knowledge of the key used.

Fig. 5.16 PDF of SNR

Table 5.5 Comparison chart of the spread spectrum audio watermarking techniques implemented in this chapter

                   Non-blind SNR         Proposed blind        Proposed blind        Proposed blind
                   based scheme          technique in DWT      technique in LWT      technique in DWT-DCT
Attack             SNR [dB]   BER        SNR [dB]   BER        SNR [dB]   BER        SNR [dB]   BER
Without attack     52.2863    0.0078     21.6109    0          21.2621    0          31.7109    0
Downsampled        28.5660    0.0078     21.6109    0          21.2621    0          31.7109    0
Upsampled          52.2859    0.0078     21.6109    0          21.2621    0          31.7109    0
Mp3 compressed     52.2859    0.0078     21.6109    0          21.2621    0          31.3456    0
Requantization     28.5660    0.0078     21.6109    0          21.2621    0          31.3456    0
Cropping           28.9923    0.0107     21.6102    0.0068     20.3316    0.0078     31.7056    0.0156
LP filtered        20.3328    0.0078     18.8769    0.0009     16.2178    0.0029     27.9929    0
Time warping 10%   11.4711    0.0078     11.1957    0.0078     10.2965    0.0089     20.0535    0.0186
Echo addition      28.2335    0.0127     17.4780    0.0156     17.1564    0.0163     16.9394    0.0049
Equalization       28.2335    0.0088     19.1299    0          18.2978    0          19.1391    0

The results of all the SNR based schemes implemented in this chapter are summarized in Table 5.5. These results are obtained with a segment length of 256 and with SNR = 60 in the computation of α(k). It is clear from the table that the schemes implemented here are robust against the various signal processing attacks, with little variation in the BER. Although the SNR of blind detection with the DWT/LWT methods is lower than that of the non-blind technique, no audible difference is observed between the original host audio signal and the watermarked audio signal. By implementing the method in the DWT-DCT domain the imperceptibility is increased. The watermark is robust except for time warping, cropping and echo addition; to further improve the robustness we propose to implement the technique using cyclic coding.

5.5. Proposed SNR based blind technique using cyclic coding
To improve the detection performance and to make the system intelligent, we propose to encode the watermark with a cyclic coder before embedding it into the host audio signal. The block schematic of this scheme is shown in Fig. 5.17: the watermark to be embedded is first encoded by the cyclic encoder and then embedded into the host signal; at the decoder side the recovered bit stream is decoded using the cyclic decoder to obtain the watermark. The robustness results obtained using this method are presented in Table 5.6. From the table it is clear that the detection performance is significantly increased, even for 10% time scaling, echo addition and low pass filtering.

Fig. 5.17 Improved encoder and decoder for blind watermarking using cyclic coding (watermark → dimension mapping → encoding using cyclic codes → watermark encoder with key → host audio → attack channel → watermark decoder with key → decoding of the cyclic code → dimension mapping → recovered watermark)
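As an illustration of this idea, the following Python sketch shows a systematic (7, 4) cyclic code with the generator polynomial g(x) = x³ + x + 1. This is an assumed, minimal implementation, not the exact coder used in the thesis; it encodes 4 watermark bits into a 7-bit codeword and corrects a single bit error at the decoder.

import numpy as np

G = np.array([1, 0, 1, 1])          # generator polynomial g(x) = x^3 + x + 1 (MSB first)

def remainder(bits):
    # remainder of bits(x) divided by g(x) over GF(2); bits are MSB first
    r = np.array(bits, dtype=int) % 2
    for i in range(len(r) - 3):
        if r[i]:
            r[i:i + 4] ^= G
    return r[-3:]

def encode_74(msg4):
    # systematic encoding: codeword = message bits followed by parity bits
    parity = remainder(np.concatenate([msg4, [0, 0, 0]]))
    return np.concatenate([msg4, parity])

def decode_74(code7):
    # single-error correction via a syndrome lookup table
    syndromes = {tuple(remainder(np.eye(7, dtype=int)[i])): i for i in range(7)}
    s = tuple(remainder(code7))
    c = np.array(code7, dtype=int) % 2
    if any(s):                       # non-zero syndrome -> flip the indicated bit
        c[syndromes[s]] ^= 1
    return c[:4]

# usage: a single corrupted bit in the codeword is corrected
m = np.array([1, 0, 1, 1])
c = encode_74(m)
c[2] ^= 1
print(decode_74(c))                  # -> [1 0 1 1]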
Table 5.6 Results obtained with the (6, 4) cyclic code
Attack                       SNR [dB]   BER
Without attack               32.2044    0
Downsampled                  32.2044    0
Upsampled                    32.2044    0
Mp3 compressed               32.2044    0
Requantization               32.2044    0
Cropping                     30.2458    0.0110
LP filtered, fc = 22050 Hz   29.1862    0
Time scaling -10%            21.2377    0.0029
Echo addition                16.9391    0.0039
Equalization                 21.0204    0

Table 5.7 Results obtained with the (7, 4) cyclic code
Attack                       SNR [dB]   BER
Without attack               28.5381    0
Downsampled                  28.3982    0
Upsampled                    28.3982    0
Mp3 compressed               28.3982    0
Requantization               28.3982    0
Cropping                     22.3475    0.0094
LP filtered, fc = 22050 Hz   26.9378    0
Time scaling -10%            23.5560    0
Time scaling +10%            21.2156    0.0018
Echo addition                16.0376    0.0021
Equalization                 20.8502    0

It is clear from Tables 5.6 and 5.7 that the encoder and decoder model using cyclic coding is more robust than all the other techniques proposed. With a small sacrifice in the imperceptibility test, the (7, 4) cyclic coder provides the best robustness; our main aim is to embed a watermark which sustains all kinds of attacks and is recovered successfully.

5.6. Summary of chapter:
Adaptive SNR based schemes are proposed in this chapter. The main goal of these methods is to obtain blind watermarking techniques which are robust against various signal processing attacks. From the comparison of the results presented in Table 5.5, the non-blind technique has a better SNR than the blind detection schemes, but it requires the original audio signal to recover the watermark. It can also be observed that the blind techniques are robust against the various signal processing attacks. The scaling parameter used to embed the watermark is varied adaptively for each segment in order to achieve imperceptibility and to take advantage of the insensitivity of the HAS to small variations in the transform domain. To provide security, a secret key is used, namely a PN sequence generated by a cryptographic method; without knowledge of this secret key it is not possible to recover the watermark. The DWT-DCT based blind detection technique is more robust and more imperceptible than the DWT/LWT based blind detection technique: the SNR between the original signal and the watermarked signal is improved by the DWT-DCT technique. The multiresolution characteristics of the discrete wavelet transform (DWT) and the energy compaction characteristics of the discrete cosine transform (DCT) are combined to improve the transparency of the digital watermark [26]; by computing the DCT of the 3rd level DWT coefficients we take advantage of the low-middle frequency components to embed the information. To further improve the detection performance and to make the system more secure, the idea of encoding the watermark using cyclic coding is proposed; because of this an attacker cannot easily infer the statistical behaviour of the embedded watermark, and the decoder is able to correct single bit errors. The watermark robustness results obtained with the cyclic encoder are better than those of the methods without such an encoder.

Chapter 6
Adaptive Watermarking by Modifications of GOS

Introduction
Audio watermarking techniques can be categorized into three main groups: SS-based coding [12-18], echo hiding [4], and phase coding [30]. Although the SS method is the most popular in the literature, the pseudorandom sequence used for watermarking may be audible to human ears even though its power is low.
However, the two distinct time offsets for "1" and "0" in echo hiding methods frequently cause the watermarks to be audible and require trade-offs to be made with the echo volume (reducing the robustness). The transform-domain methods (Fourier transform, discrete cosine transform, subband, or cepstrum): 1) take advantage of humans' insensitivity to phase variation; 2) are compatible with techniques used in audio compression; and 3) involve little variation in the transform coefficients due to disturbances. The phase coding method may be sensitive to misalignment of reference points due to the phase shifts that it theoretically causes. However, controlling and predicting variations in the time-domain audio waveform is difficult when disturbances are added directly in the transform domain. The method tackled in this chapter directly modifies the audio waveform to embed watermarks. Unlike conventional algorithms, which add modulated and scaled PN sequences or other disturbing patterns, no basic form of the watermark signal is set in advance. Instead, segments of the host audio signal are modified into the "1" or "0" state based on the principle of differential amplitudes; patchwork algorithms work in a similar manner. The watermark disturbance is therefore strongly related, or similar, to the host signal itself. It is just like an echo signal with zero time delay, thus yielding a high degree of watermark transparency. Moreover, the proposed embedding process is performed subject to a frequency-masking test of the watermark signal, so that the disturbances of the host audio signal are beyond the ability of human ears to perceive.

The first section of this chapter gives an overview of the audio watermarking technique [23] which modifies a group of samples (GOS) to embed the watermark in the time domain. The second section introduces the proposed adaptive audio watermarking technique which modifies the GOS in the transform domain. The third section provides the results obtained by implementing the proposed scheme in the DWT / DWT-DCT / DCT domains. Parts of the results of this scheme are published in paper XI.

6.1. Introduction to the audio watermarking technique based on GOS modification in the time domain:
Conventional watermarking algorithms can be categorized as operating on individual samples or on a GOS [23]. The former are preferred for high embedding capacities (one sample is used for hiding one watermark message) but suffer from potentially heavy and uncontrolled noisiness. The latter often modify the samples in a group such that the considered GOS has a particular feature value, and an individual corruption of the watermark in a sample does not necessarily cause a wrong decision concerning watermark retrieval. The method explained in this chapter was proposed by Lie et al [23] and implemented by those authors in the time domain. Since modifications made in the time domain directly affect the perceptual quality of the signal and are not immune to noise disturbances, we propose a similar kind of technique in the transform domain. Our main goal is to propose an intelligent encoder and decoder model of audio watermarking which is adaptive, blind and robust, while keeping imperceptibility in mind. To implement the algorithm using GOS modification, one "state", or one feature value, can be specified to embed each type of data (for example, "1" or "0"). Keeping sufficient distance between the "0" and "1" states will typically enhance robustness against pirate attacks.
The main point of efficient watermarking is to optimize audio quality for a given distance between the two states. First, the GOS and the binary states are defined as follows.

Definition 1: The complete audio signal is partitioned into consecutive GOS's, each containing three nonoverlapping sections of samples. These three sections have equal or unequal lengths L1, L2 and L3, respectively, and are denoted sec_1, sec_2 and sec_3, as shown in Fig. 1. Hence, a GOS contains L = L1 + L2 + L3 samples.

Definition 2: One watermark message represents one binary bit of value 0 or 1, embedded in one GOS.

Definition 3: The Average of Absolute Amplitudes (AOAA) is chosen as the feature for each section of samples. Embedding a watermark message depends on the differential AOAA relations among sec_1, sec_2 and sec_3. The AOAA items are computed as

E_i1 = (1/L1) Σ_{x=0}^{L1-1} |f(L·i + x)|        (6.1)
E_i2 = (1/L2) Σ_{x=L1}^{L1+L2-1} |f(L·i + x)|        (6.2)
E_i3 = (1/L3) Σ_{x=L1+L2}^{L-1} |f(L·i + x)|        (6.3)

where i represents the GOS index, i = 0, 1, 2, 3, ...

Definition 4: After E_i1, E_i2 and E_i3 are sorted, they are renamed E_max, E_mid and E_min according to their computed values. The differences between them are

A = E_max − E_mid        (6.4)
B = E_mid − E_min        (6.5)

The state "1" corresponds to A ≥ B; otherwise (A < B) the state is "0".

6.1.1. Rules of watermark embedding
The embedding scheme is based on the following rules.
To embed watermark bit "1": if (A − B ≥ THD1), no operation is performed; otherwise E_max is increased and E_mid is decreased by an amount δ, so that the above condition is satisfied.
To embed watermark bit "0": if (B − A ≥ THD1), no operation is performed; otherwise E_mid is increased and E_min is decreased by the same amount δ, so that the above condition is satisfied.
These rules state that when the status of a GOS does not conform to the state definition (subject to a threshold), the AOAA values of two selected sections are modified so that the status of the GOS is changed into a conforming one.

6.1.2. Watermark extraction:
The algorithm for watermark retrieval is simple and straightforward. Assuming that the start point of data embedding has been recognized and the section lengths L1, L2 and L3 are known, every three consecutive sections of samples are grouped as a GOS and examined to extract the watermark. The AOAA values E′_i1, E′_i2 and E′_i3 are computed for the ith GOS, as in (6.1)-(6.3):

E′_i1 = (1/L1) Σ_{x=0}^{L1-1} |f′(L·i + x)|        (6.6)
E′_i2 = (1/L2) Σ_{x=L1}^{L1+L2-1} |f′(L·i + x)|        (6.7)
E′_i3 = (1/L3) Σ_{x=L1+L2}^{L-1} |f′(L·i + x)|        (6.8)

where f′(x) is the watermarked signal. E′_i1, E′_i2 and E′_i3 are then ordered to yield E′_max, E′_mid and E′_min, and their differences are

A′ = E′_max − E′_mid        (6.9)
B′ = E′_mid − E′_min        (6.10)

Comparing A′ and B′ yields the retrieved bit: "1" if A′ ≥ B′, and "0" if A′ < B′. This process is repeated for every GOS to determine the entire embedded bit stream. Clearly this scheme is blind, meaning that the watermarks can be recovered without using the original host audio signal.
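The AOAA feature and the bit-retrieval rule of (6.6)-(6.10) can be sketched in Python as follows; this is a minimal illustration with assumed section lengths, not the authors' implementation.

import numpy as np

def detect_gos_bit(gos, L1, L2, L3):
    # AOAA of the three sections, eqs. (6.6)-(6.8)
    e1 = np.mean(np.abs(gos[:L1]))
    e2 = np.mean(np.abs(gos[L1:L1 + L2]))
    e3 = np.mean(np.abs(gos[L1 + L2:L1 + L2 + L3]))
    e_max, e_mid, e_min = sorted([e1, e2, e3], reverse=True)
    A = e_max - e_mid                        # eq. (6.9)
    B = e_mid - e_min                        # eq. (6.10)
    return 1 if A >= B else 0                # retrieved bit

# usage: three sections of 128 samples each
f_prime = np.random.randn(384)               # stands in for one watermarked GOS
bit = detect_gos_bit(f_prime, 128, 128, 128)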
Practically, linear amplification or attenuation is not the only way to modify the AOAA, but it is the simplest way to retain the signal waveform and to alleviate degradations caused by random disturbing noise. The embedding scheme addressed above is, however, not directly feasible, because amplitude scaling performed near section boundaries produces signal discontinuities. (Notably, only two of the three sections are scaled, and adjacent sections may have different scale factors.) These discontinuities cause "click" sounds that are perceivable to human ears. The problem is solved either by adopting a progressive scaling scheme to keep the audio waveform continuous and smooth, or by modifying the GOS in the transform domain. We have concentrated on modifying the samples in the transform domain. In the scheme above only two of the three sections are scaled and adjacent sections have different scale factors, whereas in the proposed scheme the entire GOS is modified according to the embedding rule. Amplitude variations applied in the transform domain, together with the modification of the entire GOS, avoid discontinuities and improve the SNR between the original host audio and the watermarked audio.
6.2.1. Proposed blind watermarking using GOS modification in the DWT domain:
The method we propose in this subsection is implemented in the DWT domain. First, the complete audio signal is partitioned into consecutive GOS's, each containing two non-overlapping sections of equal or unequal length. To incorporate a cryptographic method, the section lengths L1 and L2 can be kept unequal and selected randomly for each GOS based on a cryptographic key. Hence a GOS contains L = L1 + L2 samples. One watermark message represents one binary bit of value 0 or 1, embedded in one GOS; to embed it, the mean of each of the two sections of the GOS is taken as the feature and modified. To embed the watermark signal, the host audio signal is first partitioned into GOS's of size L:
$x_k(i) = x(k \cdot L + i), \quad i = 0, 1, \ldots, L-1, \quad k = 0, 1, 2, \ldots$   (6.11)
where L = 256, 512, 1024, etc.; x(i) represents the original host audio signal and $x_k(i)$ the kth segment of the host audio.
Each audio segment is decomposed by a 3-level discrete wavelet transform using the Haar wavelet, and the low-frequency approximation coefficients (ca3) are modified to embed the watermark according to the rule defined below. Define
$A = mean(ca3_k(i)), \quad 0 \le i \le L_1 - 1$   (6.12)
$B = mean(ca3_k(i)), \quad L_1 \le i \le L_1 + L_2 - 1$   (6.13)
where L1 is the length of the first subsection and L2 is the length of the second subsection of the DWT coefficients. The lengths L1 and L2 are to be selected by the cryptographic method; N, the length of the DWT coefficient vector (equal to L/2^3), should be greater than L1 + L2.
To embed the watermark bit 1: if A ≤ B, no operation is performed; if A > B, decrease A and increase B by the amount δ so that the condition A ≤ B is satisfied. To embed the watermark bit 0: if A > B, no operation is performed; if A < B, increase A and decrease B by the amount δ so that the condition A > B is satisfied. The block schematic for embedding the watermark using GOS modifications is shown in Fig. 6.1.
Fig 6.1 Block schematic of GOS based watermark embedding in DWT domain (blocks: input x, segmentation, DWT transformation, selection of lengths L1 and L2, computation of mean A and mean B, insertion of the dimension-mapped watermark W with scaling parameter K, IDWT transformation, concatenation of segments, watermarked signal).
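As a concrete illustration of the embedding rule just described, the sketch below uses the PyWavelets package for the three-level Haar decomposition. The segment length is assumed to be a power of two (e.g. 256) and δ is taken directly from Eq. (6.14) of the next subsection; the function name and keyword defaults are assumptions of this sketch, not the thesis implementation.

```python
import numpy as np
import pywt

def embed_bit_dwt(segment, bit, L1, L2, K=1.0):
    """Embed one watermark bit in one GOS: 3-level Haar DWT, then push apart
    the means A and B of two subsections of the ca3 coefficients (Eqs. 6.11-6.13)."""
    coeffs = pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)
    ca3 = coeffs[0]                              # approximation band, length L / 2**3
    A = np.mean(ca3[:L1])
    B = np.mean(ca3[L1:L1 + L2])
    delta = abs(A - B) * K / 2.0                 # Eq. (6.14)
    if bit == 1 and A > B:                       # state "1" requires A <= B
        ca3[:L1] -= delta
        ca3[L1:L1 + L2] += delta
    elif bit == 0 and A < B:                     # state "0" requires A > B
        ca3[:L1] += delta
        ca3[L1:L1 + L2] -= delta
    return pywt.waverec(coeffs, 'haar')          # back to the time-domain segment
```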
As per the above rules of watermark embedding, if the status of a GOS does not conform to the state definition, the sections are modified so that the status of the GOS changes to the required condition. The amount δ used to modify the sections can be selected using the following expression:
$\delta \ge \frac{(A - B) \times K}{2}$   (6.14)
where the parameter K is selected in such a way that the modified signal remains imperceptible. We have tested the performance of the system for various values of K and then propose the range of K in which the optimized performance of the system is obtained.
To detect the watermark from the embedded signal, the DWT transformation of each GOS of the watermarked signal is performed. Then the parameters A' and B' are computed as explained in the previous section:
$A' = mean(ca'3_k(i)), \quad 0 \le i \le L_1 - 1$   (6.15)
$B' = mean(ca'3_k(i)), \quad L_1 \le i \le L_1 + L_2 - 1$   (6.16)
where $ca'3_k(i)$ are the third-level DWT coefficients of the kth GOS of the watermarked signal. Comparing A' and B' yields the retrieved bit as 1 if A' ≤ B' and 0 if A' > B'. This process is repeated for every GOS to determine the entire stream of embedded watermark bits. The detailed procedure of watermark extraction is depicted in the block schematic of Fig. 6.2. The proposed scheme is blind because it does not require the original host signal to recover the embedded watermark. To measure the similarity between the retrieved watermark and the original watermark, the bit error rate (BER) is computed as
$BER = \frac{1}{N_w}\sum_{n} \big(w(n) \oplus w_r(n)\big)$   (6.17)
where ⊕ is the XOR operation between the original watermark w and the retrieved watermark $w_r$, and the summation runs over all $N_w$ watermark bits.
Fig 6.2 Block schematic of GOS based watermark extraction in DWT domain (blocks: segmentation of the watermarked signal, DWT transformation, selection of lengths L1 and L2, computation of mean A' and mean B', watermark bit recovery, concatenation and dimension recovery to obtain ŵ).
Selection criterion of K
We have tested the performance of the system by varying the value of K from 0.5 to 3 and observing the imperceptibility parameter SNR and the robustness parameter BER. The graphs of SNR versus K and BER versus K are shown in Fig. 6.3. The SNR between the original signal and the watermarked signal is computed using formula (5.2) to measure the imperceptibility of the watermarked signal, and the BER is computed by expression (6.17).
Fig 6.3 (a) Variation in SNR with factor K (without attack) (b) BER versus K
Fig. 6.3(a) shows the variation in SNR for various values of K without attacking the watermarked signal. It is clear from the figure that the SNR decreases exponentially as K increases, whereas for smaller values of K the SNR improves but the BER increases. The plot in Fig. 6.3(b) shows the variation in BER for the Harmonium signal, which is more immune to noise than the other musical signals because of its higher frequency content. From the observations of Figs. 6.3(a) and 6.3(b) it is clear that the optimized value of K lies between 0.75 and 1.5, because in this range the BER stays between 0.04 and 0.05.
Experimental results:
The results obtained for this technique implemented in the DWT domain are presented in Table 6.1. The results are obtained for various Indian musical signals such as a Hindi song, Tabla, Harmonium and Bhimsen Joshi's classical music. These signals are in the Windows wave format, PCM coded, with a sampling rate of 44100 Hz, 8 bits, single channel, and 6.5 s duration. The results are obtained for a segment length of 256; for simplicity of implementation we have considered L1 = L2 = 16.
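The corresponding blind extraction and the BER measurement can be sketched as follows; the watermark arrays are assumed to hold 0/1 integers, and the normalization over the number of watermark bits reflects the reading of Eq. (6.17) given above.

```python
import numpy as np
import pywt

def extract_bit_dwt(segment, L1, L2):
    """Recover one bit from one GOS of the watermarked signal (Eqs. 6.15-6.16)."""
    ca3 = pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)[0]
    A = np.mean(ca3[:L1])
    B = np.mean(ca3[L1:L1 + L2])
    return 1 if A <= B else 0                    # bit 1 when A' <= B'

def bit_error_rate(w, w_rec):
    """Fraction of mismatched bits between the original and recovered watermark (Eq. 6.17)."""
    w = np.asarray(w, dtype=np.uint8).ravel()
    w_rec = np.asarray(w_rec, dtype=np.uint8).ravel()
    return float(np.mean(w ^ w_rec))
```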
To test the robustness of the scheme, the watermarked signal is passed through common signal processing attacks; the results are summarized in Table 6.1, and the watermarks recovered from the watermarked version of 'Tabla' are shown in Figs. 6.4(a) to 6.4(h).
Fig. 6.4 Recovered watermarks in the robustness test in the DWT domain: (a) without attack, (b) down-sampling, (c) up-sampling, (d) MP3 compression, (e) requantization, (f) cropping, (g) time warping, (h) low-pass filtering at 22050 Hz.
TABLE 6.1 SNR BETWEEN THE ORIGINAL AND WATERMARKED AUDIO SIGNALS AND BER OF THE RECOVERED WATERMARK (DWT DOMAIN)
Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 31.4344 / 0 | 31.1032 / 0 | 31.5349 / 0 | 38.9913 / 0.0025
Downsampled | 31.5231 / 0.0098 | 30.4534 / 0.0184 | 30.7345 / 0.0103 | 36.6782 / 0.0157
Upsampled | 31.1987 / 0.0098 | 30.0056 / 0.0184 | 30.7345 / 0.0103 | 36.6782 / 0.0157
Mp3 compressed | 31.4344 / 0.0098 | 31.1032 / 0.0039 | 31.5349 / 0.0103 | 38.9913 / 0.0157
Requantization | 31.4344 / 0.0098 | 31.1032 / 0.0039 | 31.5349 / 0.0103 | 38.9913 / 0.0157
Cropping | 31.4344 / 0.0154 | 31.2989 / 0.0195 | 31.1489 / 0.0242 | 37.7654 / 0.0285
Lpfiltered fc=22050 | 29.7558 / 0.0210 | 30.5924 / 0.0214 | 30.6054 / 0.0542 | 36.6782 / 0.0801
Time warping 10% | 24.8741 / 0.0838 | 24.7654 / 0.0894 | 21.6434 / 0.0929 | 31.6576 / 0.3164
Echo addition | 18.7530 / 0.0465 | 22.8352 / 0.0953 | 17.0064 / 0.0387 | 16.5880 / 0.1088
Equalization | 19.9434 / 0.0386 | 23.3258 / 0.0402 | 18.5831 / 0.0362 | 27.0980 / 0.0456
6.2.2. Proposed blind watermarking using GOS modification in the DCT domain
This section describes the implementation of the scheme in the DCT domain. To embed the watermark into the low-frequency part of the audio signal, taking advantage of the frequency-masking effect of the HAS, the DCT of each GOS is obtained. Each $x_k(i)$ is DCT transformed, and the DCT coefficients of each GOS are then sub-sectioned into two non-overlapping sections of equal or unequal length. Watermark embedding and extraction for each GOS are done as explained in the previous section: in place of the DWT, the DCT of each GOS is computed and the modifications for embedding each bit are made according to the rule defined previously. To embed the watermark signal into the host audio signal, the host audio signal is first partitioned into GOS's of size L using equation (6.11), and the DCT of each GOS $x_k(i)$ is computed to obtain the DCT coefficients $C_k(i)$. To embed the watermark states, the means A and B are defined as in (6.12) and (6.13):
$A = mean(C_k(i)), \quad 0 \le i \le L_1 - 1$   (6.18)
$B = mean(C_k(i)), \quad L_1 \le i \le L_1 + L_2 - 1$   (6.19)
where L1 is the length of the first subsection and L2 is the length of the second subsection of the DCT coefficients. To test the imperceptibility of the watermarked signal, the signal-to-noise ratio (SNR) between the original signal and the watermarked signal is computed using equation (5.2). To measure the similarity between the original watermark and the recovered watermark, the BER is computed using (6.17). The results observed during the experimentation are presented in Table 6.2.
To embed the watermark bit 1: if A ≤ B, no operation is performed; if A > B, decrease A and increase B by the amount δ so that the condition A ≤ B is satisfied. To embed the watermark bit 0: if A > B, no operation is performed; if A < B, increase A and decrease B by the amount δ so that the condition A > B is satisfied. The block schematic for embedding the watermark using GOS modifications is shown in Fig. 6.5 and the schematic for extracting the watermark is shown in Fig. 6.6.
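The DCT-domain variant only swaps the transform pair; a sketch using SciPy's orthonormal DCT-II/DCT-III is shown below. Again, the function name and defaults are assumptions of this sketch rather than the exact implementation.

```python
import numpy as np
from scipy.fft import dct, idct

def embed_bit_dct(segment, bit, L1, L2, K=1.0):
    """GOS embedding in the DCT domain (Eqs. 6.18-6.19): the means of two
    subsections of the DCT coefficients are pushed apart by delta."""
    C = dct(np.asarray(segment, dtype=float), norm='ortho')
    A = np.mean(C[:L1])
    B = np.mean(C[L1:L1 + L2])
    delta = abs(A - B) * K / 2.0
    if bit == 1 and A > B:
        C[:L1] -= delta
        C[L1:L1 + L2] += delta
    elif bit == 0 and A < B:
        C[:L1] += delta
        C[L1:L1 + L2] -= delta
    return idct(C, norm='ortho')                 # watermarked time-domain segment
```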
Fig 6.5 Block schematic of GOS based watermark embedding in DCT domain (blocks: input x, segmentation, DCT transformation, selection of lengths L1 and L2, computation of mean A and mean B, insertion of the dimension-mapped watermark W with scaling parameter K, IDCT transformation, concatenation of segments, watermarked signal).
Fig 6.6 Block schematic of GOS based watermark extraction in DCT domain (blocks: segmentation of the watermarked signal, DCT transformation, selection of lengths L1 and L2, computation of mean A' and mean B', watermark bit recovery, concatenation and dimension recovery to obtain ŵ).
To detect the watermark from the embedded signal, the DCT transformation of each GOS of the watermarked signal is performed. Then the parameters A' and B' are computed as explained in the previous subsection:
$A' = mean(C'_k(i)), \quad 0 \le i \le L_1 - 1$   (6.20)
$B' = mean(C'_k(i)), \quad L_1 \le i \le L_1 + L_2 - 1$   (6.21)
where $C'_k(i)$ are the DCT coefficients of the kth segment of the watermarked signal. Comparing A' and B' yields the retrieved bit as 1 if A' ≤ B' and 0 if A' > B'. This process is repeated for every GOS to determine the entire stream of embedded watermark bits.
To check the robustness of the scheme, the watermarked signal is passed through different signal processing operations such as low-pass filtering, resampling, requantization, MP3 compression, time scaling and cropping. The results are presented in Table 6.2. The original watermark embedded in the signal is shown in Fig. 6.7(a) and the watermarks recovered after the various signal processing attacks are shown in Figs. 6.7(b) to 6.7(h). These results are obtained for Tabla.wav. The algorithm is tested for various Indian musical signals: a Hindi song, Tabla, Harmonium and Bhimsen Joshi's classical music.
Fig. 6.7 Recovered watermarks in the robustness test in the DCT domain: (a) without attack, (b) down-sampling, (c) up-sampling, (d) MP3 compression, (e) requantization, (f) cropping, (g) time warping, (h) low-pass filtering at 22050 Hz.
TABLE 6.2 SNR BETWEEN THE ORIGINAL AND WATERMARKED AUDIO SIGNALS AND BER OF THE RECOVERED WATERMARK (DCT DOMAIN)
Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 44.5607 / 0 | 54.4350 / 0 | 43.2635 / 0.0039 | 61.0525 / 0
Downsampled | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Upsampled | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Mp3 compressed | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Requantization | 44.5607 / 0.0029 | 54.4350 / 0.0137 | 43.2635 / 0.0039 | 61.0525 / 0.0137
Cropping | 44.5607 / 0.0049 | 54.4350 / 0.0079 | 43.2635 / 0.0056 | 61.0525 / 0.0263
Lpfiltered fc=22050 | 31.8993 / 0.0238 | 44.0477 / 0.0379 | 30.7531 / 0.0246 | 45.8223 / 0.0371
Time warping 10% | 23.4781 / 0.0938 | 25.4836 / 0.0840 | 22.5946 / 0.0929 | 33.8252 / 0.5098
Echo addition | 17.0730 / 0.0598 | 22.8352 / 0.0823 | 17.0064 / 0.0637 | 16.5880 / 0.2988
Equalization | 19.3434 / 0.0313 | 23.3258 / 0.392 | 18.5831 / 0.0402 | 27.0980 / 0.0840
The results in the table show that the use of the DCT domain improves the imperceptibility but decreases the robustness, especially for signals like Harmonium and Tabla under attacks such as time warping and echo addition, since the frequency content of these signals is high compared with a voice signal. Improved robustness is observed for the song and the classical music, which contain low-frequency voice along with the instrumental signals. The results obtained by varying the segment length together with the number of DCT coefficients modified are presented in Table 6.3.
From the table it is clear that, although the SNR keeps improving as the segment size gets smaller, the watermark does not survive the time scaling and low-pass filtering attacks. Therefore, for better robustness the segment length should be greater than 128. To maintain a trade-off between survival of the mark and the SNR, the segment length is kept equal to 256 in our implementations; to implement a stronger cryptographic method the length can be made larger. It is also observed that if the lengths L1 and L2 are kept unequal the SNR varies but the BER is unaffected. Since the main concern of this work is robustness, and robustness does not change when the lengths are unequal, equal lengths are used for the further observations for simplicity.
TABLE 6.3 RELATION BETWEEN SEGMENT LENGTH AND NUMBER OF DCT COEFFICIENTS MODIFIED VERSUS SNR, BER AND ROBUSTNESS
Segment length | Coeff. modified (equal split) | SNR | BER (no attack) | Coeff. modified (unequal split) | SNR | BER (no attack) | Robustness
256 | 256 (L1=L2=128) | 39.313 | 0 | L1=147, L2=107 | 38.4747 | 0 | Robust
256 | 128 (L1=L2=64) | 36.548 | 0 | L1=73, L2=55 | 36.1779 | 0 | Robust
128 | 128 (L1=L2=64) | 39.529 | 0 | L1=73, L2=55 | 38.4975 | 0 | Robust
128 | 96 (L1=L2=48) | 39.023 | 0 | L1=59, L2=37 | 38.6325 | 0 | Not sustained in time scaling
64 | 64 (L1=L2=32) | 44.377 | 0 | L1=37, L2=27 | 44.4504 | 0 | Not sustained in time scaling, LP filtering
64 | 32 (L1=L2=16) | 41.464 | 0 | L1=19, L2=13 | 41.7952 | 0 | Not sustained in time scaling, LP filtering
64 | 16 (L1=L2=8) | 38.44 | 0 | L1=9, L2=7 | 38.2531 | 0 | Not sustained in time scaling, LP filtering
32 | 32 (L1=L2=16) | 48.917 | 0 | L1=19, L2=13 | 48.8208 | 0 | Not sustained in time scaling, LP filtering
32 | 16 (L1=L2=8) | 46.01 | 0 | L1=9, L2=7 | 45.3319 | 0 | Not sustained in time scaling, LP filtering
16 | 16 (L1=L2=8) | 47.91 | 0 | L1=9, L2=7 | 47.3868 | 0 | Not sustained in time scaling, LP filtering
6.2.3. Proposed blind watermarking technique implemented in the DWT-DCT domain:
The results of the DWT-DCT domain technique are presented in this subsection. Each GOS is first transformed by a three-level wavelet decomposition, and the ca3 coefficients are selected to embed the watermark. The ca3 coefficients are then DCT transformed, in the same manner as the DWT-DCT transformation of Section 5.4. The watermark embedding and extraction schematics are depicted in Figs. 6.8 and 6.9 respectively. To obtain a comparison between the three techniques implemented, the same set of audio signals is used for the experimentation. The embedding and extraction equations involved in this procedure are
$A = mean(C_k(i)), \quad 0 \le i \le L_1 - 1$   (6.22)
$B = mean(C_k(i)), \quad L_1 \le i \le L_1 + L_2 - 1$   (6.23)
$A' = mean(C'_k(i)), \quad 0 \le i \le L_1 - 1$   (6.24)
$B' = mean(C'_k(i)), \quad L_1 \le i \le L_1 + L_2 - 1$   (6.25)
where $C_k(i)$ are the DWT-DCT coefficients of $x_k(i)$, $C'_k(i)$ are the DWT-DCT coefficients of the kth segment of the watermarked signal, and L1 = L2 = 16.
Fig 6.8 Block schematic of GOS based watermark embedding in DWT-DCT domain (blocks: input x, segmentation, DWT transformation followed by DCT, selection of lengths L1 and L2, computation of mean A and mean B, insertion of the dimension-mapped watermark W with scaling parameter K, inverse DCT followed by inverse DWT, concatenation of segments, watermarked signal).
Fig 6.9 Block schematic of GOS based watermark extraction in DWT-DCT domain (blocks: segmentation of the watermarked signal, DWT-DCT transformation, selection of lengths L1 and L2, computation of mean A' and mean B', watermark bit recovery, concatenation and dimension recovery to obtain ŵ).
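The hybrid DWT-DCT transform pair used throughout this subsection can be isolated into two small helpers; this is a sketch under the same assumptions as before (Haar wavelet, power-of-two segment length, PyWavelets and SciPy available).

```python
import pywt
from scipy.fft import dct, idct

def dwt_dct_forward(segment):
    """3-level Haar DWT followed by a DCT of the approximation band (Section 6.2.3)."""
    coeffs = pywt.wavedec(segment, 'haar', level=3)
    C = dct(coeffs[0], norm='ortho')             # coefficients C_k(i) used in Eqs. 6.22-6.23
    return C, coeffs

def dwt_dct_inverse(C, coeffs):
    """Inverse DCT of the (possibly modified) coefficients, then inverse DWT."""
    coeffs[0] = idct(C, norm='ortho')
    return pywt.waverec(coeffs, 'haar')
```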
Fig. 6.10 Recovered watermarks in the robustness test in the DWT-DCT domain: (a) without attack, (b) down-sampling, (c) up-sampling, (d) MP3 compression, (e) requantization, (f) cropping, (g) time warping, (h) low-pass filtering at 22050 Hz.
The robustness test results are presented in Table 6.4 and the watermarks recovered from the watermarked signal after the various signal processing modifications are shown in Figs. 6.10(a) to 6.10(h).
TABLE 6.4 SNR BETWEEN THE ORIGINAL AND WATERMARKED AUDIO SIGNALS AND BER OF THE RECOVERED WATERMARK (DWT-DCT DOMAIN)
Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 41.7059 / 0 | 38.9251 / 0 | 31.5349 / 0 | 38.9913 / 0
Downsampled | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 30.7345 / 0.0029 | 36.6782 / 0.0043
Upsampled | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 30.7345 / 0.0029 | 36.6782 / 0.0043
Mp3 compressed | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 31.5349 / 0.0029 | 38.9913 / 0.0043
Requantization | 41.7059 / 0.0020 | 38.9252 / 0.0039 | 31.5349 / 0.0029 | 38.9913 / 0.0043
Cropping | 41.7059 / 0.0038 | 38.9252 / 0.0059 | 31.1489 / 0.0046 | 37.7654 / 0.0068
Lpfiltered fc=22050 | 31.6594 / 0.0130 | 37.8637 / 0.0156 | 30.6054 / 0.0149 | 36.6782 / 0.0253
Time warping 10% | 23.6284 / 0.0734 | 25.3734 / 0.0840 | 17.6704 / 0.0805 | 17.8623 / 0.0853
Echo addition | 17.7346 / 0.0204 | 22.8352 / 0.0523 | 31.5349 / 0.0042 | 38.9913 / 0.0013
Equalization | 19.8 / 0.0313 | 23.3258 / 0.0492 | 31.5349 / 0.0037 | 38.9913 / 0.0013
The results presented in Table 6.4 indicate that the scheme implemented in the DWT-DCT domain is more robust than all the other schemes proposed in this chapter. As our main aim is to achieve better robustness, it is clear from the techniques implemented and tested in this chapter, as well as in Chapter 5, that the DWT-DCT domain is the most suitable domain in which to embed the watermark; using it we take advantage of the low-to-middle frequency regions. It can also be observed that embedding using GOS modification increases the imperceptibility of the watermarked signal compared with the spread-spectrum method, whereas the spread-spectrum based technique is more robust. The comparison for the song signal is given in Table 6.5. In order to further increase the robustness of the DWT-DCT based watermarking scheme we propose to embed the watermark using a cyclic coder.
Table 6.5 Comparison between the proposed GOS based scheme and the proposed SS based scheme (Song.wav)
Attack | GOS based scheme SNR / BER | SS based scheme SNR / BER
Without attack | 41.7059 / 0 | 31.7109 / 0
Downsampled | 41.7059 / 0.0020 | 31.7109 / 0
Upsampled | 41.7059 / 0.0020 | 31.7109 / 0
Mp3 compressed | 41.7059 / 0.0020 | 31.3456 / 0
Requantization | 41.7059 / 0.0020 | 31.3456 / 0
Cropping | 41.7059 / 0.0038 | 31.7056 / 0.0056
Lpfiltered fc=22050 | 31.6594 / 0.0130 | 27.9929 / 0
Time warping -10% | 23.6284 / 0.0734 | 20.0535 / 0.0186
Echo addition | 17.7346 / 0.0204 | 16.9394 / 0.0049
Equalization | 19.8 / 0.0313 | 19.1391 / 0
6.3. Proposed GOS based blind technique using cyclic coding
To improve the detection performance and to make the system intelligent, we propose to encode the watermark using a cyclic coder before embedding it into the host audio signal. The block schematic of this scheme is shown in Fig. 6.11: the watermark to be embedded is first encoded by the cyclic encoder and then embedded into the host signal, and at the decoder side the recovered bit stream is decoded by the cyclic decoder to obtain the watermark. The robustness results obtained with this method are presented in Table 6.6. From the table it is clear that the detection performance increases significantly even under 10% time scaling, echo addition and low-pass filtering.
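The thesis does not state which generator polynomial is used for the (7,4) cyclic code; the sketch below assumes g(x) = x^3 + x + 1 (the cyclic form of the Hamming (7,4) code), systematic encoding, and single-error correction through a syndrome table, which is one standard way to realize such a coder.

```python
import numpy as np

G_POLY = np.array([1, 0, 1, 1], dtype=np.uint8)   # g(x) = x^3 + x + 1, MSB first

def _gf2_remainder(dividend, divisor):
    """Remainder of a GF(2) polynomial division (coefficients MSB first)."""
    r = dividend.copy()
    for i in range(len(r) - len(divisor) + 1):
        if r[i]:
            r[i:i + len(divisor)] ^= divisor
    return r[-(len(divisor) - 1):]

def cyclic74_encode(msg4):
    """Systematic (7,4) encoding: append the parity given by x^3*m(x) mod g(x)."""
    msg4 = np.asarray(msg4, dtype=np.uint8)
    shifted = np.concatenate([msg4, np.zeros(3, dtype=np.uint8)])
    return np.concatenate([msg4, _gf2_remainder(shifted, G_POLY)])

def cyclic74_decode(word7):
    """Correct at most one bit error via a syndrome table, then strip the parity."""
    word7 = np.asarray(word7, dtype=np.uint8).copy()
    table = {}
    for pos in range(7):                           # syndrome of every single-error pattern
        e = np.zeros(7, dtype=np.uint8)
        e[pos] = 1
        table[tuple(_gf2_remainder(e, G_POLY))] = pos
    s = tuple(_gf2_remainder(word7, G_POLY))
    if any(s):
        word7[table[s]] ^= 1                       # flip the most likely erroneous bit
    return word7[:4]
```

Each 4-bit group of the dimension-mapped watermark would be encoded to 7 bits before embedding, and the decoder corrects at most one erroneous bit per 7-bit block after extraction, which is consistent with the improved detection performance reported in Table 6.6.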
Fig 6.11 Improved encoder and decoder for GOS based blind watermarking using cyclic coding (the dimension-mapped watermark is encoded with the cyclic code under a key and embedded into the host audio by the watermark encoder; after the attack channel the watermark decoder recovers the bits, decodes them with the same cyclic code under the key and dimension-maps them back to the recovered watermark).
Table 6.6 Results obtained for the (7,4) cyclic code
Attack | Song.wav SNR / BER | Tabala.wav SNR / BER | Bhimsenbahar.wav SNR / BER | Harmonium.wav SNR / BER
Without attack | 38.7784 / 0 | 36.4719 / 0 | 28.8574 / 0 | 35.8765 / 0
Downsampled | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Upsampled | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Mp3 compressed | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Requantization | 38.7784 / 0 | 36.4719 / 0.0039 | 28.8574 / 0 | 35.8765 / 0.0022
Cropping | 38.7784 / 0.0038 | 36.4719 / 0.0059 | 28.9863 / 0.0048 | 35.1187 / 0.0058
Lpfiltered fc=22050 | 31.6594 / 0.0098 | 33.4554 / 0.0156 | 26.4356 / 0.0102 | 31.2845 / 0.0187
Time warping 10% | 23.6284 / 0.0107 | 22.8455 / 0.0840 | 16.2565 / 0.0214 | 14.6572 / 0.0657
Echo addition | 19.8 / 0.0089 | 21.6675 / 0.0523 | 25.9766 / 0.0038 | 35.8765 / 0
Equalization | 41.7059 / 0.0098 | 22.4543 / 0.0492 | 27.6252 / 0.0015 | 35.8765 / 0
It is clear from Table 6.6 that the encoder and decoder model proposed using cyclic coding is more robust than all the other techniques proposed. With a small sacrifice in the imperceptibility test (computed SNR), the (7,4) cyclic coder provides the best robustness results, and our main aim is to embed a watermark which sustains all kinds of attacks and can be recovered successfully.
6.4. Comparison of the proposed methods with various well known watermarking algorithms:
To evaluate the performance of the proposed methods we compare them with other well known watermarking methods. This kind of comparison is borrowed from Lie et al. [23], who compared their method with other algorithms in the same way. Table 6.7 compares the methods reported in the literature with our proposed methods with respect to the handling of different attacks, embedding positions, embedding domain, etc. From the chart it is clear that the schemes proposed by us are tested against the common signal processing attacks and provide the highest watermark embedding capacity among all the techniques mentioned in the chart except the technique proposed by Boney [12]. Although the technique suggested by Boney [12] provides the highest embedding capacity, it is not robust against some of the signal processing attacks, such as time scaling, resampling and requantization. Table 6.8 gives a comparison between the blind methods proposed by us and various blind methods highlighted in the literature survey. The results for the methods [13], [16] and [46] are taken from the literature; the entry NA in the table indicates that the particular type of attack is not addressed in the corresponding work. The comparison in Table 6.8 is made with respect to the robustness of the schemes against signal processing attacks. The method in [13] has addressed only four kinds of attacks and does not survive even MP3 compression. The scheme in [16] has not addressed attacks such as down-sampling, cropping, time scaling and echo addition; for the other attacks its BER is always greater than 0.04, and it is not able to recover the watermark with zero BER for any of the attacks. The scheme in [46] reports better SNR and good BER, but it produces a poor BER under the down-sampling attack, and it does not address major sensitive attacks such as up-sampling, cropping, time scaling, echo addition and equalization.
Although our proposed methods give a lower SNR than the scheme in [46], they succeed in sustaining all kinds of attacks. Our schemes are more robust than the other schemes, and care has also been taken to add security to the implemented schemes.
TABLE 6.7 COMPARISON CHART OF VARIOUS WATERMARKING ALGORITHMS (THE SYMBOL "?" REPRESENTS A CASE NOT ADDRESSED OR ANALYZED IN THE WORK; FX = fixed, DY = dynamic embedding positions)
Property | Our scheme (SS) | Our scheme (GOS) | Lie [23] | Bassia [24] | Wu [26] | Li [29] | Shin [90] | Ko [44] | Boney [12]
Embedding domain | DWT-DCT | DWT-DCT | Time | Time | Fourier | Subband | Time | Time | Time
Secret keys used | Yes [PN sequence] | Yes [section length] | Yes [section length] | Yes [watermark key] | No | No | Yes [PN sequence] | No | Yes [frame length]
Embedding positions | FX | FX | FX | FX | FX | DY | FX | DY | FX
Blind detection | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Subjective test reported | Yes | Yes | No | No | No | Yes | No | No | ?
Bits embedded | 159 bps | 159 bps | 43 bps | 44 bps | ? | 38 bps | 25 bps | 4 bps | 860 bps
Cropping attack | Yes | Yes | Yes | Yes | Yes | No | No | No | No
LP filtering attack | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes
Time scaling attack | Yes | Yes | Yes | No | No | Yes | Yes | No | No
Resampling attack | Yes | Yes | No | Yes | No | No | No | No | No
Requantization attack | Yes | Yes | No | Yes | No | No | No | No | No
Compression attack | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes
Lowest BER | 0 | 0 | 0.03 | ? | 0.305 | 0.028 | 0.084 | ? | 0.001
TABLE 6.8 COMPARISON CHART FOR VARIOUS BLIND AUDIO WATERMARKING TECHNIQUES (NA = not addressed in the literature)
Attack | Proposed GOS based method using cyclic coder SNR (dB) / BER | Proposed SS based method using cyclic coder SNR (dB) / BER | Scheme in [13] SNR (dB) / BER | Scheme in [16] SNR (dB) / BER | Scheme in [46] SNR (dB) / BER
Without attack | 38.7784 / 0 | 28.5381 / 0 | NA / 0 | NA / 0 | 50.445 / 0
Downsampled | 38.7784 / 0 | 28.3982 / 0 | NA / NA | NA / NA | 30.4116 / 0.0513
Upsampled | 38.7784 / 0 | 28.3982 / 0 | NA / NA | NA / 0.04 | NA / NA
Mp3 compressed | 38.7784 / 0 | 28.3982 / 0 | NA / 0.91 | NA / 0.08 | 50.4542 / 0
Requantization | 38.7784 / 0 | 28.3982 / 0 | NA / NA | NA / 0.08 | 45.3332 / 0
Cropping | 38.7784 / 0.0038 | 22.3475 / 0.0094 | NA / NA | NA / NA | NA / NA
Lpfiltered | 31.6594 / 0.0098 | 26.9378 / 0 | NA / NA | NA / 0.06 | 36.6381 / 0.0002
Time warping 10% | 23.6284 / 0.0107 | 23.5560 / 0 | NA / NA | NA / NA | NA / NA
Echo addition | 19.8 / 0.0089 | 16.0376 / 0.0021 | NA / NA | NA / NA | NA / NA
Equalization | 41.7059 / 0.0098 | 20.8502 / 0 | NA / 0 | NA / 0.13 | NA / NA
Bits embedded | 32*32 in 6 sec | 32*32 in 6 sec | 128 bits in 15 sec | 25 bits in 5 sec | 64*64 in 9.57 sec
For the scheme in [16] the SNR is reported only as a range, between -10 and 20, rather than as individual values for each attack.
6.5. Summary of chapter:
The schemes implemented in this chapter provide good results. The SNR between the original signal and the watermarked signal is increased compared with the schemes implemented in the previous chapter. It is observed that the robustness of the techniques is improved by implementing the scheme with a cyclic coder. The techniques implemented in this chapter are blind, meaning that they do not require the original signal to recover the watermark. The scaling parameter used to scale the host signal during watermark embedding is adaptive, meaning that the parameter is different for each GOS depending on the audibility requirements of each GOS. It is observed that the DWT-DCT domain is the most suitable domain in which to embed the watermark, as it gives the advantage of embedding in the low-to-middle frequency regions; a watermark embedded in the low-to-middle frequencies sustains all kinds of signal processing attacks.
Chapter 7 Intelligent encoder and decoder modelling
Introduction
In order to describe the link between watermarking and standard data communications, the traditional model of a data communications system is often used to model watermarking systems.
The basic components of a data communications system, as they relate to the watermarking process, are highlighted. One of the most important parts of the communications model of a watermarking system is the communications channel, because a number of classes of communications channels have been used to model the distortions imposed by watermarking attacks. The other important issue is the security of the embedded watermark bits, because the design of a watermark system has to take into account the access that an adversary may have to that channel. In this chapter we concentrate on the communications models of watermarking. The first section introduces the basic model of watermarking. The second section proposes the intelligent encoder and decoder model of audio watermarking based on the implementations of Chapter 5, and the third section proposes the intelligent encoder and decoder model of audio watermarking based on the implementations of Chapter 6.
7.1. Basic model of watermarking
The process of watermarking is viewed as a transmission channel through which the watermark message is sent, with the host signal being a part of that channel. Figure 7.1 gives a general mapping of a watermarking system onto a communications model. After the watermark is embedded, the watermarked signal is usually distorted by watermark attacks. The distortions of the watermarked signal are, as in the data communications model, modelled as additive noise. A watermark message m is embedded into the host signal x to produce the watermarked signal s. The embedding process depends on the key k and must satisfy the perceptual transparency requirement, i.e. the subjective quality difference between x and s must be below the just-noticeable-difference threshold. Before watermark detection and decoding take place, s is usually intentionally or unintentionally modified. The intentional modifications (n) are usually referred to as attacks; an attack produces attack distortion at a perceptually acceptable level. After the attacks, the watermark extractor receives the attacked signal r.
Fig. 7.1 A watermarking system and an equivalent communications model (the watermark encoder and embedder map the input m, using the watermark key and the host signal x, to the added pattern wa and the watermarked signal s; the noise n of the attack channel yields the received signal r, from which the watermark detector and decoder, using the same watermark key, produce the output).
Depending on the watermarking application, the detector performs informed or blind watermark detection. The term attack requires some further clarification. The watermarked signal s can be modified without any intention to impair the embedded watermark (e.g. dynamic amplitude compression of audio prior to radio broadcasting). Why is this kind of signal processing nevertheless called an attack? The first reason is to simplify the notation of the general model of digital watermarking. The other, even more significant, reason is that any common signal processing operation that drastically impairs an embedded watermark is a potential tool for adversaries who intentionally try to remove the watermark. Watermarking algorithms must therefore be designed to endure the worst possible attack for a given attack distortion, which might even be some common signal processing operation (e.g. dynamic compression, low-pass filtering, etc.). Furthermore, it is generally assumed that the adversary has only one watermarked version s of the host signal x. In fingerprinting applications, differently watermarked data copies could be exploited by collusion attacks.
It has been proven that robustness against collusion attacks can be achieved by a sophisticated coding of different watermark messages embedded into each data copy. However, it seems that the necessary codeword length increases dramatically with the number of watermarked copies available to the adversary. The importance of the key k has to be emphasized. The embedded watermarks should be secure against detection, decoding, removal or modification or modification 104 by adversaries. Kerckhoff’s principle [35], stating that the security of a crypto system has to reside only in the key of a system, has to be applied when the security of a watermarking system is analyzed. Therefore, it must be assumed that the watermark embedding and extraction algorithms are publicly known, but only those parties knowing the proper key are able to receive and modify the embedded information. The key k is considered a large integer number, with a word length of 64 bits to 1024 bits. Usually, a key sequence k is derived from k by a cryptographically secure random number generator to enable a secure watermark embedding for each element of the host signal. In order to properly analyze digital watermarking systems, a stochastic description of the multimedia data is required. The watermarking of data whose content is perfectly known to the adversary is useless. Any alteration of the host signal could be inverted perfectly, resulting in a trivial watermarking removal. Thus, essential requirements on data being robustly watermarkable are that there is enough randomness in the structure of the original data and that quality assessments can be made only in a statistical sense. Let the original host signal x be a vector of length Lx. Statistical modeling of data means to consider x a realization of a discrete random process x. In the most general form, x is described by probability density function (PDF). A further simplification is to assume independent, identically distributed (IID) data elements of x. Most multimedia data cannot be modeled adequately by an IID random process. However, in many cases, it is possible to decompose the data into components such that each component can be considered almost statistically independent. In most cases, the multimedia data have to be transformed, or parts have to be extracted, to obtain a component-wise representation with mutually independent and IID components. The watermarking of independent data components can be considered as communication over parallel channels. Watermarking embedding and attacks against digital watermarks must be such that the introduced perceptual distortion - the subjective difference between the watermarked and attacked signal to the original host signal is acceptable. In the previous section, we introduced the terms embedding distortion and attack distortion, but no specific definition was given. The definition of an appropriate objective 105 distortion measure is crucial for the design and analysis of a digital watermarking system. Watermark extraction reliability is usually analyzed for different levels of attack distortion and fixed data features and embedding distortion. Different reliability measures are used for watermark decoding and watermark detection. In the performance evaluation of the watermark decoding, digital watermarking is considered as a communication problem. A watermark message m is embedded into the host signal x and must be reliably decodable from the received signal r. 
Low decoding error rates can be achieved only using error correction codes. For practical error correcting coding scenarios, the watermark message is usually encoded into a vector b of length Lb with binary elements bn = 0; 1. Usually, b is also called the binary watermark message, and the decoded binary watermark message is b̂ . The decoding reliability of b can be described by the bit error rate (BER) The BER can be computed for specific stochastic models of the entire watermarking process including attacks. The number of measured error events divided by the number of the observed events defines the measured error rates, word error rate, WER. The capacity analysis provides a good method for comparing the performance limits of different communication scenarios, and thus is frequently employed in the existing literature. Since there is still no solution available for the general watermarking problem, digital watermarking is usually analyzed within certain constraints on the embedding and attacks. Additionally, for different scenarios, the watermark capacity might depend on different parameters (domain of embedding, attack parameters, etc.). If an informed watermark detector is used, the watermark detection is performed in two steps. In the first step, the unwatermarked host signal may be subtracted from the received signal r in order to obtain a received noisy added watermark pattern wn. It is subsequently decoded by a watermark decoder, using the same watermark key used during the embedding process. Because the addition of the host signal in the embedder is exactly canceled by its subtraction in the detector, the only difference between wa and wn is caused by the added channel noise. Therefore, 106 the addition of the host signal can be neglected, making watermark embedding, channel noise addition and watermark extraction equivalent to the data communications system. 7.2. Method-1: Proposed Intelligent encoder and decoder model for robust and secure audio watermarking based on Spread Spectrum The main aim of the present work is to make the system robust to all kinds of attacks. In this section we propose the intelligent model of encoding and decoding based on spread spectrum method. To make the system intelligent we propose to add the following features. 1) Add synchronization patterns to indicate the start of the file from where the watermark is embedded in the host signal. 2) To improve the robustness of the system through diversity, add multiple watermarks at different locations in a file using time diversity. 3) Encode the watermark to reduce the bit error rate and to improve the robustness further. 4) Make the system secure by using SS technique. In figure 7.2 we propose the spread spectrum based model of watermarking system incorporating all above mentioned features. The start of the signal is first identified and marked. The host audio signal x is decomposed in to smaller segments of user defined size from the marked point. Then the synchronization pattern is added to the host audio signal at various points of audio signal from where the watermark is embedded in host signal. A 6 bit zero mean synchronous pattern with +1 and -1 values alternatively is added in the music signal before embedding the watermark. The pattern is very small and it does not affect the signal quality. Due to the continuous higher value of amplitude the pattern is easily recognizable. 
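A possible realization of the synchronization mechanism described above is sketched below. The thesis specifies a 6-bit, zero-mean pattern of alternating +1/-1 values; the amplitude scale, the choice of superimposing the pattern on the host samples (rather than overwriting them), and the correlation-based search are assumptions of this sketch.

```python
import numpy as np

# 6-sample zero-mean synchronization pattern of alternating +1/-1 values;
# the 0.9 amplitude scale is an assumed value.
SYNC = 0.9 * np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

def add_sync(signal, position):
    """Superimpose the synchronization pattern at a known offset."""
    out = np.asarray(signal, dtype=float).copy()
    out[position:position + len(SYNC)] += SYNC
    return out

def find_sync(signal, search_len=44100):
    """Locate the pattern by sliding correlation over the first `search_len`
    samples and return the most likely start index."""
    best_score, best_pos = -np.inf, 0
    for pos in range(search_len - len(SYNC)):
        score = float(np.dot(signal[pos:pos + len(SYNC)], SYNC))
        if score > best_score:
            best_score, best_pos = score, pos
    return best_pos
```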
Watermark embedding in the time domain directly modifies the samples, and the added distortion therefore creates a slight hum in the watermarked signal. Transforming the signal into a suitable transform domain and modifying the transformed samples does not affect the imperceptibility as much. Each segment is therefore transformed into the DWT-DCT domain. Because the technique must recover the watermark without using the host audio signal, a PRN (pseudo-random number) sequence is generated from a secret key k using cryptographic methods.
Fig 7.2 Intelligent encoder model for the proposed adaptive SNR based blind technique in the DWT-DCT domain (the start of the utterance of the input host audio x(n) is found and the synchronization pattern added; the host is divided into segments of size N; each segment is transformed by a third-level DWT followed by a DCT of the approximation coefficients; the binary watermark W is dimension-mapped, converted to bipolar form, cyclic coded with the (7,4) encoder and multiplied by the PN sequence generated from the initial seed; the scaled watermark, weighted by the scaling parameter α(k) computed for each segment, is added as wa; the inverse DCT and inverse DWT of each segment are taken and the segments are concatenated to give the watermarked audio y(n)).
The watermark message to be transmitted is mapped into an added pattern wa of the same type and dimension as the host signal (a one-dimensional pattern). The watermark bit stream is encoded using the (7,4) cyclic encoder. Each resulting watermark bit is scaled and multiplied by the PN sequence to be embedded in one segment of the host audio. The encoded message pattern is perceptually weighted in order to obtain the added pattern wa, and wa is added to the host signal to construct the watermarked signal. If the watermark embedding process does not use information about the host signal it is called blind watermark embedding; otherwise the process is referred to as informed watermark embedding. To quantify the imperceptibility, the SNR between the watermarked signal and the original signal is computed. During watermark embedding each segment is modified adaptively to embed the binary data: the scaling parameter α is computed from the energy of the signal, which helps to keep the audibility of the added watermark below the masking threshold. After the added pattern is embedded, the watermarked work y is usually distorted by watermark attacks. We model the distortions of the watermarked signal as added noise, as in the data communications model. The types of attacks may include compression and decompression, low-pass filtering, resampling, requantization, cropping, time scaling, etc.
When the signal r reaches the destination, the embedded watermark has to be recovered from r. The watermark decoder model is shown in Fig. 7.3. First the start of the signal is identified and the synchronization pattern is tracked. Then r is decomposed into smaller segments of the same size used while embedding the watermark. Each segment is transformed to the DWT-DCT domain and multiplied by the watermark key (PN sequence) that was used. A watermark bit is recovered from each segment as explained in Chapter 5, and the bit stream is decoded using the (7,4) cyclic decoder. The watermark bits are concatenated to obtain the one-dimensional watermark, and a dimension transformation converts the one-dimensional watermark back into its original two dimensions.
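A condensed sketch of the per-segment spread-spectrum embedding and blind detection of Method-1 is given below. The PN sequence is regenerated from an integer key, and α(k) is derived from the segment energy so that the embedding distortion corresponds to a target SNR; the exact α(k) rule of Chapter 5 is not reproduced here, so this energy-based choice, the key handling and the names are assumptions. The cyclic coding of the bit stream is assumed to have been applied before these per-bit calls.

```python
import numpy as np
import pywt
from scipy.fft import dct, idct

def embed_ss_bit(segment, bit, key, target_snr_db=50.0):
    """Spread-spectrum embedding of one (already cyclic-coded) watermark bit
    into one segment in the DWT-DCT domain."""
    coeffs = pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)
    C = dct(coeffs[0], norm='ortho')
    pn = np.random.RandomState(key).choice([-1.0, 1.0], size=len(C))
    # alpha(k): make the embedding energy sit target_snr_db below the segment energy
    alpha = np.sqrt(np.sum(C ** 2) / (len(C) * 10.0 ** (target_snr_db / 10.0)))
    b = 1.0 if bit == 1 else -1.0                 # bipolar conversion of the bit
    coeffs[0] = idct(C + alpha * b * pn, norm='ortho')
    return pywt.waverec(coeffs, 'haar')

def extract_ss_bit(segment, key):
    """Blind detection: correlate the DWT-DCT coefficients with the same
    key-derived PN sequence and take the sign of the correlation."""
    C = dct(pywt.wavedec(np.asarray(segment, dtype=float), 'haar', level=3)[0],
            norm='ortho')
    pn = np.random.RandomState(key).choice([-1.0, 1.0], size=len(C))
    return 1 if np.dot(C, pn) >= 0 else 0
```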
Fig. 7.3 Decoder model for the adaptive SNR based blind technique in the DWT-DCT domain (the start of the utterance of the received audio r(n) is found and the synchronization pattern located; the signal is divided into segments of size N; the third-level DWT-DCT of each segment is taken; the PN sequence regenerated from the initial seed is used for threshold comparison and bit recovery from each segment; the bits are decoded with the cyclic decoder and dimension-mapped to give the recovered watermark).
Since, as in most applications, the watermark system cannot perform its function if the embedded watermark cannot be detected, robustness is a necessary property if a watermark is to be secure. Generally, there are several methods for increasing watermark robustness in the presence of signal modifications. Some of these methods aim to make watermarks robust to all possible distortions that preserve the perceptual quality of the watermarked signal.
One of the earliest methods of attack characterization is diversity. Diversity is employed through watermark repetition. Although it is well known that repetition can improve the reliability of robust data hiding schemes, it is traditionally used to decrease the effect of fading. If properly designed, repetition can often significantly improve performance and may be worth the apparent sacrifice in watermark bit rate. If the repetition is viewed as an application of communication diversity principles, it can be shown that a proper selection of the watermark embedding domain, together with an attack characterization, can notably improve reliability. A communication channel can be broken into independent sub-channels, each with a certain capacity. Since, in a fading environment, some of these sub-channels may have zero capacity at a particular time instant, diversity principles are employed: the same information is transmitted through each sub-channel in the hope that at least one repetition will be transmitted successfully. For watermarking this is referred to as coefficient diversity, because different coefficients within the host signal are modulated with the same information.
A host audio signal x of longer duration is selected to achieve this purpose. The signal x is split into smaller signals of 7 s duration and the watermark embedding is applied to each of them. Once the watermark has been embedded in all the signals, they are concatenated to form one signal and sent over the communication channel. At the receiver the received signal is again split into smaller signals of 7 s duration and the watermark is recovered from each of them. The robustness results of this scheme implemented through time diversity are provided in Table 7.1. In a 50 s signal, 5 watermarks are embedded, one after every 7 s, and the results are observed. The 5 watermarks of the five 7 s duration signals are recovered and their BER with respect to the original watermark is computed. The recovered watermark giving the lowest BER is identified and reported as the valid mark. This recovered watermark is then enhanced to remove isolated error bits. A 3x3 window is used to identify an isolated pixel in the neighbourhood of the centre pixel: the values of the neighbourhood pixels are compared with the centre pixel, and if the value of the centre pixel differs from the value taken by its neighbourhood pixels it is replaced with the opposite value. As the watermark image is a binary image, it contains only 0 and 1 values.
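One reading of the isolated-error cleanup described above is the following 3x3 neighbourhood filter; treating a pixel as isolated only when all eight neighbours disagree with it is an assumption of this sketch, and border pixels are left untouched for simplicity.

```python
import numpy as np

def remove_isolated_bits(wm):
    """Post-process a recovered binary watermark image: flip a pixel whose
    eight 3x3 neighbours all carry the opposite value."""
    wm = np.asarray(wm, dtype=np.uint8)
    out = wm.copy()
    for r in range(1, wm.shape[0] - 1):
        for c in range(1, wm.shape[1] - 1):
            neigh = wm[r - 1:r + 2, c - 1:c + 2]
            ones = int(neigh.sum()) - int(wm[r, c])   # number of 1s among the 8 neighbours
            if wm[r, c] == 1 and ones == 0:
                out[r, c] = 0                          # isolated 1 surrounded by 0s
            elif wm[r, c] == 0 and ones == 8:
                out[r, c] = 1                          # isolated 0 surrounded by 1s
    return out
```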
Table 7.1 Robustness results after attack characterization using time diversity (SS based model)
Attack | SNR | Lowest BER after diversity | BER after removing isolated error pixels
Without attack | 27.5764 | 0 | 0
Downsampled | 26.6478 | 0 | 0
Upsampled | 26.8021 | 0 | 0
Mp3 compressed | 26.8021 | 0 | 0
Requantization | 26.8021 | 0 | 0
Cropping | 23.3522 | 0.0029 | 0.0009
Lpfiltered fc=22050 | 24.8021 | 0 | 0
Time scaling -10% | 20.4113 | 0.0105 | 0.0038
Time scaling +10% | 20.3347 | 0.0052 | 0.0029
Echo addition | 20.6543 | 0.0019 | 0.0009
Equalization | 24.6745 | 0 | 0
7.3. Method-2: Proposed intelligent encoder and decoder model for robust and secure audio watermarking based on GOS modification
We also propose a model based on the patchwork algorithm which performs blind detection of the watermark. The proposed intelligent encoder model, shown in Fig. 7.4, modifies a group of samples (GOS) to embed the watermark. The start of the signal is first identified and marked. The host audio signal x is decomposed into smaller segments of user-defined size from the marked point, and the synchronization pattern is added to the host audio signal at the points from which the watermark is embedded. The signal is then transformed into the DWT-DCT domain and each segment is decomposed into two groups of samples (GOS) of equal or unequal size. The mean of each group is computed and the means are defined as A and B. The watermark message to be transmitted is mapped into an added pattern wa of the same type and dimension as the host signal (a one-dimensional pattern). To provide security, the watermark is encoded by error control coding (the (7,4) cyclic encoder) and then embedded into each segment. As explained in Chapter 6, one bit of the binary watermark is embedded in one segment of the host signal by modifying each GOS so as to satisfy the required data-embedding condition. The watermarked signal y is then transferred from the source to the destination through the transmission channel, during which it undergoes various intentional and unintentional signal processing attacks. The proposed decoder model based on GOS modification is shown in Fig. 7.5: the received signal r is decomposed into smaller segments and transformed into the DWT-DCT domain, and the watermark is recovered by observing the mean of each GOS according to the rules.
Fig 7.4 Block schematic of the GOS based intelligent encoder in the DWT-DCT domain (the start of the utterance of the input x is found and the synchronization pattern added; the signal is segmented and transformed by DWT followed by DCT; the lengths L1 and L2 are selected and the means A and B computed; the dimension-mapped, cyclic-encoded watermark W is inserted under the scaling parameter K; the inverse DCT followed by the inverse DWT is taken and the segments are concatenated to give the watermarked signal).
Fig 7.5 Block schematic of the GOS based intelligent decoder in the DWT-DCT domain (the start of the utterance of the received signal r is found and the synchronization pattern located; the signal is segmented and DWT-DCT transformed; the lengths L1 and L2 are selected and the means A' and B' computed; the watermark bits are recovered and cyclic decoded, then concatenated and dimension-recovered to give ŵ).
Table 7.2 Robustness results after attack characterization using time diversity (GOS based model)
Attack | SNR | Lowest BER | BER after removing isolated pixels
Without attack | 36.4356 | 0 | 0
Downsampled | 35.2543 | 0.0009 | 0
Upsampled | 35.2543 | 0.0009 | 0
Mp3 compressed | 35.2543 | 0.0009 | 0
Requantization | 35.2543 | 0.0009 | 0
Cropping | 33.2541 | 0.0029 | 0.0009
Lpfiltered | 30.8021 | 0.0009 | 0
Time scaling -10% | 23.6543 | 0.0185 | 0.0068
Time scaling +10% | 22.4532 | 0.0193 | 0.0078
Echo addition | 20.7645 | 0.0124 | 0.0048
Equalization | 24.8437 | 0.0098 | 0.0019
The robustness results of this scheme implemented through time diversity are provided in Table 7.2. From the results presented in Tables 7.1 and 7.2 it is clear that diversity in time increases the robustness of the system. The system implemented using GOS modification increases the imperceptibility, whereas the spread-spectrum based technique is more robust. We therefore propose to use the SS based Model-1 when robustness matters more than imperceptibility and the GOS based Model-2 when imperceptibility is the primary requirement.
7.4. Summary of Chapter
In this chapter we modelled the audio watermarking techniques using data communication principles. The techniques implemented are more robust in the transform domain than in the time domain, so we propose to embed the data in the transform domain. In both models the watermark is embedded adaptively, meaning that the discrimination factor used to embed each bit varies according to the segment characteristics. Robustness is significantly improved by cyclic encoding and decoding of the watermark, and diversity in time increases the robustness further. We have also added a synchronization pattern to trace the start of the watermark in the watermarked file. The intelligent models we propose are able to embed the watermark bits adaptively and to perform blind extraction of the watermark successfully. Finally, it is observed that the model proposed as Method-1 is more robust than Method-2: the maximum BER for Method-1 is 0.0038 (for time scale modification) and for Method-2 it is 0.0078. Method-1 is therefore applicable in areas like copy protection, piracy control and fingerprinting, where robustness is preferred. The Method-2 model is more imperceptible than Method-1 and is therefore applicable where a high degree of imperceptibility is required; the maximum SNR for Method-1 is 27.5764 dB and for Method-2 it is 36.4356 dB.
Chapter 8 Discussion and Conclusion
Introduction
Audio watermarking algorithms are studied in this thesis. The main goal of this thesis is to develop an intelligent encoder and decoder model for robust and secure audio watermarking. We have proposed two basic models, one based on the spread spectrum technique and the other based on the patchwork algorithm. The implementation procedures and the results obtained for these methods were presented in Chapter 5 and Chapter 6 respectively, and Chapter 7 proposed models of the watermarking techniques based on digital communication principles. This chapter concludes the thesis and gives suggestions for further research.
8.1. Discussion and conclusion
The main aim of the present work is the development of audio watermarking algorithms with state-of-the-art performance. The performance of the algorithms is validated in the presence of the standard watermarking attacks. In Chapter 2 a survey of the key digital audio watermarking algorithms and techniques is presented. The referenced algorithms are classified according to the domain used for inserting the watermark and the statistical method used for embedding and extracting the watermark bits.
Scientific publications referred in the literature survey are chosen in order to build sufficient background that helps in identifying and solving the research problems. The main point of the "magic triangle" concept is that if the perceptual transparency parameter is fixed, the design of a watermark system cannot obtain a high robustness and watermark data rate at the same time. Therefore, the research problem was divided into three specific sub problems. These sub problems are stated in chapter 3. To solve the sub problems stated in chapter 3 we implemented the various algorithms as mentioned. i)What is the highest watermark bit rate obtainable, under perceptual transparency constraint, and how to approach the limit? 115 ii) How can the detection performance of a watermarking system be improved using algorithms based on communications models for the system? iii) How can overall robustness to attacks of a watermark system be increased using an attack characterization at the embedding side? These problems are tackled as below 1.To obtain a distinctively high watermark data rate, embedding algorithm were implemented in to transform domain. 2. To improve detection performance, a spread spectrum method and patch work algorithms are used, bit error rate is improved using cyclic encoding scheme. 3. To increase the robustness the attack characterization is employed through diversity. To increase the high bit rate we embed the data in LSB of host audio in wavelet domain. The wavelet domain LSB algorithm is described in chapter 4. The wavelet domain was chosen for data hiding due to its low processing noise and suitability for frequency analysis, because of its multiresolutional properties that provide access both to the most significant parts and details of signal’s spectrum. The wavelet domain algorithm produces stego objects perceptually hardly discriminated from the original audio clip even when LSBs of coefficients are modified, in comparison with the time domain LSB algorithm. The audio watermark is added into the host audio. In this scheme the attempt is made to embed the audio watermark in host audio signal. We have tried to embed the audio data of 0.5 to 5sec duration in 45 sec duration host audio. To measure the imperceptibility between the original signal and the watermarked signal we computed the SNR between them. The computed SNR after embedding the audio of length from 0.5 sec to 5 sec is presented in Table 4.2. The observed SNR for all cases is above 28 dB. The test results provided in Table 4.1 confirmed that the embedded information is inaudible except for the attacks like time warping and low pass filtering. 116 The BER is calculated to prove the similarity between the original watermark and the extracted watermark. The results listed in Table 4.3 indicate that the watermarking technique is robust to common signal processing attacks such as compression, echo addition and equalization with poor detection results for low pass filtering. This technique can be used for covert communication. One disadvantage of this technique is that the technique is non blind meaning that it requires original host audio signal to recover the watermark. To develop the blind watermarking technique and to increase the detection performance of added copy right information we embed the data using spread spectrum method. The spread spectrum based techniques are focused in chapter 5. 
From the test results in table 5.2 it is observed that non blind technique has better SNR compared to the blind detection scheme but it requires the original audio signal to recover the watermark. As in case of non blind techniques number of modifications made to embed the watermark bit is less as compare to the blind technique. In non blind as well as in blind techniques only one bit of information is added in one segment of host audio. To embed one bit of information we modify only one sample in the host audio segment, where as we modify each and every sample of host audio to implement blind watermarking technique. In table 5.2 the results obtained for the proposed blind technique implemented in DWT/LWT domain are also presented. Though the SNR in these cases is below 20 dB the listening test confirms that there is no perceptual difference between the original watermark and the recovered watermark. The BER test between the original watermark and recovered watermark indicate that DWT based blind detection technique is slightly more robust than LWT based blind detection technique. The blind watermarking schemes proposed in this chapter embed the watermark adaptively in each segment. To make the scheme adaptive we compute the scaling parameter α(k) for each segment by making assumption of the expected SNR between the original segment coefficients and the modified segments coefficients. The selection criteria for the value of SNR in computation of α(k) and the selection criteria for segment length is decided based on the results presented in table 5.3. From the table 5.3 it is clear that 117 the optimized results are obtained for segment length 256 and SNR between 40 – 60 dB. To provide the security to the system and to make the system secure the PN sequence is used as a secret key to embed the watermark. The embedded watermark cannot be recovered without the knowledge of the PN sequence used, the watermark the key is very important to keep the method secure. It is also observed that the SNR between original signal and watermarked signal is improved by embedding the watermark in DWT-DCT (DWT transform followed by DCT transform of low frequency coefficients) domain. By computing the DCT of 3-level DWT coefficients we take advantage of low frequency middle components to embed the information. This scheme proposed in section 5.4 provides better SNR compared to other two techniques implemented in chapter 5 and is also robust to various signal processing attacks. To embed the data using spread spectrum method we propose to use DWT-DCT domain to increase the imperceptibility and to obtain the better robustness test. To add the security and to further improve the robustness results the cyclic coding is used. The watermark bits are encoded using cyclic coder before embedding into host audio. At the receiver side the cyclic decoder is used to decode the watermark. Cyclic coding and decoding corrects the one bit errors generated during the watermark recovery. Due to encoding of watermark the opponent will not come to know the statistical behavior of watermark signal and hence can not guess the watermark from the statistical behavior of the audio signal. The results of this scheme are presented in table 5.6 and 5.7 using (6,4) and (7,4) cyclic coder respectively. Since our main goal of watermarking is to increase the robustness we preferred to use (7, 4) cyclic coder because it is more robust as compared to (7,4). 
Schemes based on the patchwork algorithm are proposed in chapter 6 in different transform domains. This scheme is observed to be more imperceptible than the spread spectrum technique: the SNR between the original and the watermarked signal is higher in this case. The maximum SNR obtained with this scheme is 61.0525 dB for the harmonium signal and 44.5607 dB for the song signal. This is due to the small value of the discrimination parameter δ used to embed the information, and to the fact that a segment is modified only when the required detection condition is not already satisfied. Since fewer modifications are made to embed the watermark, this scheme provides better imperceptibility than the spread-spectrum based method. The results in Tables 6.1, 6.2 and 6.4 again confirm that the DWT-DCT domain is the most suitable domain for embedding the information. The schemes implemented in this chapter calculate an adaptive scaling parameter for each subsection/GOS, preserving the imperceptibility of the watermarked signal while providing good robustness results. The robustness results, however, indicate that the scheme proposed in chapter 5 is more robust than the scheme proposed in chapter 6.
Chapter 7 proposes the intelligent encoder and decoder models of the schemes implemented in chapters 5 and 6. To make the models intelligent, robust and secure, the following features were added to the proposed models:
i) a synchronization pattern to indicate the start of the watermark in the audio;
ii) time diversity to improve robustness;
iii) cyclic encoding of the watermark.
In both models the watermark is embedded adaptively, meaning that the discrimination factor used to embed each bit is varied according to the segment characteristics. The observed test results are significant even under time-scale modification and low-pass filtering. The proposed models are able to embed 1024 bits in a 6.5 s signal while preserving its imperceptibility. The watermark is embedded repeatedly in a longer audio signal and its recovery is observed; after the various attacks, at least one of the five embedded watermark copies yields the minimum BER. In addition, the introduction of the attack characterization module further improved the extraction reliability of both algorithms, decreasing the bit error rate most noticeably in the presence of time-scale modification, low-pass filtering, echo addition and equalization. The results presented in this chapter again confirm that the SS based method is more robust than the method based on the patchwork algorithm: the SS based algorithm achieves high detection robustness while maintaining the perceptual transparency of the watermarked signal. The time required to embed and recover the watermark is 21.131 s for the SS based scheme and 24.401 s for the GOS based scheme.
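As an illustration of the patchwork/GOS idea summarized above for chapter 6, the sketch below shifts the means of two secret subsets of a group of samples apart by the discrimination parameter δ and detects the bit from the difference of the subset means. The seeded subset selection, the value of δ and the simple decision rule are assumptions made for illustration; they do not reproduce the chapter 6 scheme, which additionally adapts δ per GOS and skips segments that already satisfy the detection condition.

```python
import numpy as np

def patchwork_embed(coeffs, bit, delta, rng):
    """Embed one bit in a group of samples (GOS) by moving the means of two
    secret, disjoint subsets A and B apart by the discrimination parameter delta."""
    out = np.asarray(coeffs, dtype=float).copy()
    idx = rng.permutation(len(out))            # secret split derived from a keyed RNG
    half = len(out) // 2
    a, b = idx[:half], idx[half:2 * half]
    sign = 1.0 if bit else -1.0
    out[a] += sign * delta / 2.0
    out[b] -= sign * delta / 2.0
    return out

def patchwork_detect(coeffs, rng):
    """Decide the bit from the sign of mean(A) - mean(B); the detector must use
    the same keyed RNG state as the embedder."""
    coeffs = np.asarray(coeffs, dtype=float)
    idx = rng.permutation(len(coeffs))
    half = len(coeffs) // 2
    a, b = idx[:half], idx[half:2 * half]
    return int(np.mean(coeffs[a]) - np.mean(coeffs[b]) > 0)

# Hypothetical usage (seed 2024 plays the role of the secret key):
# marked = patchwork_embed(segment, 1, delta=1e-3, rng=np.random.default_rng(2024))
# bit = patchwork_detect(marked, np.random.default_rng(2024))
```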
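The synchronization and time-diversity features of the chapter 7 models can be illustrated as follows: the known synchronization pattern is located in the detected bit stream by a sliding match, and the repeated watermark copies are combined here by bit-wise majority voting as one simple way to exploit the redundancy. This is a simplified sketch; the reported experiments instead note that at least one of the five embedded copies survives with the minimum BER, so selecting the best copy after cyclic decoding is an equally valid combining strategy.

```python
import numpy as np

def locate_sync(detected_bits, sync_pattern):
    """Slide the known synchronization pattern over the detected bit stream and
    return the offset with the largest number of matching bits."""
    det = np.asarray(detected_bits)
    sync = np.asarray(sync_pattern)
    scores = [int(np.sum(det[i:i + len(sync)] == sync))
              for i in range(len(det) - len(sync) + 1)]
    return int(np.argmax(scores))

def combine_copies(copies):
    """Time diversity: bit-wise majority vote over the watermark copies that
    were embedded repeatedly along the audio signal."""
    stack = np.asarray(copies)                 # shape (n_copies, n_bits), bits 0/1
    return (stack.sum(axis=0) * 2 > stack.shape[0]).astype(int)

# Hypothetical usage with five recovered copies of a 1024-bit watermark:
# start = locate_sync(bitstream, sync_pattern)
# watermark = combine_copies([copy1, copy2, copy3, copy4, copy5])
```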
8.2. Main contributions of the present research
1. An audio watermarking scheme is proposed to embed an audio watermark in the host audio signal.
2. Adaptive blind watermarking algorithms based on SNR are developed using the spread spectrum technique.
3. Spread spectrum based techniques are implemented in different transform domains.
4. A cyclic coder and attack characterization by diversity are used to increase the robustness of the scheme.
5. A synchronization pattern is added to track the watermark.
6. A new intelligent encoder and decoder model is proposed for an audio watermarking system using the spread spectrum method.
7. Blind watermarking algorithms based on GOS modification are developed in different transform domains.
8. A cyclic coder and attack characterization by diversity are employed to increase the robustness.
9. A synchronization pattern is added to track the watermark.
10. A new intelligent encoder and decoder model is proposed for an audio watermarking system using the GOS method.
8.3. Future scope
Research in watermarking is progressing along two paths: new watermarking algorithms are being developed, and researchers are also working on attacks against watermarked signals. Proposing new attacks and suggesting countermeasures is one of the active areas of watermarking research. One can propose new attacks against existing algorithms and suggest the countermeasures needed to survive them. Researchers can also work on buyer-seller protocols for watermarking techniques. Algorithms can be developed further in the DWT-DCT domain to meet the requirements of imperceptibility and robustness at the same time. Cryptographic methods can be employed to add security to the developed watermarking techniques, and proposing asymmetric watermarking methods is another worthwhile direction in the field.