Techniques and Applications for
Audio
695410106
695410121
謝育任
劉威麟
Outline

Audio Watermark

Audio Classification:

Security Monitoring Using Microphone Arrays and Audio Classification

A Generic Audio Classification and Segmentation Approach for Multimedia
Indexing and Retrieval
Introduction

What is a watermark?

A kind of technology for data hiding

Paper watermarks appeared nearly 700 years ago; the oldest known example dates from 1292

The idea of digital image watermarking arose independently around 1990

Around 1993, the term "water mark" was coined for digital watermarking
Terminology




Steganography stands for techniques in general that allow secret communication
Watermarking, as opposed to steganography, has the additional notion of robustness against attacks
Fingerprinting and labeling are terms that denote special applications of watermarking, e.g., for copyright protection
Bit-stream watermarking is sometimes used for data hiding or watermarking of compressed data
Requirements

A watermark shall convey as much information as possible

A watermark should in general be secret and should only be accessible by authorized parties

A watermark should stay in the host data regardless of what happens to the host data

A watermark should be imperceptible

Further requirements depend on the media to be watermarked:

Blind vs. non-blind detection (extraction without or with access to the original host data)

Embedding or detection may be required to run in real time

Low computational complexity
Basic Watermarking Principle

There are three main issues in the design of a watermarking system:

Design of the watermark signal W to be added to the host signal. Typically, the watermark signal depends on a key K and watermark information I; possibly, it may also depend on the host data X into which it is embedded

Design of the embedding method itself, which incorporates the watermark signal W into the host data X, yielding the watermarked data Y

Design of the corresponding extraction method, which recovers the watermark information from the signal mixture using the key, with or without the help of the original
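As an illustration of these three design components, a minimal additive sketch in Python follows; the key-seeded carrier, the scaling factor alpha, and the non-blind correlation detector are assumptions chosen for the example, not the method of any of the referenced papers.

```python
import numpy as np

def make_watermark(key: int, info_bits, length: int) -> np.ndarray:
    """Watermark signal W: a key-dependent +/-1 carrier modulated by the
    information bits I (one of many possible designs)."""
    rng = np.random.default_rng(key)                       # key K seeds the PRNG
    carrier = rng.choice([-1.0, 1.0], size=length)
    slot = length // len(info_bits)
    data = np.repeat([1.0 if b else -1.0 for b in info_bits], slot)
    data = np.pad(data, (0, length - len(data)))
    return carrier * data

def embed(x: np.ndarray, w: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Embedding: watermarked data Y = X + alpha * W."""
    return x + alpha * w

def extract(y: np.ndarray, x: np.ndarray, key: int, n_bits: int):
    """Non-blind extraction: subtract the original X, then correlate the
    residual with the key-dependent carrier, one bit slot at a time."""
    rng = np.random.default_rng(key)
    carrier = rng.choice([-1.0, 1.0], size=len(y))
    residual = y - x
    slot = len(y) // n_bits
    return [int(np.dot(residual[i*slot:(i+1)*slot],
                       carrier[i*slot:(i+1)*slot]) > 0)
            for i in range(n_bits)]
```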
Embedding Technologies for Audio

Low-bit coding




By replacing the least significant bit of each sampling point with bits of a coded binary string
The major disadvantage of this method is its poor immunity to manipulation
This method is useful only in closed, digital-to-digital environments
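A minimal sketch of low-bit coding on 16-bit PCM samples; the bit ordering and the use of one message bit per sample are illustrative choices.

```python
import numpy as np

def lsb_embed(samples: np.ndarray, message_bits) -> np.ndarray:
    """Replace the least significant bit of each 16-bit sample with one message bit."""
    out = samples.astype(np.int16).copy()
    for i, bit in enumerate(message_bits):
        out[i] = (out[i] & ~1) | bit          # clear the LSB, then set it to the bit
    return out

def lsb_extract(samples: np.ndarray, n_bits: int):
    """Read the message back from the least significant bits."""
    return [int(s) & 1 for s in samples[:n_bits]]

# Example: hide the bits of the byte 0xA5 in the first 8 samples
pcm = np.array([100, -200, 3000, -4000, 5, -6, 7, -8, 900], dtype=np.int16)
bits = [(0xA5 >> k) & 1 for k in range(8)]
marked = lsb_embed(pcm, bits)
assert lsb_extract(marked, 8) == bits
```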
Phase coding


By substituting the phase of an initial audio segment with a reference
phase that represents the data
Procedure

Break the sound sequence s[i], 0 ≤ i ≤ I−1, into a series of N short segments sn[i], where 0 ≤ n ≤ N−1
Apply a K-point discrete Fourier transform to the n-th segment sn[i], where K = I/N, and create a matrix of the phases ψn(ωk) and magnitudes An(ωk) for 0 ≤ k ≤ K−1
Store the phase difference between each adjacent segment for 0 ≤ n ≤ N−1
A binary data value is represented as ψdata = π/2 or −π/2, representing 0 or 1
Re-create the phase matrices for n > 0 by using the stored phase differences
Use the modified phase matrix and the original magnitude matrix to reconstruct the sound signal by applying the inverse DFT
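A compact sketch of this procedure using NumPy FFTs; the segment length, the choice of low-frequency bins to carry the data, and the Hermitian-symmetry bookkeeping are assumptions made so the example runs end to end.

```python
import numpy as np

def phase_encode(signal: np.ndarray, bits, seg_len: int = 1024) -> np.ndarray:
    """Hide bits in the phase of the first segment; later segments keep their
    original phase *differences* so relative phase is preserved. Samples after
    the last full segment are dropped in this sketch."""
    n_seg = len(signal) // seg_len
    segs = signal[:n_seg * seg_len].reshape(n_seg, seg_len).astype(float)
    spec = np.fft.fft(segs, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    delta = np.diff(phase, axis=0)            # phase differences between segments

    new_phase = phase.copy()
    for k, b in enumerate(bits, start=1):     # skip the DC bin
        new_phase[0, k] = np.pi / 2 if b == 0 else -np.pi / 2
        new_phase[0, seg_len - k] = -new_phase[0, k]   # keep Hermitian symmetry

    for n in range(1, n_seg):                 # re-create phases for n > 0
        new_phase[n] = new_phase[n - 1] + delta[n - 1]

    out = np.fft.ifft(mag * np.exp(1j * new_phase), axis=1).real
    return out.reshape(-1)

def phase_decode(signal: np.ndarray, n_bits: int, seg_len: int = 1024):
    """Read the bits back from the phase of the first segment."""
    spec = np.fft.fft(signal[:seg_len])
    return [0 if np.angle(spec[k]) > 0 else 1 for k in range(1, n_bits + 1)]
```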
• Spread spectrum coding

In the decoding stage, the following is assumed:




The pseudorandom sequence generated from the key is a maximal-length sequence
The key stream used for the encoding is known by the receiver. Signal synchronization is done, and the start/stop points of the spread data are known
The following parameters are known by the receiver: chip rate, data rate,
and carrier frequency
To keep the noise level low and inaudible, the spread code is attenuated
to roughly 0.5 percent of the dynamic range of the host sound file
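A baseband sketch of the spread-spectrum idea under the assumptions listed above (known key, perfect synchronization, known chip and data rates); the carrier-frequency modulation step is omitted and the constants are illustrative.

```python
import numpy as np

CHIPS_PER_BIT = 4096     # chip rate divided by data rate (illustrative value)
ALPHA = 0.005            # roughly 0.5 percent of the host's dynamic range

def pn_sequence(key: int, n_chips: int) -> np.ndarray:
    """Pseudorandom +/-1 chip sequence; the receiver regenerates it from the key."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=n_chips)

def ss_embed(host: np.ndarray, bits, key: int) -> np.ndarray:
    chips = pn_sequence(key, len(bits) * CHIPS_PER_BIT)
    data = np.repeat([1.0 if b else -1.0 for b in bits], CHIPS_PER_BIT)
    amp = ALPHA * (host.max() - host.min())     # keep the added spread code inaudible
    out = host.astype(float).copy()
    out[:len(chips)] += amp * chips * data
    return out

def ss_decode(received: np.ndarray, n_bits: int, key: int):
    """Assumes perfect synchronization: start point and chip rate are known.
    With enough chips per bit the host's contribution averages out; real
    systems additionally whiten the host before correlating."""
    chips = pn_sequence(key, n_bits * CHIPS_PER_BIT)
    bits = []
    for i in range(n_bits):
        sl = slice(i * CHIPS_PER_BIT, (i + 1) * CHIPS_PER_BIT)
        bits.append(1 if np.dot(received[sl], chips[sl]) > 0 else 0)
    return bits
```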
• Echo data hiding
– The data are hidden by varying three parameters of the echo: initial
amplitude, decay rate, and offset
Example
Decoding: examine the magnitude of the autocorrelation of the encoded signal's cepstrum at the candidate echo delays
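A simplified echo-hiding sketch: it decides the bit directly from the cepstral peak at the two candidate delays rather than from the autocorrelation of the cepstrum described above, and the delays and echo amplitude are illustrative values.

```python
import numpy as np

def echo_embed(host: np.ndarray, bit: int, d0: int = 50, d1: int = 100,
               amp: float = 0.4) -> np.ndarray:
    """Hide one bit in a block by adding an echo whose offset encodes the bit."""
    delay = d0 if bit == 0 else d1
    out = host.astype(float).copy()
    out[delay:] += amp * host[:-delay]           # decayed, delayed copy of the signal
    return out

def echo_decode(block: np.ndarray, d0: int = 50, d1: int = 100) -> int:
    """Decide the bit from the real cepstrum: the echo appears as a peak at its delay."""
    spectrum = np.fft.fft(block)
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
    return 0 if cepstrum[d0] > cepstrum[d1] else 1
```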
Classification of attacks


“Simple attacks” (other possible names include “waveform attacks” and
“noise attacks”) are conceptually simple attacks that attempt to impair the
embedded watermark by manipulations of the whole watermarked data
without an attempt to identify and isolate the watermark
“Detection-disabling attacks” (other possible names include
“synchronization attacks”) are attacks that attempt to break the correlation
and to make the recovery of the watermark impossible or infeasible for a
watermark detector
Classification of attacks (continued)


“Ambiguity attacks” (other possible names include “deadlock attacks”,
“inversion attacks”, “fake-watermark attacks”, and “fake-original attacks”)
are attacks that attempt to confuse by producing fake original data or fake
watermarked data
“Removal attacks” are attacks that attempt to analyze the watermarked
data, estimate the watermark or the host data, separate the watermarked
data into host data and watermark, and discard only the watermark
Watermark algorithm


LSB

Working in the time domain and embedding the watermark in the least significant bits

The message is embedded many times into the audio signal

Parameters

Secret key, error-correction code, message to embed, etc.
Microsoft

Working in the frequency domain and embedding the watermark in the frequency coefficients using a spread spectrum technique

Only one parameter: the message to embed


VAWW ─ Viper Audio Water Wavelet

Working in the wavelet domain and embedding the watermark in selected coefficients

Parameters:

Secret key

Threshold, which selects the coefficients for embedding. The default value is 40

Scale factor, which controls the embedding strength. The default value is 0.2
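VAWW's internals are not described here, so the following is only a generic wavelet-domain sketch of the threshold/scale-factor idea using PyWavelets; the wavelet, decomposition level, and coefficient-selection rule are assumptions, not VAWW's actual algorithm.

```python
import numpy as np
import pywt   # PyWavelets

def wavelet_embed(audio: np.ndarray, bits, key: int,
                  threshold: float = 40.0, scale: float = 0.2) -> np.ndarray:
    """Generic sketch: detail coefficients whose magnitude exceeds `threshold`
    are nudged by +/- `scale` of their magnitude according to the message bits."""
    coeffs = pywt.wavedec(audio.astype(float), 'db4', level=3)
    detail = coeffs[-1]                                   # finest detail band
    idx = np.flatnonzero(np.abs(detail) > threshold)      # coefficients above threshold
    rng = np.random.default_rng(key)                      # secret key orders them
    idx = rng.permutation(idx)[:len(bits)]                # embeds fewer bits if too few
    for b, i in zip(bits, idx):
        detail[i] += scale * (1 if b else -1) * np.abs(detail[i])
    return pywt.waverec(coeffs, 'db4')
```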
Publimark

Open source tool

Parameters:

Embedded message

Public/private key
References

"Multimedia Watermarking Techniques"
By F. Hartung and M. Kutter
Proceedings of the IEEE, vol. 87, no. 7, July 1999

"Techniques for Data Hiding"
By W. Bender, D. Gruhl, N. Morimoto, and A. Lu
IBM Systems Journal, vol. 35, nos. 3 & 4, 1996

"Transparency and Complexity Benchmarking of Audio Watermarking Algorithms Issues"
By Andreas Lang and Jana Dittmann
Audio Classification

Audio Classification

Security

Multimedia Indexing and Retrieval

Other
Introduction



The proposed system:

Location of the sound source

Type of sound

SNR (signal-to-noise ratio): the ratio of signal power to noise power

Reflection coefficient: describes either the amplitude or the intensity of a reflected wave relative to an incident wave.
Proposed Security monitoring instrument
Center Clipping


c(n) is the center clipped sample at time index n
s(n) is the audio sample at time index n
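The clipping rule itself is not reproduced on the slide; the sketch below uses the standard center-clipping operation, with the threshold C_L taken as a fraction of the frame's peak amplitude (an assumed choice, not taken from the paper).

```python
import numpy as np

def center_clip(s: np.ndarray, ratio: float = 0.3) -> np.ndarray:
    """Standard center clipping: samples inside +/- C_L are set to zero and the
    rest are shifted toward zero by C_L. C_L = ratio * peak amplitude is an
    assumed threshold choice."""
    cl = ratio * np.max(np.abs(s))
    c = np.zeros_like(s, dtype=float)
    c[s > cl] = s[s > cl] - cl
    c[s < -cl] = s[s < -cl] + cl
    return c
```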
PR algorithm


The PR algorithm divides the audio segment into frames, estimates the presence of human pitch in each frame, and calculates the PR parameter:
PR = NP / NF

NP: the number of frames that contain human pitch

NF: the total number of frames
Pitch Value

$$\text{Pitch} = \arg\max_{\tau}\big\{ R_{xx}(\tau) : R_{xx}(\tau) \geq 0.4 \cdot RMSE \big\}$$

Human Pitch = { Pitch : 70 Hz < Pitch < 280 Hz }
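A sketch of the frame-level pitch test and the PR computation; reading the 0.4·RMSE condition above as 0.4 times the zero-lag autocorrelation is an assumption, as are the frame length and sampling rate handling.

```python
import numpy as np

def frame_pitch(frame: np.ndarray, fs: int):
    """Autocorrelation pitch estimate for one frame (center clipping from the
    previous slide would normally be applied first)."""
    frame = frame.astype(float)
    rxx = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / 280), int(fs / 70)          # search lags covering 70-280 Hz
    tau = lo + int(np.argmax(rxx[lo:hi]))
    if rxx[0] > 0 and rxx[tau] >= 0.4 * rxx[0]:   # assumed reading of 0.4 * RMSE
        return fs / tau                           # pitch in Hz
    return None                                   # no reliable pitch in this frame

def pitch_ratio(segment: np.ndarray, fs: int, frame_len: int = 512) -> float:
    """PR = NP / NF: fraction of frames whose pitch lies in the human range."""
    frames = [segment[i:i + frame_len]
              for i in range(0, len(segment) - frame_len + 1, frame_len)]
    pitched = sum(1 for f in frames
                  if (p := frame_pitch(f, fs)) is not None and 70 < p < 280)
    return pitched / len(frames) if frames else 0.0
```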
Proposed system
Non-speech Classification

A Time Delay Neural Network (TDNN) is used to classify a non-speech audio segment into an audio type (e.g., door opening, fan noise, etc.)

MFCC (mel-frequency cepstral coefficients): cepstral features computed from a mel-scaled filter bank applied to each frame's spectrum

ΔMFCC (delta MFCC): the frame-to-frame difference of the MFCCs
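A short example of computing these features with the librosa library; librosa, the sampling rate, and the file name are assumptions for illustration, not part of the paper. The resulting MFCC/ΔMFCC matrix is the kind of per-frame feature vector a TDNN classifier would consume.

```python
import numpy as np
import librosa   # assumed third-party library

# Load an audio segment and compute 13 MFCCs plus their deltas per frame.
y, sr = librosa.load('segment.wav', sr=16000)           # hypothetical file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # shape: (13, n_frames)
delta = librosa.feature.delta(mfcc)                     # frame-to-frame change
features = np.vstack([mfcc, delta]).T                   # (n_frames, 26) feature matrix
```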
Simulation Environment

Simulation Environment:
Simulation Results

OR (overlap ratio) = 0.85

SD (segment duration) = 400 ms
Introduction



A Generic Audio Classification and Segmentation Approach for Multimedia
Indexing and Retrieval
Bi-modal approach:

Bit-stream mode: features are extracted directly from the compressed bit-stream (e.g., the MDCT coefficients of MP3 granules)

Generic mode: temporal and spectral information is extracted from the PCM samples

Classification into four classes:

Speech

Music

Silence

Fuzzy
Erroneous classification



Critical errors: a pure class is misclassified as another pure class.
Semi-critical errors: a fuzzy class is misclassified as one of the pure class types.
Non-critical errors: a pure class is misclassified as a fuzzy class.
Classification and Segmentation framework
Spectral Template

Pulse Code Modulation: PCM is a common method of storing and transmitting uncompressed digital audio, and a very common format for AIFF and WAV files
1: positive voltage pulse
0: absence of a pulse


Spectral template: formed from the input audio source; in bit-stream mode it can be obtained from the MDCT coefficients of MP3 granules
Power spectrum: obtained from the PCM samples in generic mode
About MP3



The Layer 3 encoding process starts by dividing the audio signal into frames, each of which corresponds to one or two granules.
Each granule has 576 PCM samples.
There are three windowing modes in the MPEG Layer 3 encoding scheme: long, short, and mixed.
Bit-Stream Mode

MDCT (Modified Discrete Cosine Transform):

$$X_k = \sum_{n=0}^{2N-1} x_n \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, 1, \dots, N-1$$
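A direct, unoptimized evaluation of the MDCT formula above; real codecs add windowing and fast transforms, this sketch only mirrors the equation.

```python
import numpy as np

def mdct(x: np.ndarray) -> np.ndarray:
    """Direct (slow) MDCT: 2N input samples give N coefficients; consecutive
    analysis windows overlap by N samples."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N).reshape(-1, 1)
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ x
```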
Bit-Stream Mode

MDCT (w, f)

w represents the window number

f represents the line frequency index
Frame Features

Total Frame Energy (TFE) calculation, used to detect silent frames:

$$TFE_j = \sum_{w}^{NoW} \sum_{f}^{NoF} \big( SPEQ_j(w, f) \big)^2$$
Band Energy Ratio (BER) calculation, used to measure the ratio between the energies of two spectral regions separated by a single cut-off frequency f_c:

$$BER_j(f_c) = \frac{\sum_{w}^{NoW} \sum_{f=0}^{f_c} \big( SPEQ_j(w, f) \big)^2}{\sum_{w}^{NoW} \sum_{f=f_c}^{NoF} \big( SPEQ_j(w, f) \big)^2}$$
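A sketch of both frame features, assuming the spectral template SPEQ_j of frame j is available as a NoW × NoF NumPy array (that layout, and the silence threshold, are assumptions).

```python
import numpy as np

def total_frame_energy(speq_j: np.ndarray) -> float:
    """TFE_j: sum of squared spectral values over all windows w and lines f."""
    return float(np.sum(speq_j ** 2))

def band_energy_ratio(speq_j: np.ndarray, f_c: int) -> float:
    """BER_j(f_c): energy of the lines below the cut-off divided by the
    energy of the lines at or above it."""
    low = np.sum(speq_j[:, :f_c] ** 2)
    high = np.sum(speq_j[:, f_c:] ** 2)
    return float(low / high) if high > 0 else float('inf')

def is_silent(speq_j: np.ndarray, threshold: float = 1e-3) -> bool:
    """Silent-frame test: TFE below an (assumed) threshold."""
    return total_frame_energy(speq_j) < threshold
```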
Frame Features


Fundamental Frequency Estimation: if the input audio is harmonic with a fundamental frequency, the real fundamental frequency (FF) value can be estimated from the spectral coefficients SPEQ(w, f)

Subband Centroid Frequency Estimation: the Subband Centroid (SC) is the first moment of the spectral distribution.
$$f_{SC} = \frac{\sum_{w}^{NoW} \sum_{f}^{NoF} SPEQ(w, f) \cdot FL(f)}{\sum_{w}^{NoW} \sum_{f}^{NoF} SPEQ(w, f)}$$

where FL(f) is the frequency of spectral line f.
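A matching sketch of the subband centroid, again assuming SPEQ is a NoW × NoF array and that line_freqs holds the line frequencies FL(f).

```python
import numpy as np

def subband_centroid(speq: np.ndarray, line_freqs: np.ndarray) -> float:
    """f_SC: spectral values weighted by their line frequencies FL(f),
    normalized by the total spectral sum (first moment of the distribution)."""
    num = np.sum(speq * line_freqs[np.newaxis, :])   # sum over windows and lines
    den = np.sum(speq)
    return float(num / den) if den > 0 else 0.0
```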
Initial Classification
Segment Features

Transition Rate (TR): the rate of class transitions between consecutive frames. TR has a forced speech classification region.

$$TR(S) = \frac{\sum_{i}^{NoF} TP_i}{2 \cdot NoF}$$

Fundamental Frequency segment feature: FF has a forced music classification region

Subband Centroid segment feature: SC has two forced classification regions, one for music and the other for speech content.
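One possible reading of the TR formula above, counting frame-to-frame class changes (transition points TP_i) and normalizing by 2·NoF; this interpretation is an assumption, not a quote from the paper.

```python
def transition_rate(frame_labels) -> float:
    """TR(S) under the assumed reading above: count transition points
    (frame-to-frame class changes) and divide by 2 * NoF."""
    labels = list(frame_labels)
    if not labels:
        return 0.0
    transitions = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return transitions / (2 * len(labels))
```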
Step 2
Generic Decision Table
Step 3