Basic Features of Audio Signals
Jyh-Shing Roger Jang (張智星)
http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept, Tsing Hua Univ.
Hsinchu, Taiwan
Audio Features
Four commonly used audio features
Volume
Pitch
Zero crossing rate
Timbre
Our goal
These features can be perceived subjectively.
But we need to compute them quantitatively for
further processing and recognition.
Audio Features in Time Domain
Audio features presented in the time domain
Fundamental period
Intensity
Timbre: Waveform within a fundamental period
Audio Features in Frequency Domain
Volume: Magnitude of spectrum
Pitch: Distance between harmonics
Timbre: Smoothed spectrum
[Figure: spectrum annotated with intensity, pitch frequency, first formant (F1), and second formant (F2)]
Demo: Real-time Spectrogram
Try “dspstfft_audio” under MATLAB:
Spectrum:
Spectrogram:
Steps for Audio Feature Extraction
Frame blocking
Frame duration of 20 ms or so
Feature extraction
Volume, zero-crossing rate, pitch, MFCC, etc.
Endpoint detection
Usually based on volume & zero-crossing rate
Frame Blocking
[Figure: waveform with overlapping frames marked, plus a zoomed-in view of a single frame]
Sample rate = 11025 Hz
Frame size = 256 samples
Overlap = 84 samples
(Hop size = 256 - 84 = 172 samples)
Frame rate = 11025/(256 - 84) ≈ 64 frames/sec
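The frame-blocking numbers on this slide can be checked with a short Python sketch. The function name `frame_blocking`, its parameter names, and the choice to drop a trailing partial frame are assumptions made for illustration, not part of the slides.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    Assumption: a trailing partial frame is dropped rather than zero-padded.
    """
    hop = frame_size - overlap  # 172 samples with the numbers on the slide
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


sample_rate = 11025
signal = list(range(sample_rate))    # one second of placeholder samples
frames = frame_blocking(signal)
frame_rate = sample_rate / (256 - 84)  # frames per second, about 64
print(len(frames), round(frame_rate, 2))
```

Because the last partial frame is dropped, one second of audio yields slightly fewer complete frames than the nominal frame rate.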
Intensity (I)
Intensity
Visual cue: Amplitude of vibration
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i|$
Log energy (in decibels): $energy = 10 \log_{10} \left( \sum_{i=1}^{n} s_i^2 \right)$
Characteristics
Influenced by
Microphone types
Microphone setups
Perceived volume is influenced by frequency and timbre
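As a minimal sketch of the two intensity measures defined on this slide (sum of absolute values, and log energy in decibels), the following Python code uses only the standard library. The function names and the sample frame are hypothetical.

```python
import math


def volume(frame):
    """Volume as the sum of absolute sample values."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame):
    """Log energy in decibels: 10*log10 of the sum of squared samples."""
    energy = sum(s * s for s in frame)
    # Assumption: an all-zero frame is reported as -inf rather than raising.
    return 10 * math.log10(energy) if energy > 0 else float("-inf")


frame = [0.1, -0.2, 0.3, -0.4]  # hypothetical samples
vol = volume(frame)
energy_db = log_energy_db(frame)
print(vol, energy_db)
```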
Intensity (II)
To avoid DC drifting
DC drifting: The vibration is not around zero
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$
Log energy (in decibels): $energy = 10 \log_{10} \left( \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2 \right)$
Theoretical background (How to prove?)
For $s = [s_1, s_2, \ldots, s_n]$:
$\arg\min_x \sum_{i=1}^{n} |s_i - x| = \mathrm{median}(s)$
$\arg\min_x \sum_{i=1}^{n} (s_i - x)^2 = \mathrm{mean}(s)$
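The DC-compensated measures, and the claim that the median and the mean are the respective minimizers, can be checked numerically. This Python sketch is not a proof: it searches a coarse grid of candidate offsets on a hypothetical frame, and all function names are assumptions.

```python
import math
import statistics


def volume_dc_removed(frame):
    """Sum of absolute deviations from the median."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def log_energy_db_dc_removed(frame):
    """10*log10 of the sum of squared deviations from the mean."""
    mu = statistics.mean(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    return 10 * math.log10(energy) if energy > 0 else float("-inf")


frame = [0.5, 0.7, 0.4, 0.9, 0.5]  # hypothetical samples riding on a DC offset

# Grid search over candidate offsets x in [0, 1] with step 0.01.
candidates = [i / 100 for i in range(101)]
best_abs = min(candidates, key=lambda x: sum(abs(s - x) for s in frame))
best_sq = min(candidates, key=lambda x: sum((s - x) ** 2 for s in frame))

print(best_abs, statistics.median(frame))  # minimizer of the absolute sum vs. median
print(best_sq, statistics.mean(frame))     # minimizer of the squared sum vs. mean
```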
Intensity (III)
Examples
Please refer to the online tutorial
Pitch
Definition
Pitch is also known as fundamental frequency, which
is equal to the number of fundamental periods per
second. The unit used here is Hertz (Hz).
More commonly, pitch is expressed in semitones,
which can be computed from pitch in Hertz:
$semitone = 69 + 12 \log_2 \left( \frac{Hz}{440} \right)$
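The Hertz-to-semitone conversion can be sketched in Python as follows. The function names are hypothetical, and the inverse function is added only for convenience by solving the same formula for frequency.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to semitones, where 440 Hz maps to 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion, obtained by solving the formula for the frequency."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


print(hz_to_semitone(440.0))  # 440 Hz is the reference point
print(hz_to_semitone(880.0))  # one octave up adds 12 semitones
print(semitone_to_hz(60))     # frequency corresponding to semitone 60
```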
Pitch Computation (I)
Pitch of tuning forks
$ff = 16000 / 36.4 = 439.56$ Hz (fundamental period of 36.4 samples, averaged over 5 periods)
$pitch = 69 + 12 \log_2 \left( \frac{ff}{440} \right) = 68.98$ semitones
Pitch Computation (II)
Pitch of speech
$ff = 16000 / ((477 - 75) / 3) = 119.403$ Hz
$pitch = 69 + 12 \log_2 \left( \frac{ff}{440} \right) = 46.42$ semitones
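The speech example can be reproduced with a few lines of Python. This sketch assumes that 75 and 477 are the sample indices of two matching waveform points that are 3 fundamental periods apart, at a sample rate of 16000 Hz. The function and parameter names are hypothetical.

```python
import math


def fundamental_frequency(sample_rate, first_index, last_index, num_periods):
    """Estimate the fundamental frequency from two matching points in a waveform.

    Assumption: first_index and last_index mark corresponding points
    (for example, peaks) that are num_periods fundamental periods apart.
    """
    period_in_samples = (last_index - first_index) / num_periods
    return sample_rate / period_in_samples


ff = fundamental_frequency(16000, 75, 477, 3)
pitch = 69 + 12 * math.log2(ff / 440)
print(round(ff, 3), round(pitch, 2))
```

Averaging over several periods, rather than measuring a single one, reduces the effect of being off by a sample when picking the reference points.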
Statistics of Mandarin Chinese
5401 characters, each associated with at least one
base syllable and a tone
411 base syllables, and most syllables have 4 tones, so we
have 1501 tonal syllables
Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
Some examples of tones:
1242:清華大學
1234:三民主義、優柔寡斷、搭達打大、依宜以易、夫福府負
?????:美麗大教堂、滷蛋有夠鹹(Taiwanese)
Sinusoidal Signals
How to generate a stream of sinusoidal signals
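fs=16000;                 % Sample rate (Hz)
duration=3;               % Duration (sec)
f=440;                    % Frequency of the sinusoid (Hz)
t=(1:fs*duration)/fs;     % Time vector (sec)
y=0.8*sin(2*pi*f*t);      % Sinusoid with amplitude 0.8
plot(t,y); axis([0.6, 0.65, -1, 1]);   % Zoom in on 0.6 to 0.65 sec
sound(y, fs);             % Play the signal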
Zero Crossing Rate
Zero crossing rate (ZCR)
The number of zero crossings in a frame.
Characteristics:
Noise and unvoiced sounds have high ZCR.
ZCR is commonly used in endpoint detection,
especially in detecting the start and end of
unvoiced sounds.
To distinguish noise/silence from unvoiced sound,
usually we add a bias before computing ZCR.
ZCR Computations
Two types of ZCR definition
If a sample with zero value is counted as a zero
crossing, then the value of ZCR is higher. Otherwise,
it is lower.
The choice of definition affects the ZCR, especially
when the sample rate is low.
Other considerations
Zero-justification (subtracting the frame mean so the waveform is centered at zero) is required.
ZCR with shift can be used to distinguish between
unvoiced sounds and silence. (How to determine
the shift amount?)
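The two ZCR definitions and the effect of a shift can be illustrated with a small Python sketch. It assumes that zero-justification means subtracting the frame mean, and that the shift is a constant subtracted from every sample afterwards. The function name, parameter names, and sample frames are hypothetical.

```python
def zero_crossing_rate(frame, count_zero_samples=False, shift=0.0):
    """Count zero crossings between adjacent samples of a frame.

    Assumptions: the frame mean is subtracted first (zero-justification),
    then a constant shift is subtracted from every sample.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean - shift for s in frame]
    pairs = zip(x, x[1:])
    if count_zero_samples:
        # A sample that is exactly zero counts as a crossing.
        return sum(1 for a, b in pairs if a * b <= 0)
    # Only strict sign changes count.
    return sum(1 for a, b in pairs if a * b < 0)


frame = [1, 0, -1, 1, -1, 0, 0]  # hypothetical zero-mean frame
strict = zero_crossing_rate(frame)
inclusive = zero_crossing_rate(frame, count_zero_samples=True)

noise = [0.01, -0.01, 0.01, -0.01]  # hypothetical low-level noise
noise_zcr = zero_crossing_rate(noise)
noise_zcr_shifted = zero_crossing_rate(noise, shift=0.05)
print(strict, inclusive, noise_zcr, noise_zcr_shifted)
```

With a shift larger than the noise amplitude, low-level noise no longer crosses the reference level, while a louder unvoiced sound still would. How large the shift should be is left open here, as it is on the slide.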
ZCR
Examples
Please refer to the online tutorial.