Psycho-acoustics
• Simple objective quality measures such as mean-square error are not very useful for efficient audio coding.
• A very important part of modern audio coding methods is the psycho-acoustic model.
• Psycho-acoustic models try to model the human perception of sound:
– Absolute threshold of hearing (ATH)
– Masking principles
– Critical bandwidth, Bark scale

The auditory system
• The human auditory system is a very complex and highly non-linear system.
• It is not known exactly how it works; many questions remain open.
• The dynamic range is more than 130 dB.
• We are able to perceive sound between 20 Hz and 20 kHz. The upper limit decreases with age, and many people cannot hear above 16 kHz.
• The ear is most sensitive between 1 and 5 kHz.
• The cochlea is often modelled as a bank of highly overlapping, asymmetrical, non-linear filters. (It is often said that the cochlea, or the basilar membrane in the cochlea, performs a frequency-to-place transformation.)
• The bandwidth of each filter is often called the critical bandwidth. It increases with frequency.
• Different critical bands correspond to different regions in the cochlea.
SMS047 - 2006 - #12
Frank Sjöberg - Signalbehandling

Absolute Threshold of Hearing
• We cannot hear sounds below a certain sound pressure level. This is referred to as the absolute threshold of hearing (ATH).
$$ATH(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4 \quad \text{(dB SPL)}$$
[Figure: ATH curve. Sound pressure level, SPL (dB), from −10 to 100, versus frequency (Hz) on a logarithmic axis from 10^2 to 10^4.]

Critical bandwidth
• Critical bandwidth can loosely be defined as the bandwidth at which subjective responses of the hearing system change abruptly.
– Perceived loudness of narrowband noise (with constant SPL) is approximately constant as long as the bandwidth is smaller than the critical bandwidth.
– Masking effects are constant within the critical bandwidth (a critical band).
• Approximate model of critical bandwidth
$$BW_c(f) = 25 + 75\,\left(1 + 1.4\,(f/1000)^2\right)^{0.69} \quad \text{(Hz)}$$
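The two closed-form approximations on these slides, ATH(f) and BW_c(f), are easy to evaluate numerically. The following is a minimal sketch; the function names `ath_db_spl` and `critical_bandwidth_hz` are hypothetical, and the ATH exponent is assumed to be squared, as in the commonly cited form of this formula.

```python
import math


def ath_db_spl(f_hz: float) -> float:
    """Absolute threshold of hearing in dB SPL at frequency f_hz (Hz).

    Assumes the Gaussian-shaped term uses a squared exponent,
    exp(-0.6 * (f/1000 - 3.3)^2).
    """
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz around centre frequency f_hz (Hz)."""
    khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * khz ** 2) ** 0.69


if __name__ == "__main__":
    # Print both quantities at a few illustrative frequencies.
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz  ATH = {ath_db_spl(f):7.2f} dB SPL  "
              f"BWc = {critical_bandwidth_hz(f):8.1f} Hz")
```

The threshold is lowest in the region around 3 kHz, which matches the statement that the ear is most sensitive between 1 and 5 kHz, and the critical bandwidth grows from about 100 Hz at low frequencies to several kHz at the top of the audible range.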

Critical bandwidth
[Figure: critical bandwidth (Hz), from 0 to 5000, versus frequency (Hz), from 0 to 2 × 10^4.]

Bark scale
• The Bark scale is a nonlinear frequency scale, related to critical bandwidth and to how the inner ear processes a signal.
• One critical band has the width of 1 Bark.
$$z(f) = 13 \tan^{-1}(0.00076\,f) + 3.5 \tan^{-1}\left((f/7500)^2\right) \quad \text{(Bark)}$$
[Figure: two panels showing Bark (0 to 25) versus frequency (Hz), one on a linear axis from 0 to 2 × 10^4 and one on a logarithmic axis from 10^2 to 10^4.]
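The Hz-to-Bark mapping z(f) can be sketched in a few lines. This is a minimal sketch that assumes the commonly cited Zwicker-style coefficients (13, 0.00076, 3.5 and 7500); the names `hz_to_bark` and `bark_band_index` are hypothetical helpers.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Map a frequency in Hz to the Bark scale.

    Assumed form: 13*atan(0.00076*f) + 3.5*atan((f/7500)^2).
    """
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def bark_band_index(f_hz: float) -> int:
    """Index of the critical band containing f_hz (band 0 starts at 0 Hz)."""
    return int(math.floor(hz_to_bark(f_hz)))


if __name__ == "__main__":
    # Print the Bark value and band index at a few illustrative frequencies.
    for f in (100, 500, 1000, 4000, 10000, 20000):
        print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark (band {bark_band_index(f)})")
```

Since one critical band is 1 Bark wide, taking the integer part of z(f) gives a simple way to group spectral lines into critical bands.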

Masking principles
• Masking is an effect of hearing in which a strong sound can mask weaker sounds, tones or noise.
• We can distinguish between four different cases of masking:
– Tone-masking-noise (TMN).
– Noise-masking-tone (NMT).
– Noise-masking-noise (NMN).
– Tone-masking-tone (TMT).
• The first two cases have been studied the most, but cases 1 and 3 are the most relevant for audio coding.
• The masking threshold is measured as a signal-to-masker ratio (SMR).

Tone-masking-noise
• Two tones separated by f_d hertz mask narrowband noise.
• The masking effect is fairly constant as long as the tones are separated by less than the so-called critical bandwidth.
• The SMR threshold is typically in the range 20-30 dB.
• When the distance between the maskers exceeds the critical bandwidth, the masking threshold goes down.
[Figure: sound pressure level versus frequency, showing two tone maskers separated by f_d, the masked narrowband noise, and the SMR.]

Noise-masking-tone
• A noise-masking-tone test can be made with two narrowband noise maskers with a pure tone in between.
• The SMR for noise-masking-tone is around 5 dB.
• Noise is a better masker than a pure tone.
• Noise-masking-noise has not been studied as much, but it appears to give similar results.
[Figure: sound pressure level versus frequency, showing two narrowband noise maskers separated by f_d, the masked tone, and the SMR.]

Spreading function
• The masking effect is not limited to a critical band; it spreads into neighboring bands.
• The spreading function drops around 10 dB/Bark for higher frequencies and around 25 dB/Bark for lower frequencies.
$$SpFn(z) = 15.81 + 7.5\,(z + 0.474) - 17.5\,\sqrt{1 + (z + 0.474)^2} \quad \text{(dB)}$$
[Figure: two panels titled "Spreading function" showing spreading (dB), one versus Bark (−5 to 5) and one versus frequency (Hz), from 0 to 2 × 10^4.]

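The two slopes quoted for the spreading function can be checked numerically. This is a minimal sketch that assumes the usual form of the formula with a square root over 1 + (z + 0.474)^2, and assumes that z is the distance in Bark from the masker, positive towards higher frequencies; `spreading_db` is a hypothetical name.

```python
import math


def spreading_db(dz_bark: float) -> float:
    """Spreading function in dB at a distance dz_bark (Bark) from the masker.

    Assumption: positive dz_bark means the masked component lies above
    the masker in frequency.
    """
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


if __name__ == "__main__":
    # Print the function at integer Bark distances from -5 to 5.
    for dz in range(-5, 6):
        print(f"dz = {dz:+d} Bark -> {spreading_db(dz):7.2f} dB")
    # Local slopes far from the masker, in dB per Bark.
    print("upper slope:", spreading_db(5) - spreading_db(4))
    print("lower slope:", spreading_db(-4) - spreading_db(-5))
```

Far from the masker the square-root term behaves like |z + 0.474|, so the slope tends to 7.5 − 17.5 = −10 dB/Bark on the high-frequency side and 7.5 + 17.5 = 25 dB/Bark on the low-frequency side.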
Temporal masking
• The masking effect also spreads in time.
• Pre-masking occurs about 5 ms before the start of a strong stimulus.
• Post-masking can occur up to 200 ms after a strong stimulus.
[Figure: temporal masking curve with regions labelled pre-masking, simultaneous masking, and post-masking.]

Modified DCT – MDCT
• The modified DCT (MDCT) is a critically sampled lapped transform based on DCT type IV.
$$X_{DCT-IV}[k] = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x[n] \cos\left[\frac{\pi}{N}\left(k+\frac{1}{2}\right)\left(n+\frac{1}{2}\right)\right]$$
• The normalized MDCT of block number i is defined as
$$X_i[k] = \sqrt{\frac{4}{N}} \sum_{n=0}^{2N-1} w[n]\,x_i[n] \cos\left[\frac{\pi}{N}\left(k+\frac{1}{2}\right)\left(n+\frac{N+1}{2}\right)\right], \quad 0 \le k \le N-1$$
where w[n] is a window function, and each block x_i[n] overlaps N samples with the previous block:
$$x_i[n] = x[iN + n]$$
• The inverse MDCT is defined as
$$y_i[n] = w[n]\,\sqrt{\frac{4}{N}} \sum_{k=0}^{N-1} X_i[k] \cos\left[\frac{\pi}{N}\left(k+\frac{1}{2}\right)\left(n+\frac{N+1}{2}\right)\right], \quad 0 \le n \le 2N-1$$
We also have to add the overlapping parts to cancel the aliasing:
$$\hat{x}_i[n] = y_i[n] + y_{i-1}[n+N]$$

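The analysis, synthesis and overlap-add steps can be tried out on a short signal. The following is a minimal pure-Python sketch that evaluates the sums directly (O(N^2) per block, no fast algorithm). The function names are hypothetical, and two choices are assumptions of this sketch rather than something taken from the slides: the scale factor 1/sqrt(N) on both transforms, and a sine window multiplied by sqrt(2) so that w^2[n] + w^2[n+N] = 2. With that pairing the overlap-added output equals the input; a different constant only changes the overall gain.

```python
import math


def sine_window(N, gain=math.sqrt(2.0)):
    """Length-2N sine window; gain sqrt(2) gives w^2[n] + w^2[n+N] = 2."""
    return [gain * math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]


def mdct(block, w, scale):
    """N coefficients from one windowed block of 2N samples."""
    N = len(block) // 2
    return [scale * sum(w[n] * block[n]
                        * math.cos(math.pi / N * (k + 0.5) * (n + (N + 1) / 2))
                        for n in range(2 * N))
            for k in range(N)]


def imdct(X, w, scale):
    """2N windowed, time-aliased output samples from N coefficients."""
    N = len(X)
    return [w[n] * scale * sum(X[k]
                               * math.cos(math.pi / N * (k + 0.5) * (n + (N + 1) / 2))
                               for k in range(N))
            for n in range(2 * N)]


def roundtrip(x, N):
    """Analyse x in blocks x_i[n] = x[iN + n] and overlap-add the inverses."""
    w = sine_window(N)
    scale = 1.0 / math.sqrt(N)  # assumption of this sketch, see the text above
    out = [0.0] * len(x)
    for i in range(len(x) // N - 1):
        y = imdct(mdct(x[i * N:i * N + 2 * N], w, scale), w, scale)
        for n in range(2 * N):
            out[i * N + n] += y[n]  # overlapping halves cancel the aliasing
    return out


if __name__ == "__main__":
    N = 8
    x = [math.sin(0.3 * n) + 0.1 * n for n in range(5 * N)]
    x_hat = roundtrip(x, N)
    # The first and last N samples are covered by one block only.
    err = max(abs(a - b) for a, b in zip(x[N:-N], x_hat[N:-N]))
    print("max reconstruction error:", err)
```

Each block of 2N input samples produces only N coefficients, which is what "critically sampled" means here; the missing information is recovered from the neighbouring block during overlap-add.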
MDCT-windows
• The window function must satisfy the following condition
$$w^2[n] + w^2[n+N] = 2$$
• One example is the sine window used in MPEG-2 layer 3
$$w[n] = \sin\left(\frac{\pi}{2N}\left(n+\frac{1}{2}\right)\right)$$
• Another window is the Kaiser-Bessel derived (KBD) window, used in MPEG-4 AAC
$$w[n] = \sqrt{\frac{\sum_{k=0}^{n} v[k]}{\sum_{k=0}^{N} v[k]}} \quad \text{for } 0 \le n \le N,$$
where v[k] is the Kaiser window
$$v[k] = \frac{I_0\left(\pi\alpha\sqrt{1-(2k/N-1)^2}\right)}{I_0(\pi\alpha)},$$
I_0[·] is the zeroth-order modified Bessel function of the first kind, and α is a constant that determines the shape of the window. α is typically chosen between 3 and 6.
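Both windows can be generated and checked with the standard library alone. This is a minimal sketch: the function names are hypothetical, I_0 is evaluated with its power series, and the second half of the KBD window is assumed to be the mirror image of the first half (the usual construction). The helper `pc_sum` evaluates w^2[n] + w^2[n+N]; as coded, without any extra gain, both windows give 1, so a factor sqrt(2) would be needed to reach 2.

```python
import math


def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    term, total, m = 1.0, 1.0, 0
    while term > 1e-16 * total:
        m += 1
        term *= (x / (2.0 * m)) ** 2
        total += term
    return total


def sine_window(N):
    """Length-2N sine window, w[n] = sin(pi/(2N) * (n + 1/2))."""
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]


def kbd_window(N, alpha=4.0):
    """Length-2N Kaiser-Bessel derived window built from a Kaiser kernel v[0..N]."""
    norm = bessel_i0(math.pi * alpha)
    v = [bessel_i0(math.pi * alpha
                   * math.sqrt(max(0.0, 1.0 - (2.0 * k / N - 1.0) ** 2))) / norm
         for k in range(N + 1)]
    total = sum(v)
    half, acc = [], 0.0
    for n in range(N):
        acc += v[n]                      # running sum of v[0..n]
        half.append(math.sqrt(acc / total))
    return half + half[::-1]             # assumed mirror-symmetric second half


def pc_sum(w):
    """Return w^2[n] + w^2[n+N] for n = 0..N-1."""
    N = len(w) // 2
    return [w[n] ** 2 + w[n + N] ** 2 for n in range(N)]


if __name__ == "__main__":
    N = 16
    print("sine:", min(pc_sum(sine_window(N))), max(pc_sum(sine_window(N))))
    for alpha in (3.0, 4.0, 6.0):
        s = pc_sum(kbd_window(N, alpha))
        print(f"KBD alpha={alpha}:", min(s), max(s))
```

A larger α concentrates the Kaiser kernel, which makes the KBD window rise more steeply around the middle of each half.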