Det indre øret MPEG audio Psykoakustikk som grunnlag for lydkompresjon

advertisement
UNIVERSITETET
I OSLO
MPEG audio
Psykoakustikk som grunnlag for
lydkompresjon
Torbjørn Ekman og Nils Christophersen
• Hvordan oppfatter vi lyd
• Digital representasjon av lyd
• Lydkompresjon
110
– Rett frem og dumt
– Lurt med psykoakustikk
011
010
001
00
0
111
110
101
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
INSTITUTT FOR INFORMATIKK
Tid
INF MKT – 11/10-2001 - 1
Det indre øret
INF MKT – 11/10-2001 - 2
1
UNIVERSITETET
I OSLO
Ørets båndpassfilter –
”frekvens til sted”
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 3
UNIVERSITETET
I OSLO
Terskel for hørbar lyd
Referansen
20 µPa = 2 10-5 N/m 2
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 4
2
UNIVERSITETET
I OSLO
Digital representasjon av lyd
Kvantiserings nivåer
110
1
011
0.8
010
0.6
001
0.4
Tid
000
0.2
111
0
110
16 bit
4 bit
-0.2
101
Diskretisert
variabel
-0.4
-0.6
Med 16 bit får vi
-0.8
2*2*2*…2 = 216 = 65 536 nivåer
-1
-1
-0.8
-0.6
Dette betyr 16*44100 = 705 600 bits/sek
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
-0.4
-0.2
0
0.2
Kontinuerlig variabel
0.4
0.6
0.8
1
INF MKT – 11/10-2001 - 5
Beethovens 9. symfoni
Bethovens 9., samplet med 44.1 kHz
0.15
0.1
16 bit kvantisering
4 bit kvantisering
0.05
0
-0.05
16 bit
kvantisering
-0.1
Amplitude
-0.15
-0.2
-0.25
-0.3
0
4 bit kvantisering (dum og dårlig)
20
INSTITUTT FOR INFORMATIKK
40
60
Tid
80
100
120
INF MKT – 11/10-2001 - 6
3
UNIVERSITETET
I OSLO
What is this Psychoacoustics
that is used in the Encoder ?
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
INF MKT – 11/10-2001 - 7
Masking
We do not hear all sounds.
• Absolute threshold of hearing.
• Masking: One sound is inaudible in the
presence of another sound.
1. Simultaneous masking
–
–
–
Noise Masking Tone
Tone Masking Noise
Noise Masking Noise
2. Nonsimultaneous masking
•
•
Pre masking (2 ms)
Post masking (100 ms)
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 8
4
UNIVERSITETET
I OSLO
Noise Masking Tone
Filtered Noise
Tone 1, 820 Hz
Tone 2, 410 Hz
Noise
Noise
Center 410 Hz
Width 111 Hz
5 dB below noise
5 dB below noise
+
Tone 1
+
Tone 2
Not masked
Masked
You can not hear a sinusoid that lies in
the same critical band as a filtered noise if the
soundpreasure level is below a certain threshold.
This effect also stretches out beyond the critical
band.
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
Filtered Noise
INF MKT – 11/10-2001 - 9
Tone Masking Noise
Tone 1, 2 kHz
Center 1 kHz
Width 162 Hz
15 dB below
Tone 2, 1 kHz
Noise
Noise
+
Tone 1
+
Tone 2
Not masked
Masked
You can not hear a filtered noise that lies in
the same critical band as a sinusoid if the
soundpreasure level is below a certain threshold.
This effect also stretches out beyond the critical
band.
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 10
5
UNIVERSITETET
I OSLO
Exploit Masking
• If a sound is masked we can’t hear it.
• Make a frequency analyze of the signal and find
the masking threshold.
• Put the quantization noise under the masking
threshold and we don’t hear the quantization.
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
INF MKT – 11/10-2001 - 11
Pre echo distortion
• The original sound of a
castanet.
• The abruptness in time
domain result in that all
frequencies are involved.
INSTITUTT FOR INFORMATIKK
• The data is split in to windows
of finite length.
• The quantization noise is spread
over a whole window.
• This makes the castanets sound
less distinct.
• Audible effects can be avoided
with shorter windows, exploiting
premasking.
INF MKT – 11/10-2001 - 12
6
UNIVERSITETET
I OSLO
Scale factors and Quantization
• When the dynamics change over time, only a
small subset of the quantization steps are used in
regions with low magnitudes.
• Use scale factors instead:
– Take a window of data.
– Find the max magnitude in this window.
• Use the next larger scale factor from a table.
– Normalize with the scale factor.
– Quantize.
• Now the whole dynamic range of the quantizer is used.
– Send scale factor and quantized samples.
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 13
UNIVERSITETET
I OSLO
Bit Allocation and Masking
• The masking threshold in each
subband gives the Just Noticeable
Distortion (JND) limit for that band.
• Bits are assigned subbands so that the
quantization noise falls below or as
little over the JND as possible.
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 14
7
UNIVERSITETET
I OSLO
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
INSTITUTT FOR INFORMATIKK
Castanets and Guitar
INF MKT – 11/10-2001 - 15
Bit allocation with 2 bits per sample
INF MKT – 11/10-2001 - 16
8
UNIVERSITETET
I OSLO
Bit allocation with 4 bits per sample
INSTITUTT FOR INFORMATIKK
UNIVERSITETET
I OSLO
INF MKT – 11/10-2001 - 17
Signal to Quantization Noise Ratio
and the Just Noticeable Distortion
Frame at t=0.6 s
INSTITUTT FOR INFORMATIKK
Frame at t=1.1 s
INF MKT – 11/10-2001 - 18
9
UNIVERSITETET
I OSLO
Examples on compression
Compression
2
4
8
MP1
4 bit
2 bit
MP1 error (SQR)
22 dB
11 dB
Direct Quantization
8 bit
4bit
2 bit
Direct Quantization Error
(SQR)
Downsampling to 22 kHz
bandwidth and quantization
31 dB
7.8 dB
1.1 dB
16 bit
8 bit
4 bit
INSTITUTT FOR INFORMATIKK
INF MKT – 11/10-2001 - 19
10
Related documents
Download