Sound Processing

CSC361/661
Digital Media
Spring 2002
How Sound Is Produced




Air vibration
Molecules in air are disturbed, one bumping
against another
An area of high pressure moves through the air
in a wave
Thus a wave representing the changing air
pressure can be used to represent sound
How Sound Is Perceived







The cochlea, an organ in our inner ears, detects sound.
The cochlea is joined to the eardrum by three tiny bones.
It consists of a spiral of tissue filled with liquid and thousands of
tiny hairs.
The hairs get smaller as you move down into the cochlea.
Each hair is connected to a nerve which feeds into the auditory
nerve bundle going to the brain.
The longer hairs resonate with lower frequency sounds, and the
shorter hairs with higher frequencies.
Thus the cochlea serves to transform the air pressure signal
experienced by the eardrum into frequency information, which can
be interpreted by the brain as sound.
Pulse Code Modulation




PCM is the most common type of digital audio
recording.
A microphone converts a varying air pressure
(sound waves) into a varying voltage.
Then an analog-to-digital converter samples
the voltage at regular intervals.
Each sampled voltage gets converted into an
integer of a fixed number of bits.
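The steps above can be sketched in a few lines of Python. This is only an illustration, not a model of any particular converter: the function name `pcm_encode`, the 440 Hz test tone, and the convention that the input signal lies in [-1.0, 1.0] are assumptions made for the example.

```python
import math

def pcm_encode(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) at regular intervals and return signed integer codes.

    signal is assumed to return values in [-1.0, 1.0] (a stand-in for voltage).
    """
    max_code = 2 ** (bits - 1) - 1              # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                  # time of the i-th sample
        v = max(-1.0, min(1.0, signal(t)))      # clip out-of-range input
        codes.append(round(v * max_code))       # fixed-size integer
    return codes

def tone(t):
    """Hypothetical input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)

codes = pcm_encode(tone, 0.01, 44100, 16)       # 10 ms at CD settings
print(len(codes), codes[:5])
```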
Digitization of Sound

Sampling
– Most humans can’t hear anything over 20 kHz.
– The sampling rate must be more than twice the highest
  frequency component of the sound (Nyquist Theorem).
– CD quality is sampled at 44.1 kHz.
– Frequencies over 22.05 kHz are filtered out before sampling is
  done.
Quantization
– Telephone quality sound uses 8-bit samples.
– CD quality sound uses 16-bit samples (65,536 quantization
  levels) on two channels for stereo.
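The figures above follow from simple arithmetic, sketched here. The function names are made up for the example. The CD bit rate is just sampling rate × bits per sample × channels, before any framing or error-correction overhead.

```python
def quantization_levels(bits):
    """Number of distinct values an integer of `bits` bits can take."""
    return 2 ** bits

def pcm_bit_rate(sample_rate_hz, bits, channels):
    """Raw PCM data rate in bits per second."""
    return sample_rate_hz * bits * channels

telephone_levels = quantization_levels(8)
cd_levels = quantization_levels(16)
cd_rate = pcm_bit_rate(44100, 16, 2)

print(telephone_levels)   # 256
print(cd_levels)          # 65536
print(cd_rate)            # 1411200 bits per second
```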
Encoder Design
A – B. Apply bandlimiting
filter to remove high
frequency components.
C. Sample at regular time
intervals.
D. Quantize each sample.
Sampling Error (Undersampling)



If you undersample,
one frequency will
alias as another.
For CD quality,
frequencies above
22.05 kHz are filtered
out, and then the
sound is sampled at
44.1 kHz.
This is depicted on the
next slide.
Figure from Multimedia Communications
by Fred Halsall, Addison-Wesley, 2001.
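Aliasing can also be checked numerically. The sketch below assumes a pure 30 kHz tone (a made-up example value) sampled at 44.1 kHz with no bandlimiting filter. The helper names are hypothetical. The samples of the 30 kHz tone come out identical to those of a 14.1 kHz tone, so after sampling the two cannot be told apart.

```python
import math

def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone f_hz after sampling at sample_rate_hz."""
    nearest_multiple = round(f_hz / sample_rate_hz) * sample_rate_hz
    return abs(f_hz - nearest_multiple)

def sample_cosine(f_hz, sample_rate_hz, n):
    """First n samples of a cosine of frequency f_hz."""
    return [math.cos(2 * math.pi * f_hz * i / sample_rate_hz) for i in range(n)]

fs = 44100
apparent = alias_frequency(30000, fs)
print(apparent)                                  # 14100

too_high = sample_cosine(30000, fs, 50)          # above fs / 2
alias = sample_cosine(apparent, fs, 50)          # below fs / 2
worst = max(abs(a - b) for a, b in zip(too_high, alias))
print(worst < 1e-9)                              # True: same samples
```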
Quantization Interval

If Vmax is the maximum positive and negative signal
amplitude and n is the number of binary bits used, then
the magnitude of the quantization interval, q, is defined
as follows:
$$ q = \frac{2 V_{max}}{2^{n}} $$

For example, what if we have 8 bits and the values
range from –1000 to +1000?
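Working the example through, assuming the formula q = 2·Vmax / 2^n with Vmax = 1000 and n = 8 (the function name is chosen for illustration):

```python
def quantization_interval(v_max, bits):
    """Size q of one quantization interval for a signal in [-v_max, +v_max]."""
    return 2 * v_max / 2 ** bits

q = quantization_interval(1000, 8)
print(q)   # 2000 / 256 = 7.8125
```

So with 8 bits, each of the 256 code words covers a range of 7.8125 units.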
Quantization Error (Noise)



Any values within a quantization interval will be
represented by the same binary value.
Each code word corresponds to a nominal
amplitude value that is at the center of the
corresponding quantization interval.
The actual signal may differ from the code
word by up to plus or minus q/2, where q is the
size of the quantization interval.
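The q/2 bound can be checked with a small linear quantizer. This is a sketch under assumptions: the function `quantize`, the choice to number intervals from 0 at -Vmax, and the test range of -1000 to +1000 with 8 bits are illustrative, not taken from the slides.

```python
import math

def quantize(v, v_max, bits):
    """Return (code, nominal) for a value v in [-v_max, +v_max].

    code is the index of the quantization interval containing v, and
    nominal is the amplitude at the center of that interval.
    """
    q = 2 * v_max / 2 ** bits
    code = int(math.floor((v + v_max) / q))
    code = max(0, min(2 ** bits - 1, code))   # keep +v_max in the top interval
    nominal = -v_max + (code + 0.5) * q
    return code, nominal

q = 2 * 1000 / 2 ** 8
worst_error = max(abs(v - quantize(v, 1000, 8)[1]) for v in range(-1000, 1001))
print(worst_error <= q / 2)   # True
```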
Quantization Intervals and Resulting Error
Results of Insufficient Quantization Levels



Insufficient quantization levels result from not
using enough bits to represent each sample.
Insufficient quantization levels force you to
represent more than one sound with the same
value. This introduces quantization noise.
Dithering can improve the quality of a digital file
with a small sample size (relatively few
quantization levels).
Linear Vs. Non-Linear Quantization



In linear quantization, each code word represents a
quantization interval of equal length.
In non-linear quantization, the intervals are not all the same
length: you use more code words (finer intervals) to represent
samples at some levels, and fewer for samples at other levels.
For sound, it is more important to have a finer-grained
representation (i.e., more bits) for low amplitude
signals than for high because low amplitude signals
are more sensitive to noise. Thus, non-linear
quantization is used.
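One common way to get this effect is companding: compress the signal with a logarithmic curve, quantize linearly, and expand on playback. The sketch below uses the μ-law curve with μ = 255. The slides do not name a specific scheme, so treat μ-law here as an illustrative choice; the function names and the 8-bit code layout are also assumptions.

```python
import math

MU = 255  # assumed companding parameter

def compress(x):
    """Map x in [-1, 1] to [-1, 1], stretching low amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x, bits=8):
    """Compress, then quantize linearly to an unsigned code."""
    levels = 2 ** bits
    return min(levels - 1, int((compress(x) + 1) / 2 * levels))

def decode(code, bits=8):
    """Take the center of the code's interval and expand it."""
    levels = 2 ** bits
    return expand((code + 0.5) / levels * 2 - 1)

# Step between adjacent code words near zero vs. near full scale.
step_quiet = decode(129) - decode(128)
step_loud = decode(255) - decode(254)
print(step_quiet < step_loud)   # True: finer steps for low amplitudes
```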
Sound Editing

See Tutorial for
– Choosing sampling rate and bit depth
– Recording sound
See Studio Plugin Overview for information
about multi-track recording
See Noise Reduction Overview for information
about noise reduction
Fourier Analysis
Fourier Transform


It is possible to take any periodic function of time x(t)
and resolve it into an equivalent infinite summation of
sine waves and cosine waves with frequencies that
start at 0 and increase in integer multiples of a base
frequency f0 = 1/T, where T is the period of x(t).
Mathematically, we can say the same thing with this
equation: 
$$ x(t) = a_0 + \sum_{k=1}^{\infty} \left( a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right) $$

This equation does NOT tell how to compute the
Fourier transform, that is, how we get the coefficients
$a_1 \dots a_\infty$ and $b_1 \dots b_\infty$.
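Going the other way is straightforward: given the coefficients, the sum can be evaluated directly. The sketch below truncates the infinite sum to a finite number of terms. As an assumed example it uses the well-known coefficients of a unit square wave (b_k = 4/(πk) for odd k, all other coefficients 0). The function name is hypothetical.

```python
import math

def fourier_partial_sum(t, a0, a, b, f0):
    """Evaluate a0 + sum_k (a[k-1] cos(2 pi k f0 t) + b[k-1] sin(2 pi k f0 t))."""
    total = a0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        angle = 2 * math.pi * k * f0 * t
        total += ak * math.cos(angle) + bk * math.sin(angle)
    return total

K = 99                                   # number of terms kept
a = [0.0] * K
b = [4 / (math.pi * k) if k % 2 else 0.0 for k in range(1, K + 1)]
f0 = 1.0                                 # base frequency, so the period T is 1

mid_high = fourier_partial_sum(0.25, 0.0, a, b, f0)   # middle of the high half
mid_low = fourier_partial_sum(0.75, 0.0, a, b, f0)    # middle of the low half
print(mid_high, mid_low)                 # close to +1 and -1
```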
Discrete Fourier Transform




We can’t do an infinite summation on a computer.
For digitally sampled input we can do the summation
using the same number of frequency samples as there
are time input samples.
We can pretend that x(t) is periodic and that the period
is the same length as the recording (or sound
segment).
The base frequency will be 1/length of recording (or
sound segment).
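A direct implementation of this idea is short. The sketch below uses the complex-exponential form of the sum, which packs the cosine and sine coefficients into the real and imaginary parts of one number. The function names, the 16-sample test signal, and the 16 kHz sampling rate are assumptions for the example.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier Transform: n frequency samples from n time samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def bin_frequency(k, n, sample_rate_hz):
    """Frequency of bin k: k times the base frequency 1 / (segment length)."""
    return k * sample_rate_hz / n

# A segment holding exactly 3 cycles of a cosine.
n = 16
segment = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
spectrum = dft(segment)
magnitudes = [abs(c) for c in spectrum]
print([round(m, 6) for m in magnitudes])   # peaks at bins 3 and 13
print(bin_frequency(3, n, 16000))          # 3000.0 Hz
```

Bin 13 mirrors bin 3 because the input is real-valued.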
Difference Between Discrete Fourier
Transform and Discrete Cosine Transform


The discrete cosine transform uses real
numbers. This is all you need for image
representation.
The Fourier Transform uses complex numbers,
which have a real and an imaginary part.
Recall the definition of the
Discrete Cosine Transform


For an N × N pixel image $p_{xy}$, $0 \le x < N$, $0 \le y < N$,
the DCT is an array of coefficients $DCT_{uv}$, $0 \le u < N$, $0 \le v < N$,
where

$$ DCT_{uv} = \frac{1}{\sqrt{2N}} C_u C_v \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p_{xy} \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] $$

where

$$ C_u, C_v = \frac{1}{\sqrt{2}} \text{ for } u, v = 0; \qquad C_u, C_v = 1 \text{ otherwise} $$

This tells how to compute the Discrete Cosine Transform.
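The definition translates almost line for line into code. The sketch below assumes the usual form of the formula: a leading factor of 1/√(2N), and C = 1/√2 for index 0 and 1 otherwise. The function name `dct_2d` and the flat 8 × 8 test image are made up for the example. It is a direct, slow evaluation, not an optimized transform.

```python
import math

def dct_2d(p):
    """Direct evaluation of the N x N DCT of image p (a list of N rows)."""
    n = len(p)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    scale = 1 / math.sqrt(2 * n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        p[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = scale * c(u) * c(v) * total
    return out

flat = [[1.0] * 8 for _ in range(8)]     # image with no variation at all
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))            # 8.0: everything lands in DCT_00
```

Only real arithmetic is involved, which matches the point made above about the DCT.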
Versions of the Fourier Transform




Fourier Transform -- infinite summation
Discrete Fourier Transform -- a sum of n waves
derived from n samples; O(n^2) complexity
Fast Fourier Transform -- a fast algorithm for computing the
Discrete Fourier Transform, O(n log2 n) complexity; a disadvantage is
that, because it is applied segment by segment, it requires a
windowing function
See http://www.dataq.com/applicat/articles/an11.htm and
http://www.chipcenter.com/eexpert/bmasta/bmasta001.html
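To make the complexity difference concrete, here is a minimal sketch of one FFT variant, the recursive radix-2 algorithm, next to the direct O(n^2) sum. It is one of several FFT formulations, chosen here for brevity, and it assumes the input length is a power of two. The function names and test data are illustrative.

```python
import cmath

def dft_naive(x):
    """Direct O(n^2) Discrete Fourier Transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])                  # transform of even-indexed samples
    odd = fft(x[1::2])                   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

data = [1, 2, 3, 4, 0, -1, -2, -3]
difference = max(abs(a - b) for a, b in zip(fft(data), dft_naive(data)))
print(difference < 1e-9)                 # True: same result, fewer operations
```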
Windowing Functions


Minimizes the effect of phase discontinuities at
the borders of segments.
Hanning, Hamming, Blackman, and Blackman-Harris are often used.
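Two of these windows are easy to write down. The sketch below uses the standard symmetric definitions of the Hanning (Hann) and Hamming windows. The function names and the 5-point example are assumptions, and at least two points are required. A window is applied by multiplying it sample by sample with the segment before transforming.

```python
import math

def hann(n_points):
    """Hanning window: tapers smoothly to 0 at both ends (n_points >= 2)."""
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * i / (n_points - 1))
        for i in range(n_points)
    ]

def hamming(n_points):
    """Hamming window: similar shape, but the ends stay at 0.08 (n_points >= 2)."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (n_points - 1))
        for i in range(n_points)
    ]

def apply_window(segment, window):
    """Multiply a segment by a window of the same length."""
    return [s * w for s, w in zip(segment, window)]

w = hann(5)
print([round(v, 3) for v in w])
windowed = apply_window([1.0, 1.0, 1.0, 1.0, 1.0], w)
```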
Fourier Analysis in CoolEdit




Can be used to filter certain frequencies.
The window size and function are adjustable.
Go to Transform/Filters/FFT to filter
frequencies.
Go to Analyze/Frequency Analysis to see an
analysis of the frequency.