Media Processing – Audio Part
Dr Wenwu Wang
Centre for Vision, Speech and Signal Processing
Department of Electronic Engineering
w.wang@surrey.ac.uk
http://personal.ee.surrey.ac.uk/Personal/W.Wang/teaching.html
Approximate outline
 Week 6: Fundamentals of audio
 Week 7: Audio acquisition, recording, and standards
 Week 8: Audio processing, coding, and standards
 Week 9: Audio perception and audio quality assessment
 Week 10: Audio production and reproduction
Audio reproduction
Concepts and topics to be covered:
 Loudspeakers
 Moving-coil, electrostatic, ribbon, and horn loudspeakers
 Amplifiers
 Stereo reproduction
 Loudspeakers
 3D production
 HRTF, reverberation
Loudspeakers
 A loudspeaker is a transducer which converts electrical energy (or an
electric signal) into acoustical energy (or an acoustical signal). This is
usually achieved by a diaphragm which vibrates to produce sound waves
when excited by the electrical signal.
 Moving-coil loudspeakers
 Electrostatic loudspeakers
 Ribbon loudspeakers
 Horn loudspeakers
Moving-coil loudspeakers
Source: Francis Rumsey and Tim McCormick (1994)
Principle of moving-coil loudspeakers
 The moving-coil loudspeaker consists of a diaphragm (or cone), coil, magnet, chassis, etc. The device is also called a “driver”, as it forms the core unit of the speaker that drives the air to produce sound.
 The coil, which sits in the magnet gap, is wound around a cylindrical former attached to the diaphragm (or cone).
 The diaphragm is held in its rest position by a suspension system consisting of a compliant cloth material and a compliant surround around the edge of the cone, made of e.g. rubber. This allows the required amount of movement of the cone. The diaphragm can be made of almost any material, common choices being paper pulp (light weight and good efficiency), plastics (greater consistency and lower coloration of sound, but lower efficiency), or metal foil.
 The chassis is made either of pressed steel or a casting (the latter particularly desirable for large, heavy magnets, as it reduces potential distortion due to rough handling of the speaker).
 The moving-coil loudspeaker is based on the principle of electromagnetic transduction (a reminder of the relevant relations is given below).
 It is the exact reverse of the process involved in the moving-coil microphone.
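As a reminder of the electromagnetic transduction relations (standard physics, not taken from the slides), for a coil of total wire length l sitting in a radial magnetic field of flux density B:

F = B\,l\,i   (force driving the cone when signal current i flows)
e = B\,l\,v   (EMF induced when the coil moves with velocity v, i.e. the moving-coil microphone case)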
Electrostatic loudspeaker
 The drive unit of the electrostatic loudspeaker consists of a large, lightweight, flat diaphragm (side view shown in the figure), placed between two rigid plates.
 The diaphragm has a very high electrical resistance. The polarising voltage charges the capacitor formed by the narrow gap between the diaphragm and the plates.
 The input signal modulates (through the transformer) the electrostatic field between the two rigid plates. The changing input signal alters the force on the diaphragm, causing the diaphragm to vibrate and produce sound.
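A minimal sketch of why this arrangement gives a nearly linear drive (standard electrostatics, not from the slides; the constant-charge assumption comes from the very high diaphragm resistance mentioned above): with charge q held on the diaphragm and signal voltage v across plates separated by distance d,

E = \frac{v}{d}, \qquad F = qE = \frac{q\,v}{d}

so the force on the diaphragm is approximately proportional to the signal voltage.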
Source: Francis Rumsey and Tim McCormick (1994)
Ribbon loudspeaker
 A light corrugated aluminium ribbon,
clamped at each end, is placed
between two magnetic poles.
 The input signal is applied (via
transformer) to each end of the
ribbon. The alternating signal current, flowing in the field between the magnetic poles, produces an alternating force on the ribbon, causing it to vibrate and produce sound.
 A horn is usually placed in front of
the ribbon to give a higher output for
a given electrical input.
Source: Francis Rumsey and Tim McCormick (1994)
Horn loudspeaker
 A horn is an acoustic transformer which helps to improve acoustic
efficiency by matching the air impedance at the throat of the horn with that
at the mouth.
 For a given electrical input, it can increase the acoustical output of a driver
by 10 dB or more compared with that driver mounted in a conventional
cabinet.
 A very large horn is needed to reproduce low frequencies, whereas relatively small horns suffice for high frequencies and somewhat larger ones for upper-mid frequencies. Because a sufficiently large horn is often impractical, horn loading is most frequently employed at mid and high frequencies.
Source: Francis Rumsey and Tim McCormick (1994)
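A quick check of what “10 dB or more” means (simple arithmetic, not from the slides): for the same electrical input, if the horn-loaded driver radiates acoustic power P_horn and the same driver in a conventional cabinet radiates P_box, then

10\log_{10}\frac{P_\mathrm{horn}}{P_\mathrm{box}} = 10\ \mathrm{dB} \;\Rightarrow\; \frac{P_\mathrm{horn}}{P_\mathrm{box}} = 10

i.e. roughly ten times the radiated acoustic power.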
Loudspeaker performance
 Impedance
 Most loudspeakers are labelled “Impedance = 8 ohms”. However, the impedance of a loudspeaker in practice varies widely with frequency. The impedance plot of a typical domestic hi-fi speaker is shown below.
Source: Francis Rumsey and Tim McCormick (1994)
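A minimal sketch of why the terminal impedance varies with frequency, using a simple lumped-element (Thiele-Small-style) driver model with assumed example parameter values; it is an illustration only, not the model behind the plot above:

import numpy as np

# Illustrative Thiele-Small-style driver parameters (assumed example values)
Re, Le = 6.0, 0.5e-3     # voice-coil resistance (ohm) and inductance (H)
Bl = 7.0                 # force factor (T*m)
Mms = 0.020              # moving mass (kg)
Cms = 0.7e-3             # suspension compliance (m/N)
Rms = 1.5                # mechanical damping (N*s/m)

f = np.array([20.0, 43.0, 200.0, 1000.0, 10000.0])   # spot frequencies in Hz
w = 2 * np.pi * f
Zmech = Rms + 1j * w * Mms + 1.0 / (1j * w * Cms)    # mechanical impedance
Z = Re + 1j * w * Le + Bl**2 / Zmech                 # impedance seen at the terminals

for fi, zi in zip(f, abs(Z)):
    print(f"{fi:7.0f} Hz : |Z| = {zi:5.1f} ohm")
# |Z| peaks near the ~43 Hz resonance and rises again at high frequency because
# of the voice-coil inductance, so the nominal "8 ohm" label only holds over
# part of the band.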
Loudspeaker performance (cont.)
 Sensitivity
 Measures how efficiently the loudspeaker converts electrical energy into acoustical energy.
 A typical high-quality domestic speaker system has an efficiency of less than 1%; the rest of the power is dissipated as heat in the voice coils of the drivers (see the sketch after this list).
 It has been suggested that sensitivity is not an indication of quality, and it is often found that lower-sensitivity models tend to produce a better sound.
 Distortion
 At low frequencies (mainly second harmonic), distortion is commonly around 10%.
 At mid and high frequencies, distortion is generally below 1%.
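A minimal sketch relating a quoted sensitivity figure to efficiency, assuming the sensitivity is specified as dB SPL at 1 m for 1 W of input, radiating into half-space (the 87 dB value below is just an example):

import math

def efficiency_percent(sensitivity_db):
    # A loss-free speaker converting 1 W of electrical input into 1 W of
    # acoustic power radiated into half-space gives an intensity of
    # 1/(2*pi) W/m^2 at 1 m, i.e. about 112 dB SPL.
    ref_db = 10 * math.log10((1 / (2 * math.pi)) / 1e-12)
    return 100 * 10 ** ((sensitivity_db - ref_db) / 10)

print(f"{efficiency_percent(87):.2f} %")   # about 0.3 %, i.e. well below 1 %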
Loudspeaker performance (cont)
 Frequency response
 Ideally, a speaker should respond equally well to all frequencies (a smooth, flat response to the input signal).
 In practice, only the largest speakers produce a significant output down to about 20 Hz, although even the smallest speakers can respond up to 20 kHz.
 High-quality systems can achieve a response that is within 6 dB of the 1 kHz level from 80 Hz to 20 kHz (see figure (a) on the next page).
 Low-quality systems tend to have a considerably more ragged response and an earlier bass roll-off (see figure (b) on the next page).
Loudspeaker performance (cont)
Source: Francis Rumsey and Tim McCormick (1994)
Loudspeaker positioning
 Positioning of loudspeakers has a significant effect on their performance.
 In smaller spaces, the speakers are likely to be positioned close to the
walls. The lower frequencies are reinforced, as the speaker is virtually
omnidirectional at these frequencies and the rear and side-radiated sound
is reflected from the walls and adds more bass power.
 At higher frequencies, the wavelength becomes comparable with the distance between the speaker and a nearby wall. As a result, the reflected sound can arrive out of phase with the direct sound from the speaker, leading to some cancellation at those frequencies.
 Even if a loudspeaker has a flat frequency response, it can still sound far from neutral in a real listening environment, because these boundary effects vary with frequency. It is therefore important to position the loudspeaker appropriately in the room, usually at head height and away from room boundaries.
Loudspeaker positioning
 For stereo reproduction, the optimum listening position is normally considered to be at or just behind the apex of an equilateral triangle formed by the two loudspeakers and the listener, as shown below.
Source: Francis Rumsey and Tim McCormick (1994)
Power amplifiers
 Power amplifiers do a simple job: they provide voltage amplification with output currents in the ampere range, developing the necessary power across the loudspeaker terminals.
 Despite this simple job, there are many designs of power amplifier available on the market.
 A domestic power amplifier is unlikely to be operated at high output levels for long periods. It is therefore usually designed to deliver high currents for short periods, to take care of brief loud passages.
 A professional power amplifier is usually designed to offer long-term overload protection and complete stability into any type of speaker load. It is often required to drive a number of speakers in parallel at the other end of a long cable (say, 30 metres). This demands large power supplies and heavy transformers, with plenty of heat-sink area to prevent overheating.
Stereo recording and reproduction
 The principles and practice of stereo recording and reproduction need to be studied from both theoretical and subjective points of view. The reproduced sound may be judged subjectively to be good, but not necessarily theoretically “correct”, and vice versa.
 Stereo reproduction is closely linked with the directional perception (space perception, discussed earlier) mechanism of human hearing, as the aim of stereo reproduction is to create the illusion of directionality and space in reproduced sound.
Precedence effect
 The precedence effect, also known as the Haas effect, explains the effect of echoes on the perceived direction of a source.
 If two sources emit similar sounds, the perceived direction tends towards the advanced (earlier in time) source. If the delay is within about 50 ms, the two sources are fused together by the brain, appearing as one source whose perceived direction is that of the first arrival (illustrated in the sketch after this list).
 If the delay is over 50 ms, the brain begins to perceive the sounds as distinct, and the second appears as an echo of the first.
 For the delayed source to appear as loud as the undelayed source, it must be a certain number of decibels higher in amplitude to compensate for the precedence advantage of the first arrival.
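A minimal sketch illustrating the 50 ms figure above: the same test signal is duplicated once with a short delay (fused into a single event) and once with a long delay (heard as a separate echo). The sample rate and test signal are assumptions.

import numpy as np

sample_rate = 48000                      # assumed sample rate
click = np.zeros(sample_rate)            # 1 s buffer containing a single impulse
click[0] = 1.0

def add_delayed_copy(x, delay_ms):
    # Mix the signal with a copy of itself delayed by delay_ms milliseconds.
    d = int(round(delay_ms * 1e-3 * sample_rate))
    y = x.copy()
    y[d:] += x[:len(x) - d]
    return y

fused = add_delayed_copy(click, 20)      # ~20 ms: fused, heard as one event
echo = add_delayed_copy(click, 80)       # ~80 ms: heard as a distinct echo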
Precedence effect
 The precedence effect describes how a delayed secondary source must be
louder than the primary source if it is to be heard as equally loud.
Implications for stereo sound reproduction
 Stereo sound reproduction aims to give the impression of directionality and space in sound emitted from two or more loudspeakers or over headphones.
 This can be achieved using a combination of ITD and ILD (interaural time and level differences) between the two channels, as shown in the figure below; a simple panning sketch is also given after this list.
 The precedence effect also helps to achieve the stereo effect. For example,
if the right loudspeaker signal is delayed slightly in relation to the left
loudspeaker then the sound will appear to come from somewhere towards
the left, depending on the amount of delay.
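A minimal sketch (not taken from the lecture) of placing a mono source in a two-channel stereo image using a level difference and a small time difference; the sample rate, test tone, and the particular ILD/ITD values are assumptions.

import numpy as np

sample_rate = 48000                               # assumed sample rate
t = np.arange(sample_rate) / sample_rate
mono = 0.5 * np.sin(2 * np.pi * 440 * t)          # 1 s test tone (assumed source)

level_db = 6.0                                    # right channel 6 dB quieter...
delay_ms = 0.3                                    # ...and 0.3 ms later than the left
gain = 10 ** (-level_db / 20)
d = int(round(delay_ms * 1e-3 * sample_rate))

left = mono
right = gain * np.concatenate([np.zeros(d), mono[:len(mono) - d]])
stereo = np.stack([left, right], axis=1)          # image is pulled towards the left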
Implications for stereo sound reproduction (cont.)
Two channel stereo from loudspeakers (cont)
 The loudspeaker reproduction of a stereo signal is based on either ITD, ILD, or a combination of both.
 The theory by Blumlein (1931) suggests that the level differences between the two loudspeakers are converted into low-frequency phase differences between the ears, based on the summation of the loudspeaker signals at the two ears (see figure on the right).
Source: Francis Rumsey and Tim McCormick (1994)
Two channel stereo from loudspeakers (cont)
 If the outputs of the two speakers differ only in level and not in phase (time), then it can be shown that the vector summation of the signals from the two speakers at each ear results in two ear signals whose phase difference is proportional to the relative amplitudes (i.e. the level difference) of the two loudspeaker signals. This holds at low frequencies (up to about 700 Hz); a compact statement of the result is sketched below.
 For a given level difference, the phase angle changes approximately linearly with frequency.
 At higher frequencies the phase-difference cue is no longer useful, but the shadowing effect of the head causes level differences between the ears.
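A compact statement of this low-frequency summing argument, often quoted in textbook treatments of level-difference (Blumlein-style) stereo, is the stereophonic “law of sines”: for loudspeakers at angles ±θ0 from the median plane and channel gains g_L and g_R (amplitude-only panning), the apparent image direction θ satisfies approximately

\sin\theta \approx \frac{g_L - g_R}{g_L + g_R}\,\sin\theta_0

Equal gains give a centre image; setting g_R = 0 places the image fully in the left loudspeaker.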
Two channel stereo from loudspeakers (cont)
 A level difference of approximately 18 dB between channels is necessary to give the impression that a sound comes from either fully left or fully right in the image.
 There is disagreement between listeners as to the positions of half left or half right.
 If a time difference also exists between the channels, the sound will be pulled towards the advanced (earlier) speaker due to the precedence effect.
Source: Francis Rumsey and Tim McCormick (1994)
Two channel stereo from loudspeakers (cont)
 A time difference of between 2 and 4 ms appears to be required for
a sound to appear either fully left or fully right.
 There is a trade-off between time and level difference. For example,
if the left channel is, say, 2 ms earlier than the right, then the right
must be made approximately 5 dB louder to bring the signal back
into the centre.
3D sound production
 3D audio refers to methods for either widening the stereo image produced by two loudspeakers or stereo headphones, or creating the illusion of sounds placed in the three-dimensional space surrounding the listener.
 HRTF
 Reverberation
3D sound production: Head-Related Transfer Function
 Humans localise sound sources using monaural cues (derived from one ear, i.e. a single-channel sound) and/or binaural cues (obtained from both ears). The binaural cues include the interaural time difference (ITD) and the interaural intensity difference (IID). The monaural cues come from the interaction between the sound source and the listener's anatomy: the temporal/spectral shape of the original source is modified by the head, torso, and outer ear before it reaches the eardrum.
 Information about the source location is encoded by these modifications, and may be captured by an impulse response from the source location to the ear location. This impulse response is termed the head-related impulse response (HRIR). Taking the convolution of a sound source with the HRIR therefore converts the sound to what would have been heard by the listener if it had been played at the source location, with the listener's ear at the receiver location (see the sketch below).
 HRIRs are therefore widely used for the production of virtual surround sound. The HRTF is the Fourier transform of the HRIR.
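A minimal sketch of the convolution step described above, assuming hrir_left and hrir_right are measured HRIRs for the desired source direction and mono is the dry source signal:

import numpy as np
from scipy.signal import fftconvolve

def binaural_render(mono, hrir_left, hrir_right):
    # Convolving the dry source with each ear's HRIR applies the
    # direction-dependent time and spectral cues described above.
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=1)        # two-channel binaural signal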
HRTF
 HRTF coordinates:
Cheng & Wakefield (2001)
 Frequency-domain comparison of theoretical HRTFs as a function of azimuth in the horizontal plane (elevation = 0 degrees).
Cheng & Wakefield (2001)
 Time-domain comparison of measured and theoretical left-ear HRTFs as a function of azimuth in the horizontal plane (elevation = 0 degrees).
Cheng & Wakefield (2001)
HRTF
 Block diagram of a simple HRTF-based spatial sound synthesis system. The delays and FIR filters can be computed in real time to synthesise moving sources.
Cheng & Wakefield (2001)
Reverberation
 Reverberation is created when a sound is produced in an enclosed
environment causing a large number of echoes to build up and then
decay slowly as the sound is gradually absorbed by the walls,
ceilings, floors and air. Reverberation can be added to an audio signal by filtering it with (i.e. taking the convolution of the signal with) a room impulse response, as in the sketch below.
[Figure: speaker and listener in a reverberant room]
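A minimal sketch of adding reverberation by convolution, using a synthetic exponentially decaying noise burst in place of a measured room impulse response (the sample rate and RT60 value are assumptions):

import numpy as np
from scipy.signal import fftconvolve

sample_rate = 48000                               # assumed sample rate
rt60 = 1.2                                        # assumed reverberation time (s)
n = int(rt60 * sample_rate)
decay = np.exp(-6.91 * np.arange(n) / n)          # ~60 dB decay over rt60
room_ir = np.random.randn(n) * decay              # synthetic room impulse response

def reverberate(dry):
    # Filtering (convolving) the dry signal with the impulse response adds the
    # echoes and decay of the simulated room.
    wet = fftconvolve(dry, room_ir)
    return wet / np.max(np.abs(wet))              # simple peak normalisation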
References
 Francis Rumsey and Tim McCormick, Sound and Recording: an
Introduction, 1994.
 David Howard and James Angus, Acoustics and Psychoacoustics,
1996.
 Corey Cheng and Gregory Wakefield, Introduction to Head-Related Transfer Functions (HRTFs): Representations of HRTFs in Time, Frequency, and Space, 2001.