Media Processing – Audio Part
Dr Wenwu Wang
Centre for Vision, Speech and Signal Processing
Department of Electronic Engineering
w.wang@surrey.ac.uk
http://personal.ee.surrey.ac.uk/Personal/W.Wang/teaching.html

Approximate outline
Week 6: Fundamentals of audio
Week 7: Audio acquisition, recording, and standards
Week 8: Audio processing, coding, and standards
Week 9: Audio perception and audio quality assessment
Week 10: Audio production and reproduction

Audio reproduction
Concepts and topics to be covered:
Loudspeakers: moving-coil, electrostatic, ribbon, and horn loudspeakers
Amplifiers
Stereo reproduction: loudspeakers
3D production: HRTF, reverberation

Loudspeakers
A loudspeaker is a transducer which converts electrical energy (an electrical signal) into acoustical energy (an acoustical signal). This is usually achieved by a diaphragm which vibrates to produce sound waves when excited by the electrical signal. Common types include moving-coil, electrostatic, ribbon, and horn loudspeakers.

Moving-coil loudspeakers
Source: Francis Rumsey and Tim McCormick (1994)

Principle of moving-coil loudspeakers
The moving-coil loudspeaker consists of a diaphragm (or cone), coil, magnet, chassis, etc. The device is also called a "driver", as it forms the core unit of the speaker that drives the air to produce sound. The coil, which sits in the magnet gap, is wound around a cylindrical former attached to the diaphragm (or cone). The diaphragm is held in its rest position by a suspension system consisting of a compliant cloth material and a compliant surround around the edge of the cone, made of, e.g., rubber. This allows the required amount of cone movement. The diaphragm can be made of almost any material, common choices being paper pulp (light weight and good efficiency), plastics (greater consistency and lower coloration of sound, but lower efficiency), or metal foil.
The chassis is made either of pressed steel or a casting (particularly desirable for large, heavy magnets, as this reduces the potential distortions caused by rough handling of the speaker). The moving-coil loudspeaker is based on the principle of electromagnetic transduction; it is the exact reverse of the process involved in the moving-coil microphone.

Electrostatic loudspeaker
The drive unit of the electrostatic loudspeaker consists of a large, flat, lightweight diaphragm (side view shown in the figure), placed between two rigid plates. The diaphragm has a very high resistance. The polarising voltage charges the capacitor formed by the narrow gap between the diaphragm and the plates. The input signal modulates (through the transformer) the electrostatic field between the two rigid plates. The change of the input signal alters the force on the diaphragm, causing the diaphragm to vibrate and produce sound.
Source: Francis Rumsey and Tim McCormick (1994)

Ribbon loudspeaker
A light, corrugated aluminium ribbon, clamped at each end, is placed between two magnetic poles. The input signal is applied (via a transformer) to each end of the ribbon. The alternating signal current in the ribbon interacts with the magnetic field, causing the ribbon to vibrate and produce sound. A horn is usually placed in front of the ribbon to give a higher output for a given electrical input.
Source: Francis Rumsey and Tim McCormick (1994)

Horn loudspeaker
A horn is an acoustic transformer which improves acoustic efficiency by matching the air impedance at the throat of the horn with that at the mouth. For a given electrical input, it can increase the acoustical output of a driver by 10 dB or more compared with the same driver mounted in a conventional cabinet. A very large horn is needed to reproduce low frequencies; relatively small horns suffice for high frequencies, with larger ones for upper mid frequencies.
As such, horns are frequently employed at mid and high frequencies, since the large size required at low frequencies is often impractical.
Source: Francis Rumsey and Tim McCormick (1994)

Loudspeaker performance
Impedance
Most loudspeakers are labelled "Impedance = 8 ohms". However, the impedance of a loudspeaker in practice varies widely with frequency. The impedance plot of a typical domestic hi-fi speaker is shown below.
Source: Francis Rumsey and Tim McCormick (1994)

Loudspeaker performance (cont.)
Sensitivity
Measures how efficiently the loudspeaker converts electrical energy into acoustical energy. A typical high-quality domestic speaker system has an efficiency of less than 1%; the rest of the power is dissipated as heat in the voice coils of the drivers. It has been suggested that sensitivity is not an indication of quality, and it is often found that lower-sensitivity models tend to produce a better sound.
Distortion
At low frequencies, the distortion (mainly second harmonic) is commonly around 10%. At mid and high frequencies, distortion is generally below 1%.

Loudspeaker performance (cont.)
Frequency response
Ideally, a speaker should respond equally well to all frequencies (a smooth, flat response to the input signal). In practice, only the largest speakers produce a significant output down to about 20 Hz, although even the smallest speakers can respond up to 20 kHz. High-quality systems can achieve a response that is within 6 dB of the 1 kHz level from 80 Hz to 20 kHz (see figure (a) on the next page). Low-quality systems tend to have a considerably more ragged response and an earlier bass roll-off (see figure (b) on the next page).

Loudspeaker performance (cont.)
Source: Francis Rumsey and Tim McCormick (1994)

Loudspeaker positioning
Positioning of loudspeakers has a significant effect on their performance. In smaller spaces, the speakers are likely to be positioned close to the walls.
The lower frequencies are reinforced: the speaker is virtually omnidirectional at these frequencies, and the rear- and side-radiated sound is reflected from the walls, adding bass power. At higher frequencies, the wavelength becomes comparable with the distance between the speaker and a nearby wall; as a result, the reflected sound can arrive out of phase with the direct sound from the speaker, leading to some cancellation. Even if a loudspeaker has a flat frequency response, it can still sound far from neutral in a real listening environment because of this frequency-dependent behaviour. It is therefore important to place the loudspeaker appropriately in the room, usually at head height and away from room boundaries.

Loudspeaker positioning
For stereo reproduction, the optimum listening position is normally considered to be at the rear apex of an equilateral triangle formed by the two loudspeakers and the listener, as shown below.
Source: Francis Rumsey and Tim McCormick (1994)

Power amplifiers
Power amplifiers do a simple job: they provide voltage amplification with output currents in the ampere range, to develop the necessary power across the loudspeaker terminals. Despite the simple job, there are many designs of power amplifier available on the market. A domestic power amplifier is unlikely to be operated at high output levels for a long time; it is therefore usually designed to deliver high currents for short periods, to take care of short loud passages. A professional power amplifier is usually designed to offer long-term overload protection and complete stability into any type of speaker load. It is often required to drive a number of speakers in parallel at the far end of a long cable (say, 30 metres). This demands large power supplies and heavy transformers, with plenty of heat-sink area to prevent overheating.
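As a rough illustration (not from the slides), the voltage and current an amplifier must deliver for a given power follow from P = V²/R. The sketch below assumes a purely resistive nominal load, which, as noted above, real loudspeakers are not: their impedance varies widely with frequency.

```python
import math

def amp_requirements(power_w, load_ohms):
    """RMS voltage/current and sine-wave peak voltage needed to deliver
    power_w into a purely resistive load (a simplification: real
    loudspeaker impedance varies widely with frequency)."""
    v_rms = math.sqrt(power_w * load_ohms)   # from P = V^2 / R
    i_rms = v_rms / load_ohms                # Ohm's law: I = V / R
    v_peak = v_rms * math.sqrt(2)            # peak of a sine wave
    return v_rms, i_rms, v_peak

# 100 W into a nominal 8-ohm speaker:
v, i, vp = amp_requirements(100, 8)
print(f"{v:.1f} V rms, {i:.2f} A rms, {vp:.1f} V peak")
```

Halving the load to 4 ohms doubles the current demand for the same power at a lower voltage, which is why driving several speakers in parallel calls for the large supplies mentioned above.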
Stereo recording and reproduction
The principles and practice of stereo recording and reproduction need to be studied from both theoretical and subjective points of view. The reproduced sound may be judged subjectively to be good but not be theoretically "correct", and vice versa. Stereo reproduction is closely linked with the directional perception (space perception, discussed earlier) mechanism of human hearing, as the aim of stereo reproduction is to create the illusion of directionality and space in reproduced sound.

Precedence effect
The precedence effect, also known as the Haas effect, explains the effect of echoes on the perceived direction of a source. If two sources emit similar sounds, the perceived direction tends towards the advanced (earlier in time) source. If the delay is within about 50 ms, the two sources are fused together by the brain, appearing as one source whose perceived direction is that of the first arrival. If the delay is over 50 ms, the brain begins to perceive the sounds as distinct, and the second appears as an echo of the first. For the delayed source to appear equally as loud as the undelayed source, it must be a certain number of decibels higher in amplitude to compensate for the precedence advantage of the first arrival.

Precedence effect
The precedence effect describes how a delayed secondary source must be louder than the primary source if it is to be heard as equally loud.

Implications for stereo sound reproduction
Stereo sound reproduction aims to give the impression of directionality and space in sound emitted from two or more loudspeakers or over headphones. This can be achieved using a combination of ITD and ILD between the two channels, as shown in the figure below. The precedence effect also helps to achieve the stereo effect.
For example, if the right loudspeaker signal is delayed slightly relative to the left loudspeaker, the sound will appear to come from somewhere towards the left, depending on the amount of delay.

Implications for stereo sound reproduction (cont.)

Two-channel stereo from loudspeakers
Loudspeaker reproduction of a stereo signal is based on ITD, ILD, or a combination of both. The theory of Blumlein (1931) suggests that level differences between the two loudspeakers are converted into low-frequency phase differences between the ears, based on the summation of the loudspeaker signals at the two ears (see figure on the right).
Source: Francis Rumsey and Tim McCormick (1994)

Two-channel stereo from loudspeakers (cont.)
If the outputs of the two speakers differ only in level and not in phase (time), it can be shown that the vector summation of the signals from the two speakers at each ear results in two signals whose phase difference is proportional to the relative amplitudes (i.e. level difference) of the two loudspeaker signals. This holds at least for low frequencies (up to about 700 Hz). For a given level difference, the phase angle changes approximately linearly with frequency. At higher frequencies the phase-difference cue is no longer useful, but the shadowing effect of the head causes level differences between the ears.

Two-channel stereo from loudspeakers (cont.)
A level difference of approximately 18 dB between channels is necessary to give the impression that a sound comes from fully left or fully right in the image. Listeners disagree about the differences corresponding to intermediate positions such as half left or half right. If a time difference also exists between the channels, the sound will be pulled towards the advanced speaker due to the precedence effect.
Source: Francis Rumsey and Tim McCormick (1994)

Two-channel stereo from loudspeakers (cont.)
A time difference of between 2 and 4 ms appears to be required for a sound to appear fully left or fully right. There is a trade-off between time and level differences. For example, if the left channel is, say, 2 ms earlier than the right, the right must be made approximately 5 dB louder to bring the image back to the centre.

3D sound production
3D audio refers to methods for either widening the stereo image produced by two loudspeakers or stereo headphones, or creating the illusion of sound effects in the three-dimensional space surrounding the listener. Two key ingredients are the HRTF and reverberation.

3D sound production: head-related transfer function
Humans localise sound sources using monaural cues (derived from one ear, i.e. a single-channel sound) and/or binaural cues (obtained from both ears). The binaural cues include the interaural time difference (ITD) and the interaural intensity difference (IID). The monaural cues come from the interaction between the sound source and the human auditory system: the temporal/spectral shape of the original sound source is modified by the auditory system before it reaches the human brain. The source location is encoded by these modifications, which can be captured by an impulse response from the source location to the ear location. This impulse response is termed the head-related impulse response (HRIR). Convolving a sound source with the HRIR therefore converts the sound to what would have been heard by the listener had it been played at the source location, with the listener's ear at the receiver location. HRIRs are therefore used widely for the production of virtual surround sound. The HRTF is the Fourier transform of the HRIR.
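The HRIR convolution described above can be sketched as follows. The two HRIRs here are made-up four-tap placeholders that only mimic an interaural delay and head shadow; in practice measured HRIRs from a database would be used:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono source at a virtual position by convolving it with
    the left- and right-ear head-related impulse responses."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIRs for a source to the listener's left: the right ear receives
# the sound later (interaural time difference) and attenuated (head shadow).
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.0, 0.5])   # 3-sample delay, half amplitude

src = np.random.randn(1000)
out = binaural_render(src, hrir_l, hrir_r)   # stereo output, shape (2, 1003)
```

A full renderer would additionally switch or interpolate between measured HRIRs as the source moves, which is what the delays and FIR filters in the block diagram on the following slide compute in real time.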
HRTF
HRTF coordinates: Cheng & Wakefield (2001)
Frequency-domain comparison of theoretical HRTFs as a function of azimuth in the horizontal plane (elevation = 0 degrees). Cheng & Wakefield (2001)
Time-domain comparison of measured and theoretical left-ear HRTFs as a function of azimuth in the horizontal plane (elevation = 0 degrees). Cheng & Wakefield (2001)

HRTF
Block diagram of a simple HRTF-based spatial sound synthesis system. The delays and FIR filters can be computed in real time to synthesise moving sources. Cheng & Wakefield (2001)

Reverberation
Reverberation is created when a sound is produced in an enclosed environment, causing a large number of echoes to build up and then decay slowly as the sound is gradually absorbed by the walls, ceiling, floor and air. Reverberation can be added to an audio signal by filtering it with (i.e. taking its convolution with) a room impulse response.

References
Francis Rumsey and Tim McCormick, Sound and Recording: An Introduction, 1994.
David Howard and James Angus, Acoustics and Psychoacoustics, 1996.
Corey Cheng and Gregory Wakefield, Introduction to HRTF: Representations of HRTFs in Time, Frequency and Space, 2001.