Media Processing – Audio Part
Dr Wenwu Wang
Centre for Vision Speech and Signal Processing
Department of Electronic Engineering
w.wang@surrey.ac.uk
http://personal.ee.surrey.ac.uk/Personal/W.Wang/teaching.html

Approximate outline
Week 6: Fundamentals of audio
Week 7: Audio acquiring, recording, and standards
Week 8: Audio processing, coding, and standards
Week 9: Audio production and reproduction
Week 10: Audio perception and audio quality assessment

Audio recording
Concepts and topics to be covered:
Microphones
Directional response of microphones
  Omni pattern
  Figure-eight pattern
  Cardioid pattern
A/D conversion
  Sampling
  Quantisation
D/A conversion
Digital audio recording formats/standards
Free audio recording software

Audio recording / processing / production chain
Chain: sound source, recording/acquisition system, production system (speaker), listener.

Microphone
A microphone, whose function is the opposite of a loudspeaker's, is a transducer that converts acoustical sound energy into electrical form.
The three most common operating principles for microphones are the dynamic (i.e. moving coil), the ribbon, and the capacitor (or condenser).

Dynamic microphone
It consists of a rigid diaphragm, typically 20-30 mm in diameter, suspended in front of a magnet. A coil attached to the diaphragm sits in the gap of a strong permanent magnet.
Sound waves cause the diaphragm to vibrate, and with it the coil to move in the magnet's gap, which induces an alternating current in the coil and so produces the electrical output.
Such a microphone is useful for applications such as drums and handheld vocal use, because of its robustness. Its disadvantage is its limited frequency response (a fairly rapid fall-off in response above 8 or 10 kHz).
Source: Francis Rumsey and Tim McCormick (1994)

Ribbon microphone
The ribbon microphone consists of a long thin strip of conductive metal foil, magnetic poles, and a transformer.
The foil is pleated to give it 'spring', and is lightly tensioned between two end clamps.
The magnetic poles create a magnetic field across the ribbon. When the ribbon is excited by sound waves, a current is generated in it.
A transformer is used to step up the electrical output of the ribbon, which is very small. Note that the standard output impedance is 200 ohms, as in the dynamic microphone.
Source: Francis Rumsey and Tim McCormick (1994)

Capacitor microphone
The capacitor microphone consists of a flexible diaphragm and a rigid back plate, separated by an insulator. The diaphragm is free to vibrate with the sound waves.
When one plate (i.e. the diaphragm) is free to move with respect to the other (i.e. the earthed back plate), the capacitance, i.e. the ability to hold electrical charge, will vary.
The DC phantom power charges the capacitor via a very high resistance. A DC blocking capacitor prevents the phantom power from entering the head amplifier, allowing only audio signals to pass.
Sound waves cause the diaphragm to move, which changes the capacitance and, proportionally, the voltage across the capacitor (the high resistance allows only very slow leakage of charge from the diaphragm, so the charge stays nearly constant).
The head amplifier converts the very high impedance voltage output of the capacitor to a much lower impedance. The transformer then balances the signal for output.
Source: Francis Rumsey and Tim McCormick (1994)

Directional responses of microphone and polar diagrams
Microphones are designed to have a certain directional response pattern, often described by a 'polar diagram': a two-dimensional contour map showing the magnitude of the microphone's output at different angles of incidence of a sound wave.
The distance of the polar plot from the centre of the diagram is usually measured in decibels (dB). The further the plot is from the centre, the greater the output of the microphone at that angle.
A nominal 0 dB is usually marked for the response at zero degrees at 1 kHz.

Omnidirectional pattern
An omnidirectional microphone picks up sound equally from all directions, i.e. it has a response of 1 at every angle.
The polar diagram of an ideal omnidirectional microphone is shown below, where the microphone response is omnidirectional for all frequencies.
Such a pattern can be achieved by leaving the microphone diaphragm open at the front but completely enclosed at the rear, so that it responds only to the change of air pressure caused by the sound waves.
Source: Francis Rumsey and Tim McCormick (1994)

Omnidirectional pattern (cont.)
The polar diagram of a typical real omnidirectional microphone at a number of frequencies is shown in the figure below.
Source: Francis Rumsey and Tim McCormick (1994)

Omnidirectional pattern (cont.)
For this microphone, the response is perfectly omnidirectional for frequencies up to 2 kHz.
For frequencies between 3 and 6 kHz, its sensitivity at 180 degrees (i.e. at the rear of the microphone) drops by about 6 dB compared with the lower frequencies (up to 2 kHz).
For frequencies above 8 kHz, the response at 180 degrees can drop by as much as 15 dB.
Therefore, sounds arriving at the rear of such a microphone lose a considerable part of their treble (high-frequency) content.
The smaller the microphone, the better its polar response at high frequencies; microphones with quarter-inch diaphragms, for example, maintain a good omnidirectional response up to 10 kHz.
Omni microphones are usually the most immune to mic movement and wind noise (compared with the other types discussed later), as they are sensitive only to the absolute sound pressure.

Bidirectional (figure-eight) pattern
A bidirectional (or figure-eight) microphone has a polar response proportional to the cosine of the angle of incidence of the sound waves.
At 90 degrees, no sound is picked up.
At 0 degrees, the sound is picked up by a front lobe, and at 180 degrees by a rear lobe, whose output is 180 degrees out of phase with that of the front lobe.
Source: Francis Rumsey and Tim McCormick (1994)

Bidirectional pattern (cont.)
In such a microphone, for example the traditional ribbon microphone, the diaphragm operates on the pressure-gradient principle, i.e. it responds to the difference in pressure between the front and the rear of the microphone.
Therefore, for a sound arriving from 90 degrees off axis, the sound pressure is of equal magnitude on both sides of the diaphragm, and hence causes no movement of the diaphragm and gives no output.
For a sound arriving at the microphone from the front (0 degrees), a phase difference arises between the front and the rear of the diaphragm, due to the small additional distance travelled by the wave. The resulting difference in pressure moves the diaphragm and hence gives an output (or response).
For very low frequencies, the phase difference between the front and rear becomes very small (due to the long wavelengths), and the output becomes lower.
The polar response of a bidirectional mic tends to be very uniform at all frequencies, except for a slight narrowing above approximately 10 kHz.
In practice, such microphones must be correctly oriented in use.

Unidirectional (cardioid) pattern
The unidirectional (also known as cardioid) pattern is described mathematically as 1+cos(phi), where phi is the angle of incidence of the sound signal.
An idealised polar diagram of a unidirectional microphone is shown in the figure below.
Source: Francis Rumsey and Tim McCormick (1994)

Unidirectional pattern (cont.)
The response of the unidirectional microphone can be regarded as a combination of the omnidirectional and bidirectional responses, as shown in the figure below.
At 0 degrees, both polar responses are of equal amplitude and phase; when added together, they produce a total output twice that of either alone.
At 180 degrees, they cancel each other because they are of opposite phase.
Source: Francis Rumsey and Tim McCormick (1994)

Unidirectional pattern (cont.)
Such microphones can be obtained by leaving the diaphragm open at the front but introducing various acoustic labyrinths at the rear, which cause sound to reach the back of the diaphragm in various combinations of amplitude and phase, resulting in a cardioid response.
A typical polar diagram of a unidirectional microphone at low (LF), middle (MF) and high frequencies (HF) is shown in the figure on the right.
The polar response at mid frequencies is very good, but tends to degenerate towards omni at low frequencies (which are picked up quite uniformly), and becomes more directional than is desirable at high frequencies (sounds arriving from the rear will not be completely attenuated).
Source: Francis Rumsey and Tim McCormick (1994)

Hypercardioid pattern
The hypercardioid response is described mathematically as 0.5+cos(phi), where phi is the angle of incidence of the sound signal.
It can be considered as a combination of an omni response (attenuated by 6 dB) and a figure-eight response.
The shape of the response lies between the cardioid and figure-eight patterns, with a relatively small rear lobe that is out of phase with the front lobe.
The hypercardioid microphone has the highest direct-to-reverberant ratio of the patterns, meaning that the ratio between the level of on-axis sound and the level of reflected sound picked up from other angles is very high. As a result, it is good at excluding unwanted sounds (such as room reverberation or unwanted noise).
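The polar patterns described above can all be written as a + b*cos(phi). The following is a minimal Python sketch of that family, not code from the lecture: the function names `polar_response` and `response_db` and the `PATTERNS` table are illustrative assumptions, and the levels are normalised to the on-axis (0 degree) response, as in a polar diagram.

```python
import math

# (a, b) coefficients of the response a + b*cos(phi), taken from the slides:
# omni = 1, figure-eight = cos(phi), cardioid = 1 + cos(phi),
# hypercardioid = 0.5 + cos(phi).
PATTERNS = {
    "omni": (1.0, 0.0),
    "figure-eight": (0.0, 1.0),
    "cardioid": (1.0, 1.0),
    "hypercardioid": (0.5, 1.0),
}


def polar_response(a, b, phi_deg):
    """Signed response a + b*cos(phi); a negative value means inverted phase."""
    return a + b * math.cos(math.radians(phi_deg))


def response_db(name, phi_deg):
    """Level in dB relative to the on-axis response (0 dB at 0 degrees)."""
    a, b = PATTERNS[name]
    ratio = abs(polar_response(a, b, phi_deg)) / polar_response(a, b, 0.0)
    # A perfect null has no finite dB value; report it as minus infinity.
    return -math.inf if ratio < 1e-12 else 20.0 * math.log10(ratio)


if __name__ == "__main__":
    for name in PATTERNS:
        levels = [response_db(name, phi) for phi in (0, 90, 180)]
        print(name, ["%.1f dB" % level for level in levels])
```

In this sketch the cardioid sums to 2 on axis and cancels at 180 degrees, while the hypercardioid keeps a small rear lobe of opposite sign, which matches the description of its out-of-phase rear lobe.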
Demo for polar patterns:
http://www.youtube.com/watch?v=_MMHi8bQVv0
http://www.youtube.com/watch?v=TUHpLqvw9AA

Examples of microphones: switchable polar patterns
Two identical diaphragms are placed one on each side of a central rigid plate.
Perforations in the central plate give both diaphragms a cardioid response.
When the polarising voltage on one side is opposite to that on the other, the combined output gives a figure-eight response, as the cardioids are out of phase.
When the polarising voltage on both sides is the same, the combined output gives an omnidirectional response, as the cardioids are in phase.
Intermediate combinations give cardioid and supercardioid polar responses.
A typical double-diaphragm microphone with switchable polar pattern: AKG C414B-ULS

Examples of microphones: stereo microphones
Two microphones are built into a single casing, where one capsule is rotatable with respect to the other so that the angle between the two can be adjusted.
Each capsule can be switched to give the desired polar response, such as a pair of figure-eight microphones or a pair of cardioids.
A typical stereo microphone: the Neumann SM69

Examples of microphones: stereo microphones
The sum-and-difference microphone is another type of stereo microphone, in which the sum (middle, i.e. (L+R)/2 of the conventional stereo microphone) and the difference (side, i.e. (L-R)/2) are combined in a matrix box to produce a left-right stereo signal.
A typical sum-and-difference stereo microphone: the Shure VP88

Examples of microphones: stereo microphones
An example of a sophisticated stereo microphone is the AMS Soundfield microphone, shown below.
In this microphone, each channel is fully adjustable from omni through cardioid to figure-eight, and the angles between the capsules are also fully adjustable. These are controlled electronically by a remotely sited control unit.
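The sum-and-difference matrixing described for the stereo microphone examples above can be illustrated with a short Python sketch. It assumes the definitions given on the slide, middle M = (L+R)/2 and side S = (L-R)/2, so that L = M+S and R = M-S; the function names `ms_encode` and `ms_decode` are hypothetical, and a real matrix box works on analogue or streaming signals rather than on Python lists.

```python
def ms_encode(left, right):
    """Form the middle (sum) and side (difference) signals from L and R."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Matrix the middle and side signals back to left and right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [1.0, 0.5, -0.25]
    right = [0.0, 0.5, 0.75]
    mid, side = ms_encode(left, right)
    print(mid, side)
    print(ms_decode(mid, side))
```

Because the decode step is just a sum and a difference, scaling the side signal before decoding would change the apparent stereo width, which is one reason this format is convenient in practice.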
Figures: first-generation and second-generation AMS Soundfield microphones.
Source: Francis Rumsey and Tim McCormick (1994)

A/D Conversion
An A/D converter is used to convert the analogue audio signal (a time-varying electrical voltage, say the output of a microphone) into a series of 'samples', which are 'snapshots' of the analogue signal taken at periodic intervals (the interval is known as the sampling period).
The conversion usually consists of sampling and quantisation steps.

Sampling
In this process, measurements (i.e. samples) are taken from the analogue audio signal (shown in the left sub-plot below) at regular intervals in time. This is usually achieved by a sample-and-hold circuit.
To represent the fine detail of the signal (or to reconstruct the analogue signal perfectly from the samples), it is necessary to take a large number of samples per second.
As dictated by the Shannon sampling theorem, at least two samples must be taken per audio cycle (i.e. period). In other words, the sampling frequency should be at least twice the frequency of the highest frequency component in the signal.
Sample period: T (in seconds)
Sample frequency: f = 1/T (in Hz)

Aliasing effect due to under sampling
In subplot (a) of the figure below, enough samples have been taken and the signal can be perfectly reconstructed from the samples.
In subplot (b), fewer than two samples per cycle are taken from the wave; as a result, the signal may be reconstructed as another signal (the dashed curve) instead of the signal that was originally sampled (the solid curve). This is known as the aliasing effect.
Source: Francis Rumsey and Tim McCormick (1994)

Frequency domain interpretation of sampling
The sampling process can also be considered as a modulation process, called pulse amplitude modulation (PAM), in which a series of pulses of constant amplitude is amplitude modulated by the analogue audio waveform.
In other words, the amplitudes of the pulses are modified by the instantaneous amplitude of the analogue audio signal.

Frequency domain interpretation of sampling (cont.)
Source: Francis Rumsey and Tim McCormick (1994)

Frequency domain interpretation of sampling (cont.)
(a) The unmodulated sample pulses display a typical harmonic series of components at integer multiples of fs (fs = 30 kHz in this case).
(b) When a 1 kHz sine wave is sampled at fs = 30 kHz, it generates sideband components on either side of fs (29 kHz = fs-1 kHz and 31 kHz = fs+1 kHz) and of its multiples (e.g. 59 kHz = 2fs-1 kHz and 61 kHz = 2fs+1 kHz).
(c) When a 17 kHz sine wave is sampled at fs = 30 kHz, it generates sideband components on either side of fs (13 kHz = fs-17 kHz and 47 kHz = fs+17 kHz) and of its multiples (e.g. 43 kHz = 2fs-17 kHz and 77 kHz = 2fs+17 kHz). As the 13 kHz sideband lies within the frequency range of the baseband (i.e. the spectrum of the original audio signal), it will also be audible.

Anti-aliasing
One way to avoid the aliasing effect is to make sure that the sampling frequency is at least twice the highest frequency in the signal.
An alternative is to use an anti-aliasing filter to remove the components of the signal whose frequencies are higher than half of the sampling frequency (usually called the Nyquist frequency), as shown below.
Source: Francis Rumsey and Tim McCormick (1994)
Demo for aliasing effects and anti-aliasing:
http://www.youtube.com/watch?v=YB9nALmwSL8
http://www.youtube.com/watch?v=EQ-ovLnVTIM

Quantisation
In the quantisation process, each sample is assigned a value from a range of fixed possibilities, as shown in the example below, where a scale from 1 to 10 is used for both the positive and negative ranges (i.e. a decimal system).
Each sample is represented by an integer on this scale; hence, if the amplitude of the sample obtained from the sampling process is a fraction or decimal, it is rounded to the nearest integer during quantisation.
Source: Francis Rumsey and Tim McCormick (1994)
The quantised sequence: -3, 1, 5, 7, …, -5, -7, -9

Quantisation (cont.)
The difference between the sample amplitude represented by the numbers and the original amplitude of the sample is called the quantisation error. The maximum quantisation error is half of a quantisation step size, Q.
In subplot (a) there are fewer quantisation steps, so the quantisation error is larger than in subplot (b).
Source: Francis Rumsey and Tim McCormick (1994)

Quantisation (cont.)
In digital audio systems, a binary (instead of decimal) number system is used to quantise the samples, as shown below:
(a) a binary number consists of a number of bits;
(b) each bit represents a power of two;
(c) binary numbers can be represented electrically in pulse-code modulation (PCM) by a string of high and low voltages.
Source: Francis Rumsey and Tim McCormick (1994)

Quantisation (cont.)
A 4-bit binary quantisation scale: two's complement. The leftmost bit is the most significant bit (MSB), which determines whether the number is positive or negative.
Source: Francis Rumsey and Tim McCormick (1994)

Quantisation (cont.)
The quantisation error (noise) can be considerably reduced by oversampling (sampling so that the Nyquist frequency is above the upper limit of the audio band), which essentially spreads the quantisation noise over a wide range of frequencies, giving about 3 dB of noise reduction per octave of oversampling (i.e. per doubling of the sampling frequency). It is therefore key to improving digital audio quality in both A/D and D/A converters.
'Decimation' is performed to reduce the sampling rate and increase the bit depth of the quantised samples obtained at the high sampling rate.
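The sampling and quantisation steps described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the lecture's own code: the function names are hypothetical, the quantiser is a plain uniform mid-tread design with clipping, and the oversampling figure assumes white quantisation noise spread evenly up to the Nyquist frequency with no noise shaping.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Baseband frequency at which a sine of f_hz appears when sampled at fs_hz."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


def quantise(x, n_bits, full_scale=1.0):
    """Round x to the nearest level of an n_bits two's-complement scale.

    Returns (integer code, quantised amplitude). Values outside the
    scale are clipped to the lowest or highest code.
    """
    q = 2.0 * full_scale / (2 ** n_bits)            # quantisation step size Q
    lowest, highest = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    code = int(math.floor(x / q + 0.5))             # round to nearest step
    code = max(lowest, min(highest, code))          # clip out-of-range samples
    return code, code * q


def twos_complement(code, n_bits):
    """Bit string of a signed code; the leftmost bit (MSB) carries the sign."""
    return format(code & (2 ** n_bits - 1), "0{}b".format(n_bits))


def oversampling_gain_db(ratio):
    """In-band noise reduction for a given oversampling ratio (white-noise assumption)."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    print(alias_frequency(17000, 30000))   # the 17 kHz example from the slides
    code, value = quantise(-0.3, 4)
    print(code, value, twos_complement(code, 4))
    print(round(oversampling_gain_db(2), 2))
```

With fs = 30 kHz the 17 kHz tone folds down to 13 kHz, as in the sideband example, and for in-range samples the quantisation error never exceeds Q/2.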
Source: Francis Rumsey and Tim McCormick (1994)

Quantisation (cont.)
The dynamic range of digital audio is limited by the high-level end of the quantisation scale. Any sample whose amplitude is outside this range will be clipped and, as a result, the signal will be distorted.
Source: Francis Rumsey and Tim McCormick (1994)
Demo for quantisation noise:
http://www.youtube.com/watch?v=_cRFBBnUFug

D/A conversion
The audio sample words are converted back into a staircase-like chain of electrical levels corresponding to the sample values.
Resampling is used to reduce the width of the pulses, in order to reduce the so-called aperture effect (equalisation is required to correct for the aperture effect).
Finally, a low-pass smoothing filter is used to reconstruct the audio signal.
Source: Francis Rumsey and Tim McCormick (1994)

Earlier digital audio recording formats
Digital tape recording (a magnetic tape data storage format introduced by Sony, 1980s)
Hard-disk recording (a digital magnetic data storage format, introduced by IBM in 1956, used for audio recording in 1976 by Sony)
Compact-disc (CD) recording (an optical disc used originally to store digital audio data, commercially available in 1982)
DVD recording (an optical disc storage format, invented by Philips, Sony, Toshiba and Panasonic in 1995, offering higher storage capacity than CD while having the same dimensions)

Examples of digital audio recorder
Sony's PCM-F1, digital tape recording, sampling rate 44.1 kHz.
Source: Francis Rumsey and Tim McCormick (1994)

Examples of digital audio recorder (cont.)
Sony's PCM-1610 digital tape recorder, sampling rate 44.1 kHz.
Source: Francis Rumsey and Tim McCormick (1994)

Examples of digital audio recorder (cont.)
A Sony portable DAT digital tape recorder, sampling rates 44.1 and 48 kHz.
Source: Francis Rumsey and Tim McCormick (1994)

Examples of digital audio recorder (cont.)
Lynx Digital Audio Recorder (containing the A/D and D/A converters)
Demos:
http://www.youtube.com/watch?v=gwLTr8v01AI
http://www.youtube.com/watch?v=OVauM51sLYw

Recent developments in digital audio recording (since 2000s)
Super Audio CD (a high-resolution optical disc for audio storage)
DVD-A (a digital format for delivering high-fidelity audio content on DVD)
Blu-ray Disc (an optical disc storage medium, a competitor of HD DVD)
HD DVD (a high-density optical disc format using a blue-violet laser for recording)
Internet radio webcasting (an audio service transmitted over the internet)
Podcasting (a non-streamed webcast: audio is downloaded from a web feed on a remote server through client software known as a podcatcher)

Free digital audio recording and editing software

Audacity
Audacity is free, open source software for recording and editing sounds. It allows you to record live audio, convert tapes and records into digital recordings or CDs, and edit Ogg Vorbis, MP3, WAV or AIFF sound files. You can also cut, copy, split or mix sounds together with Audacity. Built-in effects are provided to remove static, hiss, hum or other constant background noises.

Power Sound Editor
Power Sound Editor is a visual audio editing and recording solution, which supports many advanced and powerful operations on audio data.

mp3DirectCut
mp3DirectCut is a fast and extensive audio editor and recorder for compressed mp3. You can directly cut, copy, paste or change the volume with no need to decompress your files for audio editing. Using cue sheets, pause detection or auto cue, you can easily divide long files.

Free digital audio recording and editing software (cont.)

Music Editor Free
Music Editor Free (MEF) is a multi-award winning music editor software tool. MEF helps you to record and edit music and sounds. It lets you make and edit music, voice and other audio recordings.
When editing audio files you can cut, copy and paste parts of recordings and, if required, add effects like echo, amplification and noise reduction.

Wavosaur
Wavosaur is a free sound editor, audio editor and wav editor for editing, processing and recording sounds, wav and mp3 files. Wavosaur has all the features needed to edit audio (cut, copy, paste, etc.), produce music loops, analyse, record and batch convert. Wavosaur supports VST plugins, ASIO drivers, multichannel wav files and real-time effect processing. The program has no installer and doesn't write to the registry. Use it as a free mp3 editor, for mastering or for sound design.

Ardour
Ardour is a digital audio workstation. You can use it to record, edit and mix multi-track audio. You can produce your own CDs, mix video soundtracks, or just experiment with new ideas about music and sound.

Source: http://www.hongkiat.com/blog/25-free-digital-audio-editors/, where more free audio recording software is listed.

Reference
F. Rumsey and T. McCormick, Sound and Recording: an Introduction, 2nd Edition, 1994.