EE3107B: Digital Sound and Music Production
Dr. T. Itagaki

0. Basic Audio Terminology

Wave Concepts
The basic set of waveforms covered by this document consists of simple mathematical shapes found on analogue synthesizers. The first waveform to look at is the "Sine" wave.

Oscillator, Waveform and Timbre
An Oscillator generates a sound. More precisely, it continuously generates a waveform, or shape. The rate at which it generates each cycle of the waveform (its frequency) is what we hear as pitch. Frequency is measured in Hertz (Hz), where one Hertz is one cycle per second.
A Waveform (or Wave) is the shape which the oscillator generates. The shape determines the "timbre" of a sound: its quality, character or brightness. While pitch tells us which note is being played, timbre tells us which instrument is playing it. Pitch is identified by a single basic frequency, whereas timbre is made up of many other frequencies, or overtones, which give the instrument its overall character and identity.

Wave Graphs (time-amplitude domain)
When we graphically show a waveform cycle (e.g. a Sine wave), it has two axes: Amplitude (vertical axis) and Time (horizontal axis).
"Time" is very short: one cycle of a 440 Hz wave lasts only 1/440 s, about 2.27 milliseconds (1 millisecond is 1/1000 second).
"Amplitude" is not necessarily the perceived overall loudness but the relative change over a single cycle (i.e. a microscopic view). Perceived loudness comes in when we draw the whole graph 10 times taller (vertical axis) and say "Ooh! Yes, it is louder now!".

Harmonic Series (Harmonic Contents)
A Sine wave is the most basic waveform and is the building block of harmonic analysis, because a Sine wave has no harmonics (overtones) at all. It has only the tone of the fundamental frequency and no timbre. The Fundamental frequency is the base or root frequency which we identify as pitch (e.g. "A3" in this document's note naming, at 440 Hz; the same note is called A4 in scientific pitch notation).
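As a quick check on the Wave Graphs figures, the sketch below computes the length of one cycle (T = 1/f) and samples a single sine cycle. The function names `period_ms` and `sine_cycle`, and the use of the CD sample rate of 44.1 kHz to pick the time points, are illustrative assumptions rather than part of the course material.

```python
import math

def period_ms(freq_hz: float) -> float:
    """Duration of one waveform cycle in milliseconds (T = 1 / f)."""
    return 1000.0 / freq_hz

def sine_cycle(freq_hz: float, sample_rate: int = 44_100, amplitude: float = 1.0) -> list[float]:
    """Amplitude values (vertical axis) for one cycle of a sine wave,
    taken at regular points in time (horizontal axis)."""
    n = round(sample_rate / freq_hz)  # number of time points in one cycle
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n)]

print(f"{period_ms(440):.2f} ms")       # one 440 Hz cycle lasts about 2.27 ms
quiet = sine_cycle(440)
loud = sine_cycle(440, amplitude=10.0)  # same shape drawn 10 times taller: louder, same pitch
print(len(quiet), max(quiet), max(loud))
```

Scaling `amplitude` changes only the height of the graph, not the number of points per cycle, which mirrors the distinction between loudness and pitch made above.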
Since a Sine wave is pure (no harmonics), we can create other waveforms simply by adding together any number of sine waves at different frequencies and different volume levels (amplitudes). Mathematically (this is the basis of Fourier analysis), any sound can be created from sine waves at different frequencies and amplitudes. Similarly, any sound can be broken down into sine waves at different frequencies and amplitudes; for a pitched sound these are discrete and distinct.

If we look at any static waveform (a pitched sound which doesn't change timbre over time), it is made of sine waves at multiples of the Fundamental frequency (F). This is known as the Natural Harmonic Series, which consists of F, 2F, 3F, 4F, 5F, 6F etc. If you analysed a bright violin sound played at the note "A2" (fundamental frequency 220 Hz), you would find it is made of a series of sine waves at 220 Hz, 440 Hz, 660 Hz, 880 Hz, 1.1 kHz, 1.32 kHz, 1.54 kHz and so on.

Spectrum Graphs (frequency-amplitude domain)
We analyse waveforms using a Harmonic Spectrum (it's a lot like the spectrum analyser on a hi-fi). The Harmonic Spectrum has Amplitude on the vertical axis and the Frequency of the sine waves on the horizontal axis. To simplify analysis, Frequency is usually expressed as multiples of the fundamental frequency, while Amplitude is usually expressed relative to the fundamental sine wave's amplitude.

Digitised Waveforms
A digitised waveform is a waveform stored in a memory chip. It may originate from a sample (i.e. a recording of a real sound) or it could be created synthetically. Typically, the stored waveforms are single cycles only. Put tons of these digitised waveforms on a memory chip and you have the makings of a wave-table synthesizer. The quality of a digitised waveform depends on its Resolution in amplitude and its Sample Rate (as well as the inherent quality of the sound itself). Compact Disc quality uses 16-bit resolution and a 44.1 kHz sample rate.
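The harmonic series and the single-cycle wave-table idea can be sketched together: build one cycle by adding sine waves at harmonic frequencies, then read that stored cycle repeatedly to play a note. This is a minimal sketch under assumed parameters (a 2048-point table, simple truncating table lookup with no interpolation, and made-up harmonic amplitudes); the function names are hypothetical, and real wave-table synthesizers differ in table size and interpolation method.

```python
import math

def harmonic_series(fundamental_hz: float, count: int) -> list[float]:
    """Natural harmonic series: F, 2F, 3F, ..."""
    return [fundamental_hz * k for k in range(1, count + 1)]

def build_wavetable(harmonic_amps: list[float], size: int = 2048) -> list[float]:
    """One cycle built additively: harmonic k (1-based) gets amplitude harmonic_amps[k-1]."""
    return [sum(a * math.sin(2 * math.pi * k * i / size)
                for k, a in enumerate(harmonic_amps, start=1))
            for i in range(size)]

def play_wavetable(table: list[float], freq_hz: float, sample_rate: int, n_samples: int) -> list[float]:
    """Read the stored single cycle repeatedly at the requested pitch (no interpolation)."""
    size = len(table)
    step = freq_hz * size / sample_rate  # table positions advanced per output sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(table[int(phase)])
        phase = (phase + step) % size    # wrap around: the cycle repeats
    return out

print(harmonic_series(220, 7))                     # the "A2" violin example above
table = build_wavetable([1.0, 0.5, 0.33])          # hypothetical harmonic amplitudes
audio = play_wavetable(table, 220.0, 44_100, 441)  # 10 ms of sound at 44.1 kHz
```

Changing `freq_hz` only changes how fast the same stored cycle is scanned, so the timbre (harmonic mix) stays fixed while the pitch moves.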
A resolution of 16 bits means that the sound "amplitude" is captured and stored as one of 65,536 levels, from 0 to 65,535 (2^16 levels). On the waveform graphs, this is the vertical axis. The horizontal axis (time) is captured by the sample rate: a sample rate of 44.1 kHz means that 44,100 amplitude "levels" (0 to 65,535) are taken per second.

1. Sampling Theorem
The core concept in digital audio recording is sampling: converting a continuous analogue signal into a discrete, time-sampled signal. The theoretical underpinning of sampling is the sampling theorem, which specifies the relation between the sampling rate and the audio bandwidth. This theorem is also called the Nyquist theorem, after the work of Harry Nyquist of Bell Telephone Laboratories (USA).

When a 1 Hz sine wave (solid line) is sampled at an 8 Hz sampling rate (vertical lines), the result is the series of digitised points (circles) shown below:

Figure 1.1.: 1 Hz [7 Hz] sampled at 8 Hz

In the figure above, a 7 Hz sine wave, shown as a dotted line, also crosses all the circles (the digitised 1 Hz sine wave). This means that at a sampling rate of 8 Hz we cannot tell the difference between a digitised 7 Hz sine wave and a digitised 1 Hz sine wave. In other words, when a 7 Hz (sampling frequency - 1 Hz) sine wave is sampled at 8 Hz, the result is a 1 Hz digitised sine wave. This phenomenon is called image spectrum or aliasing.

Another example of aliasing is shown below. A 9 Hz sine wave (sampling frequency + 1 Hz), shown as a dotted line, also crosses all the circles (the digitised 1 Hz sine wave).

Figure 1.2.: 1 Hz [9 Hz] sampled at 8 Hz

The same phenomenon can be seen in the frequency-amplitude domain, as a spectrum of the signal:

Figure 1.3.: 1 Hz signal and its alias signals (original signal: 1 Hz, fs: 8 Hz)

The thin line at the left end is the original (1 Hz) signal.
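The coincidences in Figures 1.1 and 1.2, and the repeating alias pattern of Figure 1.3, can be checked numerically. This is a minimal sketch; the function names are hypothetical, and the folding rule in `alias_frequency` is the standard one for a real sinusoid (reduce modulo fs, then mirror about fs/2). Note that with sine phase the 7 Hz samples come out as the 1 Hz samples with their sign inverted, while with cosine phase they would be identical; either way the sampled sequence is a 1 Hz wave.

```python
import math

def sample_sine(freq_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Sample sin(2*pi*f*t) at the instants t = n / fs."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

def alias_frequency(freq_hz: float, fs_hz: float) -> float:
    """Frequency (between 0 and fs/2) at which a sampled sinusoid actually appears."""
    f = freq_hz % fs_hz                       # images repeat every multiple of fs
    return fs_hz - f if f > fs_hz / 2 else f  # mirror ("flip over") at the Nyquist frequency

fs = 8.0
one, seven, nine = (sample_sine(f, fs, 8) for f in (1.0, 7.0, 9.0))

# 9 Hz (fs + 1 Hz) gives exactly the same samples as 1 Hz
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(one, nine)))
# 7 Hz (fs - 1 Hz) gives the 1 Hz samples with inverted sign (a phase-reversed 1 Hz wave)
print(all(math.isclose(a, -b, abs_tol=1e-9) for a, b in zip(one, seven)))

for f in (1, 7, 9, 15, 17, 23, 25):
    print(f"{f} Hz sampled at 8 Hz appears at {alias_frequency(f, fs)} Hz")
```

Every frequency in the last loop lands on 1 Hz, matching the image pattern around 8, 16 and 24 Hz.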
The original 1 Hz component is mirrored ("flips over") at half the sample frequency (4 Hz) and appears again at 7 Hz. The alias signals appear repeatedly at each multiple of the sample frequency (8 Hz) plus and minus 1 Hz: 8 Hz [7 Hz, 9 Hz], 16 Hz [15 Hz, 17 Hz], 24 Hz [23 Hz, 25 Hz]...

This means that a digital system with a sampling rate of 8 Hz can only transmit signals containing frequencies below 4 Hz (half of the sample frequency): a limitation of frequency range. This particular frequency, half of the sample frequency, is called the Nyquist Frequency or Nyquist Limit.

Also, when an analogue signal is sampled for the digital system, the frequency components higher than 4 Hz (half of the sample frequency) have to be filtered out beforehand to avoid aliasing. A low-pass filter used for this purpose is called an anti-alias filter.

When a digital signal sampled at 8 Hz is converted back to an analogue signal, the same image spectra appear around multiples of the sample rate. To remove them, a low-pass filter that cuts off the alias signals above 4 Hz (half of the sample rate) is placed in the conversion process. A low-pass filter used for this purpose is also called an anti-alias filter, or a data-recovery (reconstruction) filter.

2. Quantisation in Amplitude (Dynamic Range)
The number of bits determines the resolution of the amplitude measurement. A general rule is that an audio Analogue-to-Digital Converter (ADC) needs at least 12 bits. At about 6 dB per bit, 12 bits give about 72 dB of dynamic range and 16 bits about 96 dB, whereas the dynamic range of the human auditory system is about 120 dB.

Most ADCs require a finite time for the measurement process (about 1 - 10 µs). It is important that the ADC does not miss any codes. It must be linear: its transfer characteristic must be a straight line (integral linearity), and the step sizes must be equal (differential linearity).

The Sample and Hold stage quantises the signal in time, while the ADC measures the amplitude of the signal and converts it into a binary number. Since the ADC has only a finite range of output numbers (2^n for an n-bit converter) to represent the amplitude, each measurement has to be rounded: this is quantisation in amplitude.
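The rounding step and the roughly 6 dB-per-bit rule can be explored with a short simulation. This is a sketch under stated assumptions: a uniform (mid-rise) quantiser over the range -1 to +1, a 997 Hz test sine just below full scale, and one second of samples at 48 kHz; the function names are hypothetical. For a full-scale sine the usual textbook figure is 6.02 × bits + 1.76 dB, which refines the 6 dB-per-bit rule of thumb, and the measured values should land close to it.

```python
import math

def quantise(x: float, bits: int) -> float:
    """Map x in [-1, 1) to the centre of one of 2**bits equal-width steps (mid-rise)."""
    step = 2.0 / (2 ** bits)                                   # width of one quantisation step
    k = math.floor(x / step)                                   # which step x falls into
    k = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, k))   # clamp to the valid codes
    return (k + 0.5) * step

def quantisation_snr_db(bits: int, freq: float = 997.0, fs: int = 48_000, amp: float = 0.999) -> float:
    """Measured signal-to-quantisation-noise ratio of a sampled sine wave, in dB."""
    sig_pow = err_pow = 0.0
    for n in range(fs):                          # one second of samples
        x = amp * math.sin(2 * math.pi * freq * n / fs)
        e = x - quantise(x, bits)                # error signal = input - output
        sig_pow += x * x
        err_pow += e * e
    return 10 * math.log10(sig_pow / err_pow)

for bits in (4, 8, 16):
    print(bits, "bits:", round(quantisation_snr_db(bits), 1), "dB measured;",
          round(6.02 * bits + 1.76, 1), "dB rule of thumb")
```

At very low bit depths with a pure tone, the error is correlated with the input, so the measured figure can drift from the formula; this is the audible distortion case discussed next.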
Figure 2.1.: Sawtooth Wave in 16, 8 and 4-bit

The above graphs show different levels of quantisation: 16-bit, 8-bit and 4-bit. As the number of bits decreases, the quality of the digitised sound decreases.

Figure 2.2.: "Analogue" Sine Wave (Input Signal to ADC)
Figure 2.3.: "Digitised" Sine Wave (Output Signal from ADC)
Figure 2.4.: "Error Signal" (Input - Output Signal)

The error signal (i.e. the difference between input and output) is heard as "distortion". If the input is a regular wave, such as a sine wave, and the sample rate is fixed, the error is correlated with the input and we will probably hear the extra distortion frequencies individually. However, if the input is a complex signal, such as speech or music, it is difficult to detect any correlation between the original and the error signal, so the error signal is perceived as noise: quantisation noise. In a standard (linear) ADC, the size of the error is inversely proportional to the number of quantisation levels, so the signal-to-noise ratio is approximately the number of bits times 6 dB. To increase the signal-to-noise ratio, we must use an ADC with more bits, or use signal processing techniques that reduce the noise floor, such as over-sampling and noise shaping.

3. Analogue Synthesis
In 1964, the American engineer Robert Moog built a transistor voltage-controlled oscillator and amplifier for the composer Herbert Deutsch. The development stimulated widespread interest and led other American engineers to join the race to build a novel machine.

In analogue synthesis, a few basic waveforms are used as sound sources: sine, triangle, square, sawtooth and pulse. A sine wave is the most basic waveform and is the building block of harmonic analysis, because a sine wave has no harmonics (overtones) at all. It has only the tone of the fundamental frequency, without timbre.
A Triangle wave sounds (and looks) a bit like a sine wave but has some quiet, hollow-sounding overtones. It is made up of only odd-numbered harmonics: F, 3F, 5F, 7F, 9F etc. The amplitudes of the harmonics fall off rapidly, in proportion to 1/n² for the nth harmonic.

A Square wave has very sharp corners and a hollow sound with quite strong overtones. It is made up of only odd-numbered harmonics: F, 3F, 5F, 7F, 9F etc. The amplitudes of the harmonics decrease steadily, in proportion to 1/n.

A Sawtooth wave is available as RampUp or RampDown (which sound the same). It has a very bright and rich sound with strong overtones. It is made up of all harmonics: F, 2F, 3F, 4F, 5F, 6F etc. The amplitudes of the harmonics decrease steadily, in proportion to 1/n.

The sound of a Pulse wave (also known as a Rectangle wave) depends on how far the Pulse Width deviates from that of the Square wave. The Pulse Width (the period when the wave is "up") is usually expressed as a percentage of the full wave cycle, so a Square wave is a 50% Pulse. A 10% Pulse and a 90% Pulse sound the same (to all intents and purposes). As the Pulse Width deviates from 50%, the sound becomes increasingly brighter and richer; but as the Pulse Width becomes very narrow, it becomes thin and nasal. Generally, it is made up of all harmonics: F, 2F, 3F, 4F, 5F etc. The amplitudes of the harmonics decrease, but the odd and even-numbered harmonics have different amplitudes depending on the Pulse Width.

Figure 3.1.: Waveforms and Harmonic Contents (top to bottom: triangle, square, sawtooth and pulse)

To control a musical sound, the basic requirements are measurable in terms of frequency [oscillator], harmonic content [filter] and amplitude [amplifier]. These three components were made voltage-controllable, thus providing a common denominator of control. Varying voltages are easy to generate and to distribute to one or a number of associated devices, hence their attraction as a means of regulating the generation of sound.
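The harmonic recipes above can be tabulated as the magnitudes a spectrum graph would show. This sketch assumes ideal (perfectly sharp) waveforms; the 1/n and 1/n² laws and the pulse formula |sin(π n d)| / n (with d the pulse width as a fraction of the cycle) are the standard Fourier-series results for these shapes. The function name is hypothetical, and only magnitudes are returned: the triangle's alternating harmonic signs are not represented, so these values alone would not rebuild a triangle shape.

```python
import math

def harmonic_amplitudes(shape: str, n_harmonics: int, pulse_width: float = 0.5) -> list[float]:
    """Relative magnitude of harmonics 1..n (fundamental = 1) for ideal waveforms."""
    amps = []
    for n in range(1, n_harmonics + 1):
        if shape == "sawtooth":
            a = 1 / n                                # all harmonics, falling as 1/n
        elif shape == "square":
            a = 1 / n if n % 2 else 0.0              # odd harmonics only, 1/n
        elif shape == "triangle":
            a = 1 / n ** 2 if n % 2 else 0.0         # odd harmonics only, 1/n^2
        elif shape == "pulse":
            # |sin(pi*n*d)| / n, normalised so that the fundamental is 1
            a = abs(math.sin(math.pi * n * pulse_width)) / (n * math.sin(math.pi * pulse_width))
        else:
            raise ValueError(f"unknown shape: {shape}")
        amps.append(a)
    return amps

for shape in ("triangle", "square", "sawtooth"):
    print(shape, [round(a, 3) for a in harmonic_amplitudes(shape, 6)])
print("pulse 25%", [round(a, 3) for a in harmonic_amplitudes("pulse", 8, 0.25)])
```

The pulse case shows why a 10% and a 90% pulse sound the same (their magnitudes are identical) and why the odd/even balance shifts with width: a 25% pulse, for example, is missing every 4th harmonic.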
Analogue synthesis is often misleadingly called "subtractive synthesis", because most users prefer configurations where timbres are generated by filtering the harmonics of raw electronic waveforms.

Voltage-Controlled Oscillator
A typical Voltage-Controlled Oscillator [VCO] can produce a number of waveforms, such as sawtooth, triangle and square [or pulse] waves, in addition to the basic sine wave. A change in the control voltage means a change in pitch [frequency].

The space/mark ratio of a "pure" square wave is one. When the space/mark ratio is varied by a control voltage supplied by another oscillator, the harmonic structure of the sound changes with that oscillation; a narrower mark produces richer harmonics. This technique is known as Pulse Width Modulation [PWM], a very useful feature of a VCO.

Figure 3.2.: Square Wave.

Many synthesizers allow two or more oscillators to be mixed together. The resultant waveform is simply the sum of the waveforms (i.e. both added together), and the resultant harmonic series is also the sum of the harmonics. Some synthesizers only have on/off switches for their various selectable waveforms; switching on two or more waveforms produces a new waveform based on the sum of the parts.

When two oscillators at slightly different frequencies are mixed together, the result is still the sum of the waveforms. The diagram below shows the resultant waveforms. The top and middle waves are the two individual detuned waves, while the bottom wave is their sum. The left side shows 2 detuned Square waves added together and the right side shows 2 detuned Triangle waves added together.

Figure 3.3.: Modulation.

The resultant waveform is quite dynamic. It changes over time due to the interference between the detuned waves. This evolution of the waveform gives a sense of movement to the timbre because of the "phase" differences. Sometimes the two waves are in-phase (i.e. both going up at the same time) and sometimes they are out-of-phase (i.e. one going up and one going down); mostly, they are somewhere in between. When they are in-phase, the resultant amplitude is strong (i.e. it is loud), and when they are out-of-phase, the resultant amplitude is weak (i.e. it is quiet). This causes a bit of "tremolo", or beating of amplitude. The greater the detuning, the faster the rate of the tremolo: the beat rate of the two fundamentals equals the difference between their frequencies. The tremolo effect can be reduced by making one oscillator louder than the other.

Voltage-Controlled Filter
The controlled variable of a typical Voltage-Controlled Filter [VCF] is either the cut-off/centre frequency or the "Q" value [resonance]. For example, when the "Q" is kept constant [constant Q] and the frequency is set to track at a fixed harmonic spacing from the fundamental of a compound wave, such as a square or sawtooth wave, a variety of consistent timbres can be generated at different pitches.

Voltage-Controlled Amplifier
The controlled variable of a typical Voltage-Controlled Amplifier [VCA] is the output amplitude. When a VCA is modulated at low frequencies, the result is a "tremolo" effect, but higher rates of modulation fuse the spectra to give sum and difference frequencies, a phenomenon known as "Amplitude Modulation" [AM], closely related to "Ring Modulation".

Figure 3.4.: Source Signal (500 Hz sine wave).
Figure 3.5.: Modified Signal (CV 100 Hz, Source 500 Hz).
Figure 3.6.: Frequency Spectrum of Modified Signal (500 ± 100 Hz).

The side-band frequencies are generated as the source frequency [carrier] plus the control voltage frequency [modifier], and the source frequency minus the control voltage frequency. When the control voltage becomes negative, the output of the VCA is zero.
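The side-bands of Figure 3.6 can be reproduced by multiplying a 500 Hz source by a 100 Hz control signal and measuring the amplitude at individual frequencies. This sketch assumes a unipolar control voltage swinging between 0 and 1 (so the VCA never receives a negative CV), a sample rate of 8 kHz, and exactly one second of signal so that integer frequencies fall on exact DFT bins; `amplitude_at` is a hypothetical helper, not a library function. For comparison it also multiplies the source by a bipolar 100 Hz signal, which is what a four-quadrant multiplier (ring modulator) does.

```python
import math

FS = 8000  # sample rate in Hz; one second of audio gives 1 Hz frequency resolution

def amplitude_at(signal: list[float], freq: int) -> float:
    """Amplitude of the sinusoidal component at an integer frequency (one DFT bin)."""
    re = sum(x * math.cos(2 * math.pi * freq * n / FS) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / FS) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

t = [n / FS for n in range(FS)]
source = [math.sin(2 * math.pi * 500 * tt) for tt in t]           # carrier: 500 Hz sine
cv = [0.5 + 0.5 * math.cos(2 * math.pi * 100 * tt) for tt in t]   # unipolar CV (0..1) at 100 Hz
am = [s * c for s, c in zip(source, cv)]                          # VCA output: amplitude modulation
ring = [s * math.cos(2 * math.pi * 100 * tt) for s, tt in zip(source, t)]  # bipolar: ring modulation

for f in (400, 500, 600):
    print(f"{f} Hz  AM: {amplitude_at(am, f):.3f}  ring: {amplitude_at(ring, f):.3f}")
```

With the unipolar CV the 500 Hz carrier survives alongside the 400 Hz and 600 Hz side-bands; with the bipolar signal only the sum and difference frequencies remain.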
If, however, the design is modified to create a four-quadrant multiplier with two inputs and one output, a device known as a "ring modulator", negative control voltages no longer silence the output but reverse its phase. The output is then the product of the two inputs: it contains the sum and difference frequencies, while the original carrier frequency is suppressed.

It is significant to note that the early digital software synthesis systems allowed easy replication of many of the features of analogue voltage-controlled systems, and many of these techniques are still widely used in the current generation of software synthesis programs, for example CSOUND.

Frequency Modulation
The theory behind Frequency Modulation [FM] dates back to the early twentieth century. In that context, FM refers to the technique used in radio communications, with frequencies in the order of MHz. John Chowning applied and explored the technique within the audio spectrum for musical synthesis. This is commonly referred to as "simple FM" or "Chowning's FM": a "carrier" oscillator is modulated in frequency by a "modulator" oscillator, and the frequencies of both the "carrier" and the "modulator" are in the audible range, 20 Hz to 20 kHz.

Before the development of Chowning's FM method, most digitally generated sounds were produced by means of fixed waveforms based on fixed-spectrum techniques, a consequence of the high computational cost of time-varying additive and subtractive synthesis. Chowning developed the FM technique as an efficient way of generating synthetic sounds with time-varying spectral characteristics. In 1975, Yamaha [known as "Nippon Gakki" at that time] obtained a licence for the patent. In 1980, the Japanese firm introduced the algorithm in hardware in the GS1 digital synthesiser.

The basic FM technique is that a carrier oscillator is modulated in frequency by a modulator oscillator. When the carrier and the modulator are both sine waves, the frequency-modulated signal at time t is:

$$ e(t) = A \sin\left(2\pi C t + I \sin(2\pi M t)\right) $$

where A is the amplitude of the carrier, I is the index of modulation, C is the carrier frequency and M is the modulator frequency (both in Hz).

The positions of the modulated side-band frequencies depend on the ratio of the carrier frequency to the modulator frequency, the "C:M ratio". The side-bands lie at C + nM and C - nM, where n is an integer. If a "difference" frequency turns negative, it is folded over to the positive side with phase inversion: the waveform flips over the x-axis. This "fold-over" can cause cancellation of positive partials if the negative partials overlap exactly with their positive counterparts. In a digital implementation, "fold-over" also occurs when the "sum" frequencies exceed the Nyquist limit, a phenomenon known as aliasing.
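The side-band positions and the two kinds of fold-over can be listed with a short sketch. It assumes the simple FM equation above; the function names and the example C:M values are hypothetical. The amplitude of each partial, A·J_n(I) with J_n the Bessel function of the first kind, is a standard result for simple FM but is not computed here.

```python
import math

def fm_signal(A: float, C: float, M: float, I: float, fs: int, n_samples: int) -> list[float]:
    """Simple (Chowning) FM: A * sin(2*pi*C*t + I * sin(2*pi*M*t)), sampled at fs."""
    return [A * math.sin(2 * math.pi * C * n / fs + I * math.sin(2 * math.pi * M * n / fs))
            for n in range(n_samples)]

def fm_partials(C: float, M: float, n_max: int,
                fs: float = 44_100.0) -> list[tuple[int, float, bool, bool]]:
    """Side-band positions C + n*M for n = -n_max..n_max.

    Returns (n, frequency_hz, phase_inverted, above_nyquist) tuples. A negative
    'difference' frequency is folded to the positive side with its phase inverted."""
    result = []
    for n in range(-n_max, n_max + 1):
        f = C + n * M
        result.append((n, abs(f), f < 0, abs(f) > fs / 2))
    return result

# C:M = 1:2 (hypothetical values): the folded negative partials land exactly on positive ones
for n, f, inverted, aliased in fm_partials(100.0, 200.0, 3):
    note = " (folded, phase inverted)" if inverted else ""
    note += " (above Nyquist: aliases)" if aliased else ""
    print(f"n={n:+d}: {f:g} Hz{note}")

tone = fm_signal(A=1.0, C=100.0, M=200.0, I=2.0, fs=44_100, n_samples=4410)  # 0.1 s of sound
```

With C:M = 1:2 every folded partial coincides with a positive one, which is where the reinforcement or cancellation described above happens; with a high carrier such as 15 kHz and M = 5 kHz, the n = +2 side-band at 25 kHz would exceed the 22.05 kHz Nyquist limit and alias.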