0. Basic Audio Terminology Wave Concepts

EE3107B: Digital Sound and Music Production
0.
Basic Audio Terminology
Wave Concepts
The basic set of waveforms covered by this document are simple mathematical shapes
which are found on analogue synthesizers. The first waveform to look at is the "Sine"
wave.
Oscillator, Waveform and Timbre
An Oscillator generates a sound. It actually generates a waveform or a shape. The
oscillator does this continuously. The rate at which it generates each cycle of the
waveform is what we hear as pitch. Pitch is measured in Hertz (Hz) [where one Hertz is
one cycle per second]. A Waveform (or Wave) is a shape which the oscillator generates.
The shape determines the "timbre" or quality, characteristic or brightness of a sound.
While pitch tells us which note is being played, "timbre" tells us which instrument is being
played. While pitch is a basic frequency which identifies the note, "timbre" is made up of
many other frequencies or overtones which give the instrument its overall character and
identity.
Wave Graphs (time-amplitude domain)
When we graphically show a waveform cycle (e.g. Sine Wave), it has two axes: Amplitude
(vertical axis) and Time (horizontal axis). "Time" is very short. A wave cycle at 440 Hz
occurs in only 2.27 milliseconds: 1 millisecond is 1/1000 second. "Amplitude" is
not necessarily a measure of perceived overall loudness but of the relative changes over a cycle
(i.e. a microscopic view). Perceived loudness is where we draw the graph 10 times taller
(vertical axis) and say "Ooh! Yes, it is louder now!".
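The length of one cycle is simply the reciprocal of the frequency, which can be checked with a few lines of Python. This is only an illustrative sketch; the function name period_seconds is an assumption made for this example.

```python
def period_seconds(frequency_hz: float) -> float:
    """Return the duration of one cycle, in seconds, for a given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# Print the cycle length in milliseconds for a few example pitches.
for f in (110.0, 440.0, 1760.0):
    print(f"{f:7.1f} Hz -> {period_seconds(f) * 1000:.3f} ms per cycle")
```

The higher the pitch, the shorter the cycle, which is why the time axis of a single-cycle graph covers only a few thousandths of a second.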
Harmonic Series (Harmonic Contents)
A Sine wave is the most basic waveform and is the building block of harmonic analysis.
This is because a Sine wave has no harmonics (overtones) at all. It only has the tone of
the fundamental frequency and no timbre. The Fundamental frequency is the base or root
frequency which we identify as pitch (e.g. "A3" would be at 440 Hz). Since a Sine wave
is pure (no harmonics), we can create other waveforms simply by adding together any
number of sine waves at different frequencies and different volume levels (amplitudes).
Mathematically, any sound can be created using Sine waves at different frequencies and
amplitudes. Similarly, any sound can be broken down into discrete and distinct Sine
Dr. T. Itagaki
waves at different frequencies and amplitudes. If we look at any static waveform (a
pitched sound which doesn't change timbre over time), it is made of sine waves which are
integer multiples of the Fundamental frequency (F). This is known as the Natural Harmonic
Series, where the series consists of F, Fx2, Fx3, Fx4, Fx5, Fx6 etc. If you analysed a bright
violin sound played at the note "A2" (where F = 220 Hz), it would be made of a series of sine
waves at 220 Hz, 440 Hz, 660 Hz, 880 Hz, 1.1 kHz, 1.32 kHz, 1.54 kHz and so on.
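The Natural Harmonic Series for any fundamental can be listed with a short Python sketch. The function name harmonic_series is an assumption made for this illustration.

```python
def harmonic_series(fundamental_hz: float, count: int) -> list:
    """Return the first `count` harmonics: F, 2F, 3F, ... (F itself is harmonic 1)."""
    return [fundamental_hz * n for n in range(1, count + 1)]


# The violin example: fundamental at 220 Hz, first seven harmonics.
print(harmonic_series(220.0, 7))
```

For a 220 Hz fundamental this reproduces the series quoted in the text, ending at 1540 Hz (1.54 kHz) for the seventh harmonic.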
Spectrum Graphs (frequency-amplitude domain)
We analyse waveforms using a Harmonic Spectrum (it's a lot like the spectrum analyser on
a hi-fi). The Harmonic Spectrum has Amplitude (vertical axis) and Frequency of the sine
waves (horizontal axis). To simplify analysis, Frequency is usually expressed as
multiples of the fundamental frequency while Amplitude is usually expressed as relative to
the fundamental sine wave's amplitude.
Digitised Waveforms
A digitised waveform is a waveform stored in a memory chip. It may originate from a
sample (i.e. real sound) or it could be created. Typically, the waveforms stored would be
single cycles only. Put tons of these digitised waveforms on a memory chip and you have
the makings of a wave-table synthesizer.
The quality of the digitised waveform will depend on the Resolution in amplitude and the
Sample Rate (as well as the inherent quality of the sound itself). Compact Disc quality
uses 16-bit resolution and a 44.1 kHz sample rate. Resolution of 16 bits means that the
sound "amplitude" is captured and stored in levels from 0 to 65,535 (2^16 = 65,536 levels). On the
waveform graphs, this would be the vertical axis. The horizontal axis (time) is captured
by the sample rate. A sample rate of 44.1 kHz means that 44,100 of the amplitude "levels"
(0 to 65,535) are taken per second.
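The arithmetic behind these figures can be sketched in Python. The function names are assumptions for this example, and the two-channel (stereo) case is an added assumption that the text above does not state.

```python
def quantisation_levels(bits: int) -> int:
    """Number of distinct amplitude levels available with the given resolution."""
    return 2 ** bits


def bit_rate(bits: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw data rate in bits per second (no compression, no error correction)."""
    return bits * sample_rate_hz * channels


levels = quantisation_levels(16)
print(f"16-bit resolution: {levels} levels (0 to {levels - 1})")
print(f"Mono, 16-bit, 44.1 kHz: {bit_rate(16, 44_100)} bits per second")
# Assumption: two channels for stereo.
print(f"Stereo, 16-bit, 44.1 kHz: {bit_rate(16, 44_100, channels=2)} bits per second")
```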
1.
Sampling Theorem
The core concept in digital audio recording is sampling: converting a continuous analogue
signal into a discrete, time-sampled signal. The theoretical underpinning of sampling is the
sampling theorem, which specifies the relation between the sampling rate and the audio
bandwidth. This theorem is also called the Nyquist theorem after the work of Harold
Nyquist of Bell Telephone Laboratories (USA).
When a 1 Hz sine wave (solid line) is sampled at an 8 Hz sampling rate (vertical lines), a
series of digitised points (circles) is obtained, as shown below:
Figure 1.1.: 1 Hz [7 Hz] sampled at 8 Hz
In the figure above, a 7 Hz sine wave is shown as a dotted line that also crosses all the circles
(digitised 1 Hz sine wave). This means that at the sampling rate of 8 Hz, we can't
recognise the difference between a digitised 7 Hz sine wave and a digitised 1 Hz sine wave.
In other words, when a 7 Hz (sampling frequency - 1 Hz) sine wave is sampled at 8 Hz, the
result is a 1 Hz digitised sine wave. This phenomenon is called image spectrum or
aliasing.
Another example of aliasing is shown below. A 9 Hz sine wave (sample frequency +
1 Hz) is shown as a dotted line that also crosses all the circles (digitised 1 Hz sine wave).
Figure 1.2.: 1 Hz [9 Hz] sampled at 8 Hz
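The two figures can be reproduced numerically. The following Python sketch samples 1 Hz, 7 Hz and 9 Hz sine waves at 8 Hz and compares the sample values. The helper names are assumptions for this example. Note one detail that the sketch makes explicit: the 9 Hz sine gives exactly the same samples as the 1 Hz sine, whereas the 7 Hz sine passes through the same points only when its phase is inverted.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, count, phase=0.0):
    """Return `count` samples of sin(2*pi*f*t + phase) taken at the given sample rate."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz + phase)
            for n in range(count)]


def nearly_equal(a, b, tol=1e-9):
    """True if two sample lists agree to within a small rounding tolerance."""
    return len(a) == len(b) and all(abs(x - y) < tol for x, y in zip(a, b))


FS = 8.0                                   # sampling rate in Hz, as in the figures
one = sample_sine(1.0, FS, 16)
nine = sample_sine(9.0, FS, 16)            # fs + 1 Hz
seven_inverted = sample_sine(7.0, FS, 16, phase=math.pi)  # fs - 1 Hz, phase inverted

print("9 Hz matches 1 Hz:", nearly_equal(one, nine))
print("7 Hz (inverted) matches 1 Hz:", nearly_equal(one, seven_inverted))
```

Once sampled, the three signals are indistinguishable; only the sample values are stored, so the information about which frequency produced them is lost.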
Viewed in the frequency-amplitude domain, the above phenomena appear as the following
spectrum of the signal:
Figure 1.3.: 1 Hz signal and its alias signals (original signal: 1 Hz, fs: 8 Hz)
The thin line at the left end is the original (1 Hz) signal. This flips over at half of the
sample frequency (4 Hz) and is shown at 7 Hz. The alias signals appear repeatedly at
each multiple of the sample frequency (8 Hz) plus 1 Hz and minus 1 Hz: 8 Hz [7 Hz, 9 Hz],
16 Hz [15 Hz, 17 Hz], 24 Hz [23 Hz, 25 Hz]...
This means that a digital system with a sampling rate of 8 Hz can only transmit a
signal containing frequencies up to 4 Hz (half of the sample frequency): a limitation of frequency range.
This particular frequency, half of the sample frequency, is called the Nyquist Frequency or
Nyquist Limit. Also, when an analogue signal is sampled for the digital system, the
frequency components higher than 4 Hz (half of the sample frequency)
have to be filtered off to avoid aliasing. A low-pass filter used for this purpose is called an anti-alias filter.
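The folding of frequencies around the Nyquist Limit can be expressed as a small Python function. This is a sketch under the assumption of ideal sampling with no anti-alias filter; the function names are chosen for this example only.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Half of the sample frequency: the highest frequency that can be represented."""
    return sample_rate_hz / 2.0


def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency (between 0 and the Nyquist Limit) at which a component appears
    after sampling, assuming no anti-alias filter has removed it."""
    folded = freq_hz % sample_rate_hz          # position within one multiple of fs
    return min(folded, sample_rate_hz - folded)  # reflect around fs / 2


FS = 8.0
for f in (1.0, 3.0, 4.0, 7.0, 9.0, 15.0, 17.0):
    print(f"{f:5.1f} Hz sampled at {FS:.0f} Hz appears at {alias_frequency(f, FS):.1f} Hz")
```

Components at or below 4 Hz are unchanged, while 7, 9, 15 and 17 Hz all land on 1 Hz, which is why they must be removed before sampling.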
When a digital signal sampled at 8 Hz is converted back to an analogue signal, the same alias
(image) components appear. To prevent this, a low-pass filter which cuts off the alias signals above 4 Hz (half of the sample
rate) is placed in the conversion process. A low-pass filter used for this purpose is called an
anti-alias filter or data-recovery filter (it is also commonly known as a reconstruction or anti-imaging filter).
2.
Quantisation in Amplitude (Dynamic Range)
The number of bits determines the resolution of the amplitude measurement. A general
rule is that an audio Analogue-to-Digital Converter (ADC) needs at least 16 bits (96 dB
of dynamic range, whereas that of the human auditory system is about 120 dB). Most ADCs
require a finite time for the measurement process (about 1 - 10 µs). It is important that
the ADC does not miss out any codes. It must be linear: a straight line (integral linearity).
Also, the step sizes must be equal (differential linearity).
Sample and Hold quantises in time, whereas the ADC measures the amplitude of the signal
and converts it into a binary number. Since the ADC has only a finite range of output
numbers (2^n) which represent the amplitude, there will be some rounding of the measurement:
quantisation in amplitude.
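The rounding performed by an n-bit converter can be sketched in Python. This is a simplified model under stated assumptions: the input is taken to lie between -1.0 and +1.0, the 2^n levels are equally spaced across that range, and the names quantise and step_size are chosen for this example. A real ADC differs in detail.

```python
def step_size(bits: int) -> float:
    """Distance between adjacent levels for an input range of -1.0 to +1.0."""
    return 2.0 / (2 ** bits - 1)


def quantise(value: float, bits: int):
    """Round `value` to the nearest of 2**bits equally spaced levels.

    Returns (code, reconstructed), where code is the binary number the ADC
    would output and reconstructed is the amplitude that code represents.
    """
    levels = 2 ** bits
    step = step_size(bits)
    clipped = max(-1.0, min(1.0, value))        # out-of-range inputs are clipped
    code = round((clipped + 1.0) / step)        # nearest level index
    code = max(0, min(levels - 1, code))
    return code, code * step - 1.0


for bits in (16, 8, 4):
    code, approx = quantise(0.3, bits)
    print(f"{bits:2d}-bit: code {code:5d}, value {approx:+.6f}, error {approx - 0.3:+.6f}")
```

The rounding error never exceeds half a step, and the step itself roughly halves with every extra bit.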
Figure 2.1.: Sawtooth Wave in 16, 8 and 4-bit
The above graphs show the different levels of quantisation: 16-bit, 8-bit and 4-bit. As the
number of bits decreases, the quality of the digitised sound decreases.
Figure 2.2.: "Analogue" Sine Wave (Input Signal to ADC)
Figure 2.3.: "Digitised" Sine Wave (Output Signal from ADC)
Figure 2.4.: "Error Signal" (Input - Output Signal)
The error signal is heard as "distortion" (i.e. the difference between input and output). If the input is
a regular wave, like a sine wave, and the sample rate is fixed, we will probably hear the extra
distortion frequencies individually. However, if the input signal is a complex signal, such
as speech or music, then it will be difficult to detect the correlation between the original
and the error signal. Instead, the error signal will be perceived as noise: quantisation noise.
In a standard (linear) ADC, the size of the error is determined by the number of
quantisation levels: each additional bit halves the step size. The signal-to-noise ratio is therefore approximately the number of bits times 6 dB.
To increase the signal-to-noise ratio, we must use an ADC with more bits or use signal
processing techniques to reduce the noise floor, such as over-sampling and noise shaping.
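The "6 dB per bit" rule follows from expressing the ratio between the full range (2^n levels) and a single level in decibels. A minimal Python sketch, with the function name dynamic_range_db assumed for this example:

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio between 2**bits levels and one level, in decibels: 20 * log10(2**bits)."""
    return 20.0 * math.log10(2 ** bits)


for bits in (4, 8, 12, 16, 20):
    print(f"{bits:2d} bits -> {dynamic_range_db(bits):6.2f} dB "
          f"({dynamic_range_db(bits) / bits:.2f} dB per bit)")
```

Each bit contributes 20 * log10(2), which is about 6.02 dB, so 16 bits give roughly 96 dB.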
3.
Analogue Synthesis
In 1964, an American engineer Robert Moog built a transistor voltage-controlled oscillator
and amplifier for the composer Herbert Deutsch.
The development stimulated
widespread interest, and led other American engineers to join the race to build a novel
machine.
In analogue synthesis, a few basic waveforms are used for the source of sounds; sine,
triangle, square, sawtooth and pulse.
A sine wave is the most basic waveform and is the building block of harmonic analysis. This
is because a sine wave has no harmonics (overtones) at all. It only has the tone of the
fundamental frequency without timbre.
Triangle wave sounds (and looks) a bit like the sine wave but has some hollow-sounding
quiet overtones. It is made up of only odd numbered harmonics: F, 3F, 5F, 7F, 9F etc.
The amplitudes of the harmonics decrease rapidly, in proportion to 1/n² for the n-th harmonic.
Square wave has very sharp corners and a hollow sound with quite strong overtones. It is
made up of only odd numbered harmonics: F, 3F, 5F, 7F, 9F etc. The amplitudes of
the harmonics decrease steadily, in proportion to 1/n.
Sawtooth wave is available as RampUp or RampDown (which sound the same). It has a
very bright and rich sound with strong overtones. It is made up of all harmonics: F, 2F, 3F,
4F, 5F, 6F etc. The amplitudes of the harmonics decrease steadily, in proportion to 1/n.
The sound of a Pulse wave (also known as Rectangle wave) depends on how far the Pulse
Width deviates from the Square wave. The Pulse Width (the period when the wave is "up")
is usually expressed as a percentage (of the full wave cycle), so a Square wave is a 50%
Pulse. A 10% Pulse and a 90% Pulse sound the same (to all intents and purposes). As
the Pulse Width deviates from 50%, it sounds increasingly brighter and richer; but as the
Pulse Width becomes very narrow, it becomes thinner and more nasal. Generally, it is made
up of all harmonics: F, 2F, 3F, 4F, 5F etc. The amplitudes of the harmonics decrease,
but the odd and even numbered harmonics have different amplitudes depending on the
Pulse Width.
top to bottom; triangle, square, sawtooth and pulse
Figure 3.1.: Waveforms and Harmonic Contents
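The harmonic contents of these waveforms can be tabulated with a short Python sketch. It uses the standard Fourier-series results for ideal (mathematically perfect) waveforms, with every amplitude expressed relative to the fundamental; real oscillators only approximate these values. The function name harmonic_amplitude and its parameters are assumptions for this example.

```python
import math


def harmonic_amplitude(shape: str, n: int, duty: float = 0.5) -> float:
    """Relative magnitude of harmonic n (n = 1 is the fundamental) of an ideal waveform.

    `duty` is the Pulse Width as a fraction of the cycle and is only used for "pulse".
    """
    if n < 1:
        raise ValueError("harmonic numbers start at 1 (the fundamental)")
    odd = n % 2 == 1
    if shape == "sine":
        return 1.0 if n == 1 else 0.0
    if shape == "triangle":
        return 1.0 / n ** 2 if odd else 0.0     # odd harmonics only, falling as 1/n^2
    if shape == "square":
        return 1.0 / n if odd else 0.0          # odd harmonics only, falling as 1/n
    if shape == "sawtooth":
        return 1.0 / n                          # all harmonics, falling as 1/n
    if shape == "pulse":
        if not 0.0 < duty < 1.0:
            raise ValueError("duty must be between 0 and 1 (exclusive)")
        # |sin(n*pi*d)| / n, normalised so that the fundamental is 1.0
        return abs(math.sin(n * math.pi * duty)) / (n * abs(math.sin(math.pi * duty)))
    raise ValueError(f"unknown shape: {shape}")


for shape in ("triangle", "square", "sawtooth"):
    row = ", ".join(f"{harmonic_amplitude(shape, n):.3f}" for n in range(1, 8))
    print(f"{shape:9s} {row}")
```

In this model a 10% Pulse and a 90% Pulse have identical harmonic magnitudes, and a 50% Pulse reduces to the Square wave, which agrees with the descriptions above.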
To control a musical sound, the basic requirements are measurable in terms of frequency
[oscillator], harmonic content [filter] and amplitude [amplifier]. These three components
were made voltage controllable, thus providing a common denominator of control.
Varying voltages are easy to generate and to distribute to one or a number of associated
devices, hence their attraction as a means of regulating the generation of sound.
Analogue synthesis is often misleadingly called "subtractive synthesis", due to the
preference of most users for configurations where timbres are generated by filtering the
harmonics of raw electronic wave forms.
Voltage-Controlled Oscillator
A typical Voltage-Controlled Oscillator [VCO] can produce a number of wave forms; such
as saw-tooth, triangle and square [or pulse] waves, in addition to the basic sine wave. A
change in the control voltage means a change in pitch [frequency]. The space/mark ratio of
a "pure" square wave is one. By varying the space/mark ratio with a control voltage
supplied by an oscillator, the harmonic structure of the sound changes along with the oscillation;
a narrower mark produces richer harmonics. This technique is known as Pulse Width
Modulation [PWM], a very useful feature of a VCO.
Figure 3.2.: Square Wave.
Many synthesizers allow two or more oscillators to be mixed together. The resultant
waveform is simply the sum of the waveforms (i.e. both added together). The resultant
harmonic series would also be the sum of the harmonics. Some synthesizers only have
on/off switches for their various selectable waveforms. Switching on two or more
waveforms will produce a new waveform based on the sum of the parts.
Where two oscillators at slightly different frequencies are mixed together, the result is still
the sum of the waveforms. The diagram below shows the resultant waveforms. The top
and middle waves are the two individual detuned waves, while the bottom wave is the
added result. The left shows two detuned Square waves added together and the right shows
two detuned Triangle waves added together.
Figure 3.3.: Modulation.
The resultant waveform is quite dynamic. It changes over time, due to the interference of
the detuning. This evolution of waveform over time gives a sense of movement to the
timbre because of the "phase" differences.
Sometimes the two waves are in-phase (i.e. both going up at the same time) and sometimes
they are out-of-phase (i.e. one going up and one going down). Mostly, it's something in
between. Note that, when they are in-phase, the resultant amplitude is strong (i.e. it is
loud) and, when they are out-of-phase, the resultant amplitude is weak (i.e. it is quiet). This
will cause a bit of "tremolo" or beating of amplitude (the greater the detuning, the faster
the rate or speed of the tremolo). The tremolo effect can be reduced by making one
oscillator louder than the other.
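For two equally loud sine waves, the beating can be derived from the identity sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2): the sum is a tone at the average frequency whose amplitude rises and falls at a rate equal to the frequency difference. The Python sketch below checks this numerically; the function names and the 440 Hz / 442 Hz example are assumptions for illustration (the figure in the text uses square and triangle waves, for which the same principle applies to each harmonic).

```python
import math


def beat_frequency(f1_hz: float, f2_hz: float) -> float:
    """Number of loud/quiet cycles per second when two detuned tones are mixed."""
    return abs(f1_hz - f2_hz)


def mixed(f1_hz: float, f2_hz: float, t: float) -> float:
    """Sum of two equal-amplitude sine waves at time t (seconds)."""
    return math.sin(2.0 * math.pi * f1_hz * t) + math.sin(2.0 * math.pi * f2_hz * t)


def envelope(f1_hz: float, f2_hz: float, t: float) -> float:
    """Slowly varying amplitude of the mix: |2 * cos(pi * (f1 - f2) * t)|."""
    return abs(2.0 * math.cos(math.pi * (f1_hz - f2_hz) * t))


F1, F2 = 440.0, 442.0
print(f"Beat rate: {beat_frequency(F1, F2):.1f} Hz")
for t in (0.0, 0.125, 0.25, 0.375, 0.5):
    print(f"t = {t:5.3f} s  envelope = {envelope(F1, F2, t):.3f}")
```

With a 2 Hz detuning the envelope falls to zero at 0.25 s (out-of-phase) and returns to its maximum at 0.5 s (in-phase), so the tremolo repeats twice per second.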
Voltage-Controlled Filter
The controlled variable of a typical Voltage-Controlled Filter [VCF] is either the
cut-off/centre frequency or the "Q" value [resonance]. For example, when the "Q" is kept
constant [constant Q] and the frequency is set to track at a fixed harmonic spacing to the
fundamental of a compound wave, such as a square or saw-tooth function, a variety of
consistent timbres can be generated at different pitches.
Voltage-Controlled Amplifier
The controlled variable of a typical Voltage-Controlled Amplifier [VCA] is the output
amplitude. When a VCA is modulated at low frequencies, the result is a "tremolo" effect,
but higher rates of modulation fuse the spectra to give sum and difference frequencies, a
phenomenon known as "Amplitude Modulation" [AM] or “Ring Modulation”.
Figure 3.4.: Source Signal (500 Hz sine wave).
Figure 3.5.: Modified Signal (CV 100 Hz, Source 500 Hz).
Figure 3.6.: Frequency Spectrum of Modified Signal (500 ± 100 Hz).
The side-band frequencies are generated as the source frequency [carrier] plus the control
voltage frequency [modifier] and the source frequency minus the control voltage frequency.
When the control voltage becomes negative, the output of the VCA is zero. If, however,
the design is modified to create a four-quadrant multiplier with two inputs and one output,
a device known as a "ring modulator", the negative control voltages can be used to produce
negative frequencies with reversed phases by reflection.
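The side-band frequencies can be verified numerically. The Python sketch below multiplies a 500 Hz sine wave by a 100 Hz control signal and measures the amplitude at individual frequencies with a simple single-frequency Fourier sum. It is an idealised model under stated assumptions: the sample rate of 4000 Hz, the helper names, and the treatment of the VCA (control voltage offset and scaled so that the gain stays between 0 and 1) are all choices made for this example.

```python
import math

SAMPLE_RATE = 4000  # samples per second (assumed; well above twice the highest partial)


def modulated(carrier_hz, control_hz, bipolar, seconds=1.0):
    """Return samples of a sine carrier whose amplitude follows a sine control signal.

    bipolar=False models a VCA: the gain is kept between 0 and 1.
    bipolar=True models a ring modulator: the carrier is multiplied by the
    control signal directly, including its negative half.
    """
    out = []
    for i in range(int(SAMPLE_RATE * seconds)):
        t = i / SAMPLE_RATE
        control = math.sin(2.0 * math.pi * control_hz * t)
        gain = control if bipolar else 0.5 * (1.0 + control)
        out.append(gain * math.sin(2.0 * math.pi * carrier_hz * t))
    return out


def magnitude_at(signal, freq_hz):
    """Amplitude of the sine component at freq_hz (exact for whole numbers of cycles)."""
    re = im = 0.0
    for i, s in enumerate(signal):
        angle = 2.0 * math.pi * freq_hz * i / SAMPLE_RATE
        re += s * math.cos(angle)
        im += s * math.sin(angle)
    return 2.0 * math.hypot(re, im) / len(signal)


am = modulated(500.0, 100.0, bipolar=False)
ring = modulated(500.0, 100.0, bipolar=True)
for f in (300.0, 400.0, 500.0, 600.0, 700.0):
    print(f"{f:5.0f} Hz  AM: {magnitude_at(am, f):.3f}  ring: {magnitude_at(ring, f):.3f}")
```

In this model the VCA output contains the carrier (500 Hz) plus side-bands at 400 Hz and 600 Hz, while the ring modulator output contains only the two side-bands.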
It is significant to note that the early digital software synthesis systems allowed easy
replication of many of the features in analogue voltage-controlled systems, and many of
the techniques are still widely used in the current generation of software synthesis programs,
for example Csound.
Frequency Modulation
The theory behind Frequency Modulation [FM] dates back to the early twentieth century.
In that context, FM means the technique used in radio communications, with carrier frequencies in the order of MHz.
Chowning applied and explored the technique within the audio spectrum for musical synthesis
purposes, commonly referred to as "simple FM" or "Chowning's FM", where a "carrier"
oscillator is modulated in frequency by a "modulator" oscillator. In this case, the
frequencies of both the "carrier" and the "modulator" are in the audible range: 20 Hz - 20 kHz.
Before the development of Chowning's FM method, most digitally generated sounds were
produced by means of fixed wave forms based on fixed spectrum techniques, a
consequence of the high computational costs of time-varying additive and subtractive
synthesis. Chowning developed the FM technique as an efficient way of generating
synthetic sounds that have time-varying spectral characteristics. In 1975, Yamaha [known
as "Nippon Gakki" at that time] obtained a licence for the patent. In 1980, this Japanese
firm introduced the algorithm in hardware form in the GS1 digital synthesiser.
The basic FM technique is such that a carrier oscillator is modulated in frequency by a
modulator oscillator. When the carrier and the modulator are both sine waves, the
equation for a frequency modulated signal at time t is:
$$A \sin\bigl(C_t + I \sin(M_t)\bigr)$$
where
A: amplitude of the carrier
I: index of modulation
$C_t = 2\pi C t$ and $M_t = 2\pi M t$, with C and M the carrier and modulator frequencies
The positions of the modulated side-band frequencies depend on the ratio of the carrier to
the modulator frequency: the "C:M ratio". The side-bands lie at the carrier frequency plus and
minus integer multiples of the modulator frequency: C + nM and C - nM, where n is an integer. If the "difference"
frequencies turn negative, these are folded over to the positive side with phase inversion:
the wave forms flip over the x-axis. This "fold-over" can cause cancellation of the
positive partials if the negative partials overlap exactly with their positive counterparts. In
the case of a digital implementation, "fold-over" also occurs where the "sum"
frequencies exceed the Nyquist limit, a phenomenon known as aliasing.
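The simple FM equation and the side-band positions can be sketched in Python as follows. The function names, and the example values C = 100 Hz and M = 200 Hz, are assumptions for illustration. The sketch only lists where the side-bands fall and which of them have been folded over; it does not model their amplitudes, which depend on the index of modulation, nor the aliasing above the Nyquist limit.

```python
import math


def fm_sample(amplitude, carrier_hz, modulator_hz, index, t):
    """One sample of simple FM at time t: A * sin(2*pi*C*t + I * sin(2*pi*M*t))."""
    return amplitude * math.sin(2.0 * math.pi * carrier_hz * t
                                + index * math.sin(2.0 * math.pi * modulator_hz * t))


def fm_sidebands(carrier_hz, modulator_hz, max_order):
    """List (n, frequency, folded) for the partials C + n*M, n = -max_order..max_order.

    Negative results are reflected to the positive side; `folded` is True for
    those partials, which appear with inverted phase.
    """
    partials = []
    for n in range(-max_order, max_order + 1):
        freq = carrier_hz + n * modulator_hz
        partials.append((n, abs(freq), freq < 0))
    return partials


# C:M ratio of 1:2, side-bands up to the third order.
for n, freq, folded in fm_sidebands(100.0, 200.0, 3):
    note = " (folded over, phase inverted)" if folded else ""
    print(f"n = {n:+d}: {freq:6.1f} Hz{note}")
```

With this 1:2 ratio the folded partials land exactly on 100 Hz, 300 Hz and 500 Hz, the same positions as positive partials, which is the overlapping situation described above.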