# L6 CH06 Digital Audio

Basics of Digital Audio
Chapter 6
http://groups.yahoo.com/group/Multimedia_CS471
Textbook:
Fundamentals of Multimedia, Ze-Nian Li and Mark S.
Drew, International Edition, 2004
Basics of Digital Audio
• Digitization of sound
• MIDI: Musical Instrument Digital Interface
• Quantization and Transmission of Audio
• Further Exploration
6.1 Digitization of Sound
6.1.1 What is sound?
• Sound is a wave phenomenon like light, but it is
macroscopic and involves molecules of air being
compressed and expanded under the action of some
physical device.
• Without air there is no sound.
• To work with sound waves digitally, we must form a
digitized representation of the audio information.
• Sound waves can be measured or detected by
measuring the pressure level at a location, using a
transducer to convert pressure to voltage level.
Figure 6.1 An analog signal
6.1.2 Digitization
• Digitization means conversion to a stream of
numbers, preferably integers for efficiency.
• The signal shown is two-dimensional (amplitude against
time), so to digitize it fully we have to sample in each
dimension: in time and in amplitude.
• Sampling means measuring the analog signal at discrete
points in the time dimension; the rate at which it is
performed is called the sampling frequency.
• Quantization is sampling the analog signal in the
amplitude dimension.
• For audio signals, typical sampling rates range from
8 kHz (8,000 samples per second) to 48 kHz.
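The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 kHz test tone, the unit amplitude, and the function name `sample_and_quantize` are made up for the example and do not come from the textbook.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits):
    """Sample a unit-amplitude sine wave and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: discrete points in time
        x = math.sin(2 * math.pi * freq_hz * t)  # the "analog" value at time t
        samples.append(round(x * max_level))     # quantization: nearest integer level
    return samples

# Hypothetical test tone: 1 kHz sine, sampled at 8 kHz, 8 bits per sample.
codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, n_samples=8, bits=8)
print(codes)
```

Each printed integer is one stored sample; with 8 bits the values stay within the 256 available levels.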
Figure: Quantization & Sampling. Vertical axis: amplitude (quantization levels); horizontal axis: t (seconds), marked with the sampling points.
Digitization
• The human ear can hear from 20 Hz to as much as
20 kHz. Frequencies above this range are ultrasound.
• The human voice can reach approximately 4 kHz.
• Typical sampling rates range from 8 to 48 kHz.
• Typical uniform quantization depths are 8-bit and 16-bit;
8-bit quantization divides the vertical axis into 256
levels, and 16-bit divides it into 65,536 levels.
• How do we digitize audio data? To answer, we have to
consider:
1. What is the sampling rate?
2. How finely is the data to be quantized, and is the
quantization uniform?
3. How is the audio data formatted (i.e., what is the file
format)?
6.1.3 Nyquist Theorem
• Signals can be decomposed into a sum of sinusoids.
Figure 6.3 shows how weighted sinusoids can build
up quite a complex signal.
• Harmonics are any series of musical tones whose
frequencies are integer multiples of the frequency
of a fundamental tone.
• The Nyquist rate is the minimum sampling frequency
that allows the original signal to be recovered
without aliasing:
$f_{\text{sampling}} \geq 2 f_{\max}$
Digitization Of Sound
• In audio digitization, two components form the
composite sinusoidal signal of the actual sound –
the fundamental sine wave and the harmonics.
• Amplitude = intensity
• Frequency = pitch
Sine wave pattern, derived from the graph of y = sin x
Digitization Of Sound
• Sound is a continuous wave that travels through the air at around 343 m/s.
• Without air there is no sound (as in space).
• The wave is made up of pressure differences.
• Sound is detected by measuring the pressure level at a certain location.
• Sound waves have the normal wave properties:
– Reflection (bouncing)
– Refraction (change of angle when entering a medium with a different
density)
– Diffraction (bending around an obstacle)
• These properties make the design of “surround sound” possible.
Digitization Of Sound
• Human ears can hear in the range of 20 Hz (a deep
rumble) to about 20 kHz, at intensities starting from
-10 dB to 25 dB.
• The intensity of sound can be measured in terms of
Sound Pressure Level (SPL) in decibels (dB).
• The audible range changes with age, as the sensitivity
of our hearing usually declines.
6.1.4 Signal-to-Noise Ratio(SNR)
• The SNR is a measure of the quality of a signal. It is
the ratio of the power of the correct signal to
the power of the noise.
• The SNR is usually measured in decibels (dB),
where 1 dB is a tenth of a bel.
$$\mathrm{SNR} = 10 \log_{10} \frac{V_{\text{signal}}^2}{V_{\text{noise}}^2} = 20 \log_{10} \frac{V_{\text{signal}}}{V_{\text{noise}}}$$
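As a quick numeric check of the SNR formula, here is a minimal sketch. The function name `snr_db` and the example voltages are chosen only for illustration.

```python
import math

def snr_db(v_signal, v_noise):
    """SNR in decibels from signal and noise voltages (same unit for both)."""
    return 20 * math.log10(v_signal / v_noise)

print(round(snr_db(10.0, 1.0), 2))    # voltage ratio of 10
print(round(snr_db(1.0, 0.001), 2))   # voltage ratio of 1000
```

Every factor of 10 in the voltage ratio adds 20 dB, so the two example ratios come out at 20 dB and 60 dB.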
Table 6.1 Magnitudes of common sounds, in decibels

| Sound | Level (dB) |
|---|---|
| Threshold of hearing | 0 |
| Rustle of leaves | 10 |
| Very quiet room | 20 |
| Average room | 40 |
| Conversation | 60 |
| Busy street | 70 |
| Loud radio | 80 |
| Train through station | 90 |
| Riveter | 100 |
| Threshold of discomfort | 120 |
| Threshold of pain | 140 |
| Damage to eardrum | 160 |
6.1.5 Signal-to-Quantization-Noise Ratio (SQNR)
• For digital signals, only quantized values are stored.
• For a digital audio signal, the precision of each
sample is determined by the number of bits per
sample, typically 8 or 16.
• Rounding each sample to a stored value introduces a
roundoff error into the digital signal. Although it is
not really “noise”, it is called quantization noise
(or quantization error).
• The quality of the quantization is characterized by
the signal-to-quantization-noise ratio (SQNR).
Signal-to-Quantization-Noise Ratio (SQNR)
• Quantization noise is defined as the difference
between the value of the analog signal, for the
particular sampling time, and the nearest interval
value.
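• For a quantization accuracy of N bits per sample,
the range of the digital signal is $-2^{N-1}$ to $2^{N-1} - 1$.
• Taking the peak signal as $2^{N-1}$ and the largest quantization error as $1/2$:
$$\mathrm{SQNR} = 20 \log_{10} \frac{V_{\text{signal}}}{V_{\text{quan\_noise}}} = 20 \log_{10} \frac{2^{N-1}}{1/2} = 20 \times N \times \log_{10} 2 = 6.02N \ \text{(dB)}$$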
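The 6.02N rule can be checked numerically. The sketch below simply evaluates the peak-SQNR expression from this slide for 8 and 16 bits; `peak_sqnr_db` is a name made up for the example.

```python
import math

def peak_sqnr_db(bits):
    """Peak SQNR for uniform quantization: 20*log10(2**(bits-1) / 0.5)."""
    return 20 * math.log10(2 ** (bits - 1) / 0.5)

for bits in (8, 16):
    print(bits, round(peak_sqnr_db(bits), 2))
```

Each extra bit adds about 6 dB, which gives roughly 48 dB for 8-bit audio and roughly 96 dB for 16-bit audio.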
Digitization Of Sound
• A turbojet engine might be as loud as 165 dB.
• A car driving on the highway produces about 100 dB.
• We cannot hear frequencies outside our hearing
range, whether they are very high or very low.
Digitization Of Sound
• Sounds around us occur over a wide range of frequencies
and intensities.
• For example, a jet engine produces a very high-amplitude
sound that is also in the high-frequency range.
• Our speech sounds are spread across many frequencies and
vary in intensity.
Digitization Of Sound
Digitization In General
• Microphones and video cameras produce analog
signals (continuous-valued voltages)
Digitization Of Sound
• To get audio or video into a computer, we must
digitize it (convert it into a string of numbers).
• So we have to understand discrete sampling.
• Sampling: divide the horizontal axis (the time
dimension) into discrete pieces. Uniform sampling
(equally spaced samples) is ubiquitous.
Digitization Of Sound
• Quantization (sampling in the amplitude): divide the
vertical axis (signal strength) into pieces. Sometimes a
non-linear function is applied.
• 8-bit quantization divides the vertical axis into 256
levels.
• 16-bit quantization gives 65,536 levels.
• Digital audio is a representation of real sound in the
form of bits and bytes.
Audio Quality vs. Data Rate

| Quality | Sample Rate (kHz) | Bits per Sample | Mono / Stereo | Data Rate (if Uncompressed) | Frequency Band |
|---|---|---|---|---|---|
| Telephone | 8 | 8 | Mono | 8 KBytes/sec | 200-3,400 Hz |
| | 11.025 | 8 | Mono | 11.0 KBytes/sec | |
| | 22.050 | 16 | Stereo | 88.2 KBytes/sec | |
| CD | 44.1 | 16 | Stereo | 176.4 KBytes/sec | 20-20,000 Hz |
| DAT | 48 | 16 | Stereo | 192.0 KBytes/sec | 20-20,000 Hz |
| DVD Audio | 192 | 24 | Stereo | 1,152.0 KBytes/sec | 20-20,000 Hz |
Quantization & Sampling
Question:
Given a 10-minute musical piece sampled at CD quality, calculate
the size of the uncompressed file in mono and in stereo.
CD samples are stored in 16 bits at a sample rate of 44.1 kHz.
File size in mono = $(44.1 \times 10^3 \times 16 \times (10 \times 60)) / (8 \times 1024 \times 1024)$
= 50.468 MB
For stereo (left/right channels), the amount is doubled:
50.468 MB × 2 = 100.936 MB
Quantization & Sampling
Mono:
$44.1 \times 10^3 \times 16$ bits $\times (10 \times 60)$
= 423,360,000 bits (/8)
= 52,920,000 bytes (/1024)
= 51,679.688 KB (/1024)
= 50.468 MB
Stereo (mono × 2):
= 50.468 × 2
= 100.936 MB
That’s why we need COMPRESSION!
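The same calculation can be written as a small helper, which makes it easy to try other durations and formats. This is a sketch: the name `pcm_size_mb` is hypothetical, and 1 MB is taken as 1024 × 1024 bytes to match the working above.

```python
def pcm_size_mb(sample_rate_hz, bits_per_sample, channels, seconds):
    """Uncompressed PCM size in MB, with 1 MB = 1024 * 1024 bytes."""
    total_bits = sample_rate_hz * bits_per_sample * channels * seconds
    return total_bits / 8 / 1024 / 1024

mono = pcm_size_mb(44_100, 16, 1, 10 * 60)
stereo = pcm_size_mb(44_100, 16, 2, 10 * 60)
print(f"{mono:.3f} MB mono, {stereo:.3f} MB stereo")
```

The stereo figure computed this way is 100.937 MB, because it doubles the unrounded mono value; the 100.936 above comes from doubling the already rounded 50.468.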
Digitizing Audio
Questions for producing digital audio (Analog-to-Digital Conversion):
1. How often do you need to sample the signal?
2. How good is the signal?
3. How is the audio data formatted?
Nyquist Theorem
• Suppose we are sampling a waveform. How often do we need
to sample it to figure out its frequency?
• The Nyquist Theorem, also known as the sampling theorem, is a
principle that engineers follow in the digitization of analog signals.
• For analog-to-digital conversion (ADC) to result in a faithful
reproduction of the signal, slices, called samples, of the analog waveform
must be taken frequently.
• The number of samples per second is called the sampling rate or
sampling frequency.
Nyquist Theorem
• If we sample only once per cycle (blue area), we may
think the signal is a constant.
Nyquist Theorem
• If we sample at another low rate, e.g., 1.5 times per
cycle, we may mistake it for a lower-frequency waveform (aliasing).
Nyquist Theorem
• Nyquist rate: it can be proven that a band-limited
signal can be fully reconstructed from its samples if the
sampling rate is at least twice the highest frequency in the
signal.
• The highest frequency component, in hertz, of a given
analog signal is $f_{\max}$.
• According to the Nyquist Theorem, the sampling rate must
be at least $2 f_{\max}$, or twice the highest analog frequency
component.
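The undersampling cases from the previous slides can be reproduced with a short calculation. The sketch below relies on the standard fact that sampled pure tones are indistinguishable when their frequencies differ by a multiple of the sampling rate; the function name `apparent_frequency` and the 1 kHz tone are assumptions for the example.

```python
def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency (Hz) that the samples of a pure tone appear to have.

    The samples look like the lowest frequency that differs from the
    true one by a whole multiple of the sampling rate.
    """
    nearest_multiple = round(signal_hz / sample_rate_hz) * sample_rate_hz
    return abs(signal_hz - nearest_multiple)

tone = 1000.0
print(apparent_frequency(tone, 1.0 * tone))  # one sample per cycle
print(apparent_frequency(tone, 1.5 * tone))  # 1.5 samples per cycle
print(apparent_frequency(tone, 4.0 * tone))  # well above the Nyquist rate
```

At one sample per cycle the tone appears as 0 Hz (a constant), at 1.5 samples per cycle it appears as a 500 Hz tone, and only the third rate preserves the true 1000 Hz.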
Typical Audio Formats
• Popular audio file formats include .au (Unix
workstations), .aiff (Mac) and .wav (PC, etc.).
• A simple and widely used audio compression
method is Adaptive Differential Pulse Code Modulation (ADPCM).
– Based on past samples, it predicts the next sample and
encodes the difference between the actual value and the
predicted value.
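The predict-and-encode-the-difference idea can be illustrated with a deliberately simplified sketch. This is plain, lossless DPCM with the simplest possible predictor (the previous sample), not real ADPCM, which also quantizes the differences with a step size that adapts to the signal. The function names and sample values are made up for the example.

```python
def dpcm_encode(samples):
    """Encode each sample as its difference from the predicted value."""
    predicted = 0
    diffs = []
    for s in samples:
        diffs.append(s - predicted)
        predicted = s            # simplest predictor: next sample equals this one
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by adding each difference to the running prediction."""
    predicted = 0
    samples = []
    for d in diffs:
        predicted += d
        samples.append(predicted)
    return samples

pcm = [100, 104, 109, 111, 110, 106]
residuals = dpcm_encode(pcm)
print(residuals)
print(dpcm_decode(residuals) == pcm)
```

After the first value, the differences are much smaller than the samples themselves, so they can be stored in fewer bits; that is where the compression comes from.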
Audio Quality vs. Data Rate
• For a 44.1 kHz, CD-quality recording, one sample is
taken every $1 / (44.1 \times 10^3)$ s ≈ 22.676 μs (microseconds)
for a single channel.
• For a DVD-quality recording, one sample is taken every
1/192,000 s ≈ 5.208 μs (microseconds) for a single
channel. That is about 4.35 times as many samples per
second as a CD, so you can expect better accuracy and fidelity.
Applying Digital Audio In MM Systems
• When using audio in a multimedia system, you have
to consider several things, such as:
– The format of the original file (e.g., WAV)
– The overall amplitude (is it loud enough?)
– Trimming (how long do you want it to be?)
– Time stretching
– Frequency and channels (e.g., 44.1 kHz, stereo)
– Effects
– File size
Applying Digital Audio In MM Systems
• The format we usually work with is WAV.
• It is uncompressed, so it preserves the highest
quality.
• If the waveform isn’t loud enough, you can ‘normalize’
it, which raises the amplitude as high as it can go without
clipping (heard as pops).
• If it still isn’t loud enough, you might need to use
dynamics processing, which can enhance vocals over
instruments.
• For example, this is useful for making heavy metal songs
(which usually have portions of whispering and loud guitar
solos) more palatable to the audience of such an MM
system.
Normalization
Figure: the waveform before normalization and after normalization.
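Peak normalization as described above amounts to a single gain applied to every sample. The following is a minimal sketch assuming 16-bit signed samples (full scale 32767); `normalize_peak` and the sample values are illustrative, and audio editors may implement normalization differently.

```python
def normalize_peak(samples, full_scale=32767):
    """Scale samples so the largest absolute value reaches full_scale (no clipping)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)          # silence: nothing to scale
    gain = full_scale / peak
    return [round(s * gain) for s in samples]

quiet = [0, 1200, -3000, 2500, -800]
loud = normalize_peak(quiet)
print(loud)
```

Because every sample is multiplied by the same gain, the shape of the waveform is unchanged; only its overall level rises until the loudest peak touches full scale.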
Applying Digital Audio In MM Systems
• Sometimes we only want a portion of the music file,
so we need to ‘trim’ it.
• This can be done by selecting the portion you want to keep
and then executing the ‘trim’ command.
• Usually, after trimming, we’re left with a sample that ‘just
starts’ and ‘suddenly ends’.
• To make it more pleasing to the ear, fade-ins and fade-outs
are performed.
• This is usually done to the first and last 5 seconds of the
waveform (depending on how long your sample is), so
that the clip seems to start and end naturally.
Trimming
A portion of the waveform is selected.
The ‘trimming’ function then removes everything else, leaving
just the part we require.
The front portion (usually the first 5 seconds) of the trimmed
clip is selected and the fade-in function is executed, giving
the resulting clip.
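Trimming and fading can be sketched on a plain list of samples. The helper names, the linear fade shape, and the toy sampling rate below are assumptions for illustration only; audio editors typically offer several fade curves.

```python
def trim(samples, sample_rate_hz, start_s, end_s):
    """Keep only the samples between start_s and end_s (in seconds)."""
    return samples[int(start_s * sample_rate_hz):int(end_s * sample_rate_hz)]

def fade(samples, sample_rate_hz, fade_s):
    """Apply a linear fade-in to the first fade_s seconds and a fade-out to the last."""
    n = min(int(fade_s * sample_rate_hz), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n                      # ramps from 0 towards 1
        out[i] = out[i] * gain            # fade-in
        out[-1 - i] = out[-1 - i] * gain  # fade-out (mirrored)
    return out

rate = 10                                            # toy rate: 10 samples per second
clip = trim([1.0] * 100, rate, start_s=2, end_s=6)   # keep 4 seconds
faded = fade(clip, rate, fade_s=1)
print(len(clip), faded[:3], faded[-3:])
```

The clip now rises from silence and falls back to silence instead of starting and ending abruptly.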
Applying Digital Audio In MM Systems
• Some sound files end abruptly with a simple drum beat or
scream.
• If the ending is too short, you can always ‘time-stretch’ that portion.
• This powerful function usually distorts the waveform a little.
• However, for a sharp, loud shriek at the end (of a song, for
example), stretching it to last maybe 0.25 seconds longer might
actually make it sound better.
• Time stretching is also useful when your audio stream doesn’t
quite match the video stream.
• It can then be used to equalize their lengths for synchronization.
The better way is, of course, to adjust the video frame rate or
delete some scenes, but that is another story.
Applying Digital Audio In MM Systems
• You also need to decide what sampling frequency to keep
the audio sample in.
• If it’s a song, CD quality is standard (known as Red Book,
IEC 60908).
• If it’s merely narration, a lower sampling frequency of about
11.025 kHz and a single channel (mono) are sufficient.
• You can do the same for simple music, for example reducing
the sampling frequency to 22.05 kHz, which still sounds good.
Applying Digital Audio In MM Systems
• Another technique is resampling to a lower bit depth.
• This involves reducing the number of bits per sample from
16 to 8, or from 24 to 16, and so forth.
• Reducing the bit depth usually has a greater effect on the
overall sound quality (fewer bits per sample means poorer quality).
• So it’s often best to retain the bit depth but reduce the
sample frequency.
• All of this is done to save precious space and memory.
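The two ways of saving space can be compared side by side. The sketch below is deliberately naive and the function names are made up: it drops low-order bits without dithering and discards every second sample without the low-pass (anti-alias) filter that real tools apply first.

```python
def reduce_bit_depth(samples, from_bits=16, to_bits=8):
    """Requantize signed integer samples by dropping the low-order bits."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]

def halve_sample_rate(samples):
    """Naive downsampling: keep every second sample (no anti-alias filter)."""
    return samples[::2]

pcm16 = [0, 1000, 20000, -20000, 32767, -32768]
print(reduce_bit_depth(pcm16))    # same number of samples, 8 bits each
print(halve_sample_rate(pcm16))   # half as many samples, still 16 bits each
```

Both operations halve the amount of data, but the first one coarsens every sample, while the second one keeps full precision and only gives up the highest frequencies.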
Applying Digital Audio In MM Systems
• There are times when you need to add certain effects.
• Unlike the things we can do with images, audio effects
are not always so obvious.
• For example, we can mimic the stereo effect of an
inherently mono signal by duplicating the waveform
(creating two channels from the single one) and then
slightly delaying one of them (left or right) to create a
pseudo-stereo effect.
• Other effects or functions, like reverb, allow you to
make a studio recording sound like a live performance,
or simply make your own voice recording sound a little
better than it actually is.
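The pseudo-stereo trick described above is simple enough to sketch directly. The function name, the choice of delaying the right channel, and the padding with silence are assumptions made for this example.

```python
def pseudo_stereo(mono, delay_samples):
    """Return (left, right): right is the mono signal delayed by delay_samples."""
    delay = min(delay_samples, len(mono))      # cannot delay past the clip length
    left = list(mono)
    right = [0] * delay + list(mono[:len(mono) - delay])  # pad start with silence
    return left, right

left, right = pseudo_stereo([1, 2, 3, 4, 5], delay_samples=2)
print(left)
print(right)
```

In practice the delay would be a few milliseconds; at 44.1 kHz, for example, 441 samples correspond to 10 ms.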
Applying Digital Audio In MM Systems
• Finally, the file size is important. Even if you have saved as
much space as you can by using a lower frequency, a shorter clip
and a lower bit depth, you can still save more by using a good
compression scheme like mp3, which gives you a ratio of
• There are often other audio compression codecs available
to you as well.
• Remember, though, that compression has its price: the
user will need to be able to decode that particular codec,
and usually more processing power is required to play the
file.
• It may also cause synchronization problems with video. All
things considered, developers usually still compress their
audio.
MIDI
• What is MIDI? MIDI is an acronym for Musical
Instrument Digital Interface.
• Definition of MIDI: a protocol that enables
computers, synthesizers, keyboards, and other
musical devices to communicate with each other.
• It is a set of instructions that tells a computer how to
play musical instruments.
Terminologies:
Synthesizer:
• It is a sound generator (various pitch, loudness etc.).
• A good (musician's) synthesizer often has a
microprocessor, keyboard, control panels, memory,
etc.
Terminologies:
Sequencer:
• It can be a stand-alone unit or a software program for a
personal computer. (It used to be a storage server for MIDI
data. Nowadays it is more a software music editor on the
computer.)
• It has one or more MIDI INs and MIDI OUTs.
Terminologies:
Track:
• A track in the sequencer is used to organize the recordings.
• Tracks can be turned on or off on recording or playback.
• To illustrate, one might record an oboe melody
line on Track Two, then record a bowed bass line
on Track Three.
• When played, the sounds can be simultaneous.
• Most MIDI software now accommodates 64
tracks of music, enough for a rich orchestral
sound.
• Important: Tracks are purely for convenience;
channels are required.
Terminologies:
Channel:
• MIDI channels are used to separate information in a MIDI
system.
• There are 16 MIDI channels in one cable.
• Each channel addresses one MIDI instrument.
• Channel numbers are coded into each MIDI message.
Terminologies:
Timbre:
• The quality of the sound, e.g., flute sound,
cello sound, etc.
• Multi-timbral -- capable of playing many
different sounds at the same time (e.g., piano,
brass, drums, etc.)
Pitch:
• The musical note that the instrument plays.
Terminologies:
Voice:
• Voice is the portion of the synthesizer that produces sound.
• Synthesizers can have many (16, 20, 24, 32, 64, etc.) voices.
• Each voice works independently and simultaneously to
produce sounds of different timbre and pitch.
Patch:
• The control settings that define a particular timbre.
General MIDI
• MIDI + Instrument Patch Map + Percussion Key Map
→ a piece of MIDI music (usually) sounds the same
anywhere it is played
– The instrument patch map is a standard program list consisting
of 128 patch types.
– The percussion map specifies 47 percussion sounds.
– Key-based percussion is always transmitted on MIDI
channel 10.
General MIDI
Requirements for General MIDI Compatibility:
• Support all 16 channels.
• Each channel can play a different
instrument/program (multi-timbral).
• Each channel can play many voices (polyphony).
• Minimum of 24 fully dynamically allocated voices.
General MIDI
• The playback on MIDI will only be accurate if the playback
device is identical to the one used for production.
• Even with the general MIDI standard, the sound of a MIDI
instrument varies according to the electronics of the playback
device and the sound generation method it uses.
• MIDI is also unsuitable for spoken dialog.
• MIDI usually requires a certain amount of knowledge in
music theory.
Application Of MIDI
• Is MIDI suitable for MM systems?
• Sometimes it is, sometimes not.
• A webpage is a valid MM system.
• If the chosen music is simple and repetitive,
then MIDI is ideal for playing in the background,
because it is small and compatible.
• Otherwise, most MM systems work with digital
audio, such as WAV, mp3, etc.
MIDI IN-OUT
Example
A musician pushes down (and holds down) the
middle C key on a keyboard.
This causes a MIDI Note-On message to be sent
out of the keyboard's MIDI OUT jack.
That message is received by the second
instrument which sounds its middle C in unison.
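The Note-On message in this example is only three bytes long. The sketch below builds it by hand, following the MIDI 1.0 message layout (status byte 0x90 plus the channel, then note number, then velocity). The helper name `note_on`, the velocity of 64, and the choice of channel 1 are assumptions for illustration; middle C is note number 60.

```python
def note_on(channel, note, velocity):
    """Build the 3-byte MIDI Note-On message for a 1-based channel (1-16)."""
    if not (1 <= channel <= 16 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 1-16; note and velocity must be 0-127")
    status = 0x90 | (channel - 1)      # 0x9n: Note-On, low nibble carries the channel
    return bytes([status, note, velocity])

MIDDLE_C = 60                          # MIDI note number of middle C
msg = note_on(channel=1, note=MIDDLE_C, velocity=64)
print(msg.hex())
```

Because the channel number is coded into the status byte, the same note sent on channel 10 would start with 0x99 and be treated as key-based percussion by a General MIDI device.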
MIDI Application – Cakewalk (sequencer)
Instrument Assignment (Multi-Timbral)