Digital Representation of Audio Information

EE513
Audio Signals and Systems
Introduction
Kevin D. Donohue
Electrical and Computer Engineering
University of Kentucky
Question!
If a tree falls in the forest and nobody
is there to hear it, will it make a sound?
Sound provided by
http://www.therecordist.com/downloads.html
Ambiguity!
• Merriam-Webster Dictionary:
• Sound a : a particular auditory impression
b : the sensation perceived by the sense of
hearing c : mechanical radiant energy that is
transmitted by longitudinal pressure waves
in a material medium (as air) and is the
objective cause of hearing.
Electronic Audio Systems
Sound Sources –
Vibrations at
20Hz-20kHz
Transmission
Media
Playback
Information
Extraction /
Measurement
Storage
Electroacoustic
Transducer
Amplification,
Signal
Conditioning
Processing for
Intended
Application
Natural Audio Systems
Synthetic Audio: Imitating Nature
• 1780 Wolfgang von Kempelen’s Speaking Machine
• Mid 1800’s Charles Wheatstone
• Late 1800’s Alexander Graham Bell
• 1939 Homer Dudley’s Voder
http://www.acoustics.hut.fi/~slemmett/wave/track01f.wav
• 1898 Thaddeus Cahill’s Telharmonium (First Music Synthesizer)
• 1919 Lev Theremin’s Theremin
Speech Analysis and Synthesis
• Communication channels (acoustic and electric)
  • 1874/1876 (Antonio Meucci’s) Alexander Graham Bell’s Telephone.
  • 1940’s Homer Dudley’s Channel Vocoder, the first analysis-synthesis system
Voice-Coding Models
The general speech model:
Voiced Speech: Quasi-Periodic Pulsed Air -> Vocal Tract Filter -> Vocal Radiator
Unvoiced Speech: Air Burst or Continuous Flow -> Vocal Tract Filter -> Vocal Radiator
Speech sounds can be analyzed by determining the states of the vocal
system components (vocal cords, tract, lips, tongue, ...) for each
fundamental sound of speech (phoneme).
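The speech model above can be sketched in Matlab as a source signal passed through a vocal tract filter. This is only a minimal illustration, not the course's own code: the pitch, the formant frequencies, and the bandwidth are assumed values chosen for the example, the resonator cascade is one common way to stand in for the vocal tract filter, and the vocal radiator stage is left out.

```matlab
% Sketch of the source-filter speech model (all parameter values are assumptions)
fs  = 8000;                 % sampling frequency in Hz
dur = 0.5;                  % duration in seconds
n   = round(dur*fs);        % number of samples
f0  = 100;                  % assumed pitch of the voiced source in Hz

% Voiced source: quasi-periodic pulse train, one pulse every fs/f0 samples
voicedSrc = zeros(n,1);
voicedSrc(1:round(fs/f0):n) = 1;

% Unvoiced source: continuous noise-like air flow
unvoicedSrc = 0.1*randn(n,1);

% Vocal tract filter: cascade of two-pole resonators at assumed formants
formants = [700 1200 2500]; % hypothetical formant frequencies in Hz
bw = 100;                   % hypothetical formant bandwidth in Hz
a = 1;                      % denominator polynomial of the all-pole filter
for fk = formants
    r = exp(-pi*bw/fs);                          % pole radius set by the bandwidth
    a = conv(a, [1, -2*r*cos(2*pi*fk/fs), r^2]); % add one resonator to the cascade
end

voiced   = filter(1, a, voicedSrc);    % vowel-like sound
unvoiced = filter(1, a, unvoicedSrc);  % whisper-like sound

% soundsc([voiced; unvoiced], fs)      % uncomment to listen
```

Changing the entries of formants changes the vowel quality, while changing f0 changes only the perceived pitch of the voiced output.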
Spectral Analysis Voiced Speech
Spectral envelope => vocal tract formants
Harmonic peaks => vocal cord pitch
[Figure: Spectrum of Speech Segment - ah. Vertical axis: dB (-120 to -40); horizontal axis: Hertz (0 to 4000).]
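The link between harmonic peaks and pitch can be checked on a synthetic signal. The sketch below assumes a 100 Hz pitch and a simple 1/k amplitude decay in place of a real formant envelope; it computes the magnitude spectrum with fft and reads the pitch from the strongest peak. It illustrates the idea only and does not reproduce the recorded "ah" segment.

```matlab
% Magnitude spectrum of a synthetic voiced sound (assumed pitch and envelope)
fs = 8000;                      % sampling frequency in Hz
f0 = 100;                       % assumed pitch in Hz
t  = (0:fs-1)'/fs;              % one second of sampled time
sig = zeros(size(t));
for k = 1:20
    sig = sig + (1/k)*cos(2*pi*k*f0*t);   % harmonic k with amplitude 1/k
end

N = length(sig);
X = fft(sig);
magdB = 20*log10(abs(X(1:N/2))/N + eps);  % one-sided spectrum in dB
f = (0:N/2-1)'*fs/N;                      % frequency axis in Hz

[~, idx] = max(magdB);
pitchEst = f(idx);                        % frequency of the strongest harmonic

% plot(f, magdB); xlabel('Hertz'); ylabel('dB')   % uncomment to view
```

The peaks fall at multiples of f0, and the curve traced through their tops is the spectral envelope, here just the assumed 1/k decay.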
Time Analysis Voiced Speech
Time envelope => Volume dynamics
Oscillations => Vocal cord motion
[Figure: Waveform of Speech Segment - ah. Vertical axis: Amplitude (-0.1 to 0.1); horizontal axis: Milliseconds (0 to 250). Annotation: oscillation period of 12 ms, corresponding to 83 Hz.]
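The 12 ms period and the 83 Hz pitch are related by f = 1/T. As a rough sketch, the period can also be estimated automatically from the waveform by finding the lag at which the signal best matches a shifted copy of itself. The test signal, the number of harmonics, and the 50 to 400 Hz search range below are assumptions made for the example.

```matlab
% Pitch from the waveform period (synthetic signal; search range is assumed)
fs = 8000;                      % sampling frequency in Hz
T  = 12e-3;                     % oscillation period read from the waveform
f0 = 1/T;                       % about 83.3 Hz
n  = 2080;                      % number of samples in the test signal
t  = (0:n-1)'/fs;
x  = zeros(n,1);
for k = 1:5
    x = x + (1/k)*cos(2*pi*k*f0*t);   % periodic signal with period T
end

minLag = round(fs/400);         % shortest period searched (400 Hz)
maxLag = round(fs/50);          % longest period searched (50 Hz)
M = n - maxLag;                 % fixed number of products per lag
lags = (minLag:maxLag)';
r = zeros(size(lags));
for i = 1:length(lags)
    r(i) = sum(x(1:M).*x(1+lags(i):M+lags(i)));   % correlation at this lag
end

[~, imax] = max(r);
periodEst = lags(imax)/fs;      % estimated period in seconds
pitchEst  = 1/periodEst;        % estimated pitch in Hz
```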
Spectrogram Analysis
[Figure: Spectrogram of a spoken phrase with word labels "There", "old", "lived", "shoe", "She", "do". Vertical axis: Frequency (0 to 4000); horizontal axis: Time (0 to 5); color scale from -50 to 20. Waveform panel below with amplitude from -1 to 1.]
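A spectrogram is a sequence of short-time spectra stacked along the time axis. The sketch below builds one with base Matlab functions only, using a synthetic chirp instead of recorded speech. The window length, hop size, and test signal are assumptions chosen for illustration; the Signal Processing Toolbox also offers a spectrogram function for the same purpose.

```matlab
% Short-time Fourier analysis of a chirp (window and hop sizes are assumptions)
fs = 8000;                               % sampling frequency in Hz
t  = (0:2*fs-1)'/fs;                     % two seconds of sampled time
x  = cos(2*pi*(500*t + 500*t.^2));       % frequency sweeps 500 -> 2500 Hz

nwin = 256;                              % window length in samples
hop  = 128;                              % shift between windows in samples
w = 0.5 - 0.5*cos(2*pi*(0:nwin-1)'/(nwin-1));    % Hann window
nframes = floor((length(x) - nwin)/hop) + 1;

S = zeros(nwin/2+1, nframes);            % dB magnitudes: frequency x time
for m = 1:nframes
    seg = x((m-1)*hop + (1:nwin)) .* w;  % windowed segment
    X = fft(seg);
    S(:,m) = 20*log10(abs(X(1:nwin/2+1)) + eps);
end

f  = (0:nwin/2)'*fs/nwin;                % frequency axis in Hz
tf = ((0:nframes-1)*hop + nwin/2)/fs;    % time at the center of each window

% imagesc(tf, f, S); axis xy; xlabel('Time'); ylabel('Frequency'); colorbar
```

A longer window sharpens the frequency resolution but blurs rapid changes in time, which is the basic trade-off when choosing nwin for speech.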
Spectrogram of CD sound
[Figure: Spectrogram of a CD recording. Vertical axis: Frequency (0 to 10000); horizontal axis: Time (0 to 18); color scale from -20 to 25. Waveform panel below with amplitude from -2 to 2.]
Speech Recognition
• 1920’s Radio Rex
• 1950’s (Bell Labs) Digit Recognition
  • Spectral/Formant analysis
  • Filter Banks
• 1960’s Neural Networks
• 1970’s ARPA Project for Speech Understanding
  • Applications of spectral analysis methods: FFT, Cepstral/homomorphic, LPC
• 1970’s Application of pattern matching methods: DTW and HMM
Speech Recognition
1980’s
Standardize Training and Test with Large
Corpora (TIMIT) (RM) (DARPA)
New Front Ends (feature extractors) more
perceptually based
Dominance/Development of HMM
Backpropagation and Neural Networks
Rule-Based AI systems
Specification of Speech Recognition
Speaker dependent or independent
Recognize isolated, continuous, or spot
speech
Vocabulary Size, Grammar Perplexity,
Speaking style
Recording conditions
Components of Speech Recognition
Speech Transduction
Acoustic/Electronic
Input Speech
Detected
Speech
String
Front End
Local Match
Global Detector
Language Model
Matlab Examples
%% Create and play a 2 second 440 Hz tone in Matlab:
fs = 8000; % Set a sampling frequency
fq = 440; % frequency to play
t = [0:round(2*fs)-1]/fs; % Sampled time axis
sig = cos(2*pi*fq*t); % Create sampled signal
soundsc(sig,fs) % Play it
plot(t,sig); xlabel('Seconds'); ylabel('Amplitude')
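audiowrite('t440.wav', 0.99*sig, fs) % Save tone (audiowrite replaces wavwrite in newer Matlab releases); scaled to avoid clipping
clear % Remove all variables from work space
%% Reload tone and weight it with a decaying exponential of time constant .6 seconds
tc = .6; % Set time constant
[y, fs] = audioread('t440.wav'); % read in wave file (audioread replaces wavread in newer Matlab releases)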
t =[0:length(y)-1]'/fs; % Create sampled time axis
dw = exp(-t/tc); % Compute sampled decaying exponential
dsig = y.*dw; % Multiply sinusoid with decaying exponential
soundsc(dsig,fs)
plot(t,dsig); xlabel('Seconds'); ylabel('Amplitude')
Matlab Examples
Explore demo and help files
>> help script
SCRIPT About MATLAB scripts and M-files.
A SCRIPT file is an external file that contains a sequence
of MATLAB statements. By typing the filename, subsequent
MATLAB input is obtained from the file. SCRIPT files have
a filename extension of ".m" and are often called "M-files".
To make a SCRIPT file into a function, see FUNCTION.
See also type, echo.
Reference page in Help browser
doc script
In the help window (click on the question mark), go through the section on
programming, then go to the demo tab and view a few of the demos.
Matlab Examples
• In class examples …
Matlab Exercise
• Use the sine/cosine function in Matlab to write a function that
generates a Dorian scale (for testing the function, use start tones
between 100 and 440 Hz with a sampling rate of 8 kHz). Let
the Matlab function input arguments be the starting frequency
and the time interval for each scale tone in seconds. Let the
output be a vector of samples that can be played with the Matlab
command “soundsc(v,8000)” (where v is the vector output of
your function).
  • The frequency range of a scale covers one octave, which implies the last
frequency is twice the starting frequency. On most fixed-pitch
instruments, 12 semitones (half steps) make up the notes within an
octave. A minor scale sequentially increases by a whole, half, whole,
whole, half, whole, and whole step (8 notes altogether, including the
starting note). The Dorian scale differs from this minor scale only in its
sixth note, which is a half step higher: whole, half, whole, whole, whole,
half, and whole.
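The step arithmetic for the minor scale described above can be sketched as follows. It assumes equal temperament (each half step multiplies the frequency by 2^(1/12)) and an arbitrary 220 Hz starting tone, and it only computes the note frequencies; generating and concatenating the tones is left to the exercise.

```matlab
% Note frequencies of a minor scale in equal temperament (f0 is an assumed value)
f0 = 220;                        % starting frequency in Hz
steps = [2 1 2 2 1 2 2];         % whole = 2 half steps, half = 1 half step
semis = [0 cumsum(steps)];       % half steps above the starting note
freqs = f0 * 2.^(semis/12);      % 8 frequencies, the last equal to 2*f0
```

Swapping in a different steps vector gives the frequencies of any other 8-note scale built from whole and half steps.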
Matlab Exercise - Scales
Interval (minor-scale note)   Just                   Pythagorean             Equal Temperament
0 (1)                         1/1 = 1                1/1 = 1                 2^(0) = 1
1                             16/15                  256/243                 2^(1/12)
2 (2)                         10/9 (or 9/8)          9/8                     2^(2/12)
3 (3)                         6/5                    32/27                   2^(3/12)
4                             5/4                    81/64                   2^(4/12)
5 (4)                         4/3                    4/3                     2^(5/12)
6                             45/32 (or 64/45)       1024/729 (or 729/512)   2^(6/12)
7 (5)                         3/2                    3/2                     2^(7/12)
8 (6)                         8/5                    128/81                  2^(8/12)
9                             5/3                    27/16                   2^(9/12)
10 (7)                        7/4 (or 16/9 or 9/5)   16/9                    2^(10/12)
11                            15/8                   243/128                 2^(11/12)
12 (8)                        2/1 = 2                2/1 = 2                 2^(12/12) = 2
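One way to compare the three tunings is to express each ratio in cents (1200 times the base-2 logarithm of the ratio), so that every equal-tempered half step is exactly 100 cents. The sketch below does this for the ratios in the table; where the table lists alternatives, the choice of one value (9/8, 45/32, 9/5, 729/512) is an assumption made for the example.

```matlab
% Deviation of just and Pythagorean ratios from equal temperament, in cents
just = [1 16/15 9/8 6/5 5/4 4/3 45/32 3/2 8/5 5/3 9/5 15/8 2];
pyth = [1 256/243 9/8 32/27 81/64 4/3 729/512 3/2 128/81 27/16 16/9 243/128 2];
equalT = 2.^((0:12)/12);                     % equal-tempered ratios

centsJust = 1200*log2(just) - 100*(0:12);    % offset from equal temperament
centsPyth = 1200*log2(pyth) - 100*(0:12);

% Example: the just major third (interval 4) is about 13.7 cents flat,
% and the 3/2 fifth (interval 7) is about 2 cents sharp.
```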
Matlab Exercise – Famous Notes
Middle C = 261.626 Hz (standard tuning)
Concert A (A above middle C) = 440 Hz
Middle C = 256 Hz (Scientific tuning)
Lowest note on piano A=27.5 Hz
Highest note on piano C = 4186.009 Hz
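The standard-tuning values above follow from equal temperament with concert A at 440 Hz. A short check, counting half steps relative to concert A (the variable names are only illustrative):

```matlab
% Famous notes derived from concert A = 440 Hz in equal temperament
fA4     = 440;                  % concert A in Hz
middleC = fA4 * 2^(-9/12);      % 9 half steps below concert A, about 261.626 Hz
lowA    = fA4 * 2^(-48/12);     % 4 octaves below concert A, 27.5 Hz
highC   = fA4 * 2^(39/12);      % 39 half steps above concert A, about 4186.009 Hz
sciC    = 2^8;                  % scientific tuning sets middle C to a power of 2, 256 Hz
```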