lecture1 - Docjava.com

Voice and Signal Processing
• Professor Douglas Lyon
• Lyon@docjava.com
• Fairfield University
• http://www.docjava.com
Two Course Texts!
• Java for Programmers
• Available from:
– http://www.docjava.com
Java Digital Signal Processing
• Java for Programmers
• Available from:
– http://www.docjava.com
Grading
• Midterm: 1/3
• Homework: 1/3
• Final: 1/3
• Midterm and Final – Take home!
Email
• Please send me an e-mail asking to be
placed on the CR310 List
• E-mail: lyon@docjava.com
Pre-reqs
• You should have CS232 and MA 172
• OR permission of the instructor
• You need a working knowledge of Java!
What do I need to learn this?
• Basic multimedia programming
– It helps implement interesting programs
– It enables active learning
– It requires a good background in Java
programming
Preliminary Java Topics
• exceptions (ch11)
• nested reference data types (ch 12)
• threads (ch13)
Preliminary IO Topics
• files (ch 14)
• streams (ch 15)
• readers (ch 16)
• writers (ch 17)
Preliminary GUI Topics
• Swing (ch 18)
• Events (ch 19)
What is Voice and Signal
Processing?
• 1D data processing
– input sound
– output sound
– time-varying functions are used as both input
and output.
What is Digital Signal Processing?
• A kind of data processing.
• Typically numeric data processing
– Look at kind and DIMENSION of data.
– 1D in, 1D out -> DSP.
– 2D in, 2D out -> Image Processing
– 2D in, symbols out -> computer vision
– 3D in, 2D out -> computer graphics
What are some DSP examples?
• If the input is images and the output is images we
call it image processing
• If the input is images and the output is symbols we
call it pattern recognition or machine vision
• If the input is text and the output is voice we call it
voice synthesis.
• If the input is voice and the output is text we call it
voice recognition
• If the input is images and geometry and the output is
images we call it image warping
What are some 1D DSP
applications?
• Analysis
– weak variables -> strong variables
• Synthesis
– strong variables -> weak variables
What are some kinds of 1D data?
• Any form of energy that can be digitized.
• Any source of data (a function in 1D).
– Voice data
– Sound data
– Temperature data
– Range, blood pressure, EEG (brain stuff), EKG
(heart stuff), weight, age…
non-physical phenomena and DSP
• Anything that can produce a digital stream
of data is suitable for DSP
– i.e., financial data,
– statistical data,
– network traffic, etc.
What is Audio?
• Pressure wave that moves air.
• Human auditory system (ear).
• Audio is a sensation.
What is a signal?
• A signal is any sequence of values.
• Stock price: a function of time
• Image (2D signal)
• Movie: 2D signal as a function of time
• Any collection of symbols or numbers
What is a continuous signal?
• A signal that can be represented as a
function of a real-valued domain (i.e., time)
What is a discrete signal?
• A signal that can be represented by a
function over a whole number domain.
• sampling is the reduction of a continuous
signal to a discrete signal.
What is quantization?
• Given a signal with a set of values, S, create
a new signal with a set of values, S’, such
that card(S) > card(S’), i.e., the new signal
takes fewer distinct values.
Quantization
• One part of digitization
• Input: v(t)
• Output: Vq(t)
• Let N = the number of quantization levels.
• Suppose the minimum voltage is 0 VDC
• Suppose the maximum voltage is 1 VDC
• What is the minimum quantization step?
What is digitization?
• Sampling + quantization
• Converts a continuous signal into a discrete
signal
What is Digital Signal Processing?
• Data processing with signals that are both
sampled and quantized
Compute the quantization step
• Quantization step = maximum voltage / total number of steps.
• For example, a CD uses 16-bit audio samples.
– N = 2**16 = 65536
– Voltage of quantization = 1/65536 ≈ 0.000015
• For AU files, N = 2**8 = 256
– Voltage of quantization = 1/256 ≈ 0.0039
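The step computation above can be sketched in Java. This is a minimal illustration, assuming a 0 to 1 V input range and N = 2^bits levels. The class and method names (QuantizerSketch, step, quantize) are hypothetical and do not come from the course texts.

```java
// Sketch only: names are illustrative, not from the course texts.
final class QuantizerSketch {
    /** Quantization step for the range [0, vMax] split into N = 2^bits levels. */
    static double step(double vMax, int bits) {
        long levels = 1L << bits;              // N = 2^bits
        return vMax / levels;
    }

    /** Map a voltage in [0, vMax] to an integer level in 0..N-1. */
    static int quantize(double v, double vMax, int bits) {
        long levels = 1L << bits;
        long level = (long) Math.floor(v / step(vMax, bits));
        // Clamp so that out-of-range inputs (and v == vMax) stay in 0..N-1
        return (int) Math.max(0, Math.min(levels - 1, level));
    }

    public static void main(String[] args) {
        System.out.println("16-bit step: " + step(1.0, 16));
        System.out.println("8-bit step:  " + step(1.0, 8));
        System.out.println("0.5 V at 8 bits -> level " + quantize(0.5, 1.0, 8));
    }
}
```

Flooring to the level below is one possible choice. Rounding to the nearest level is equally common and halves the worst-case error.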
How do I do digitization?
A low-pass filter removes high frequencies
ADC samples the signal and quantizes it
Parallel to serial converter is a shift-register
Sampling and Quantization
What is the noise relative to the
signal?
• SNR = signal-to-noise ratio
• log10(signal power / noise power) gives the ratio in bels.
• The unit is named after Alexander Graham Bell.
• It is usually quoted in decibels (dB): SNR = 10 log10(signal power / noise power).
SNR
Dynamic Range
log10(2) is about 0.3, and 0.3 * 20 = 6, so each bit adds about 6 dB.
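The decibel arithmetic can be checked with a short Java sketch. It assumes the power-ratio form 10 log10(Ps/Pn) for SNR and the amplitude-ratio form 20 log10(2^bits) for the dynamic range of a quantizer. The names DecibelSketch, snrDb and dynamicRangeDb are hypothetical.

```java
// Sketch only: names are illustrative.
final class DecibelSketch {
    /** SNR in decibels from signal and noise power (same units). */
    static double snrDb(double signalPower, double noisePower) {
        return 10.0 * Math.log10(signalPower / noisePower);
    }

    /** Dynamic range in dB of a quantizer with the given number of bits. */
    static double dynamicRangeDb(int bits) {
        // 20 * log10(2^bits) = bits * 20 * log10(2), about 6.02 dB per bit
        return 20.0 * Math.log10(Math.pow(2.0, bits));
    }

    public static void main(String[] args) {
        System.out.println("8 bits:  " + dynamicRangeDb(8) + " dB");
        System.out.println("16 bits: " + dynamicRangeDb(16) + " dB");
    }
}
```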
sampling rate
• Nyquist–Shannon sampling theorem
• If a function f(t) contains no frequencies
higher than W cps,
• it is completely determined by giving its
ordinates at a series of points spaced
1/(2W) seconds apart.
• For example, if W = 10 Hz, then sample at 20 Hz
aliasing
• Sampling artifact that occurs when
sampling below the Nyquist rate.
• High frequencies can be reconstructed as
low frequencies.
• Images can have interference patterns
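To see how a high frequency shows up as a low one, the sketch below folds a tone's frequency into the band from 0 to half the sample rate. It assumes an ideal sampler with no anti-aliasing filter. The names AliasSketch and aliasFrequency are hypothetical.

```java
// Sketch only: names are illustrative.
final class AliasSketch {
    /** Apparent frequency (Hz) of a tone at f Hz after sampling at fs Hz. */
    static double aliasFrequency(double f, double fs) {
        double folded = f % fs;                // fold into (-fs, fs)
        if (folded < 0) {
            folded += fs;                      // now in [0, fs)
        }
        // Frequencies above fs/2 mirror back below fs/2
        return folded > fs / 2.0 ? fs - folded : folded;
    }

    public static void main(String[] args) {
        // A 7000 Hz tone sampled at 8000 Hz is indistinguishable from 1000 Hz
        System.out.println(aliasFrequency(7000.0, 8000.0));
    }
}
```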
What is an anti-aliasing filter?
Low-pass filter
What is oversampling?
• When you sample at a rate higher than the Nyquist rate
General Analysis for the ADC
The role of the low-pass filter
• anti-aliasing filter
• Nyquist frequency = sample freq / 2
• only pass freqs below the Nyquist frequency
How do I reconstruct a signal?
[Figure: sample/reconstruction process. Labels: R, amplifier, v(t), fs, low-pass filter, output]
Digitizing Voice: PCM
Waveform Encoding
• Nyquist Theorem: sample at twice the
highest frequency
– Voice frequency range: 300-3400 Hz
– Sampling frequency = 8000/sec (every 125 µs)
– Bit rate: (2 x 4 kHz) x 8 bits per sample
= 64,000 bits per second (DS-0)
• By far the most commonly used method
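The PCM arithmetic above is small enough to check directly. This sketch assumes one mono channel, and the names PcmRateSketch, bitRate and samplePeriodMicros are hypothetical.

```java
// Sketch only: names are illustrative.
final class PcmRateSketch {
    /** Bits per second for one channel of uncompressed PCM. */
    static int bitRate(int sampleRateHz, int bitsPerSample) {
        return sampleRateHz * bitsPerSample;
    }

    /** Time between samples, in microseconds. */
    static double samplePeriodMicros(int sampleRateHz) {
        return 1_000_000.0 / sampleRateHz;
    }

    public static void main(String[] args) {
        System.out.println(bitRate(8000, 8) + " bits per second");
        System.out.println(samplePeriodMicros(8000) + " microseconds per sample");
    }
}
```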
[Figure: CODEC, PCM = DS-0, 64 kbps]
In 1D, DSP Is…
• 1D Digital signal processing is a kind of
data processing that operates on 1D PCM
data.
O-scope
Harmonics
• The fundamental frequency of a sound is
said to be the component of strongest
magnitude.
• Few sounds are just sine waves.
• The extra waves in a sound refer to the
harmonic content or timbre.
Harmonic formula
• A harmonic is an integer multiple of the
fundamental pitch.
• If 440 Hz is the 1st harmonic then
• 880 Hz is the 2nd harmonic
• Individual sine waves are called partials.
Harmonic Motion
The frequency of the oscillations is given by
How do I model Spectra?
• Suppose the continuous signal is v(t)
• Let the Fourier coefficients be denoted:
$a_0, a_1, b_1, a_2, b_2, \ldots$
$$v(t) = a_0 + (a_1 \cos t + b_1 \sin t) + (a_2 \cos 2t + b_2 \sin 2t) + \cdots$$
Sawtooth Wave Form
K=10
Model of a Saw Wave
$$f(x) = \frac{2}{\pi} \sum_{n=1}^{K} (-1)^{n+1} \frac{\sin(nx)}{n}$$
Sawwave k=100
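A saw wave can be approximated by summing K harmonics. The sketch below assumes the standard partial sum f(x) = (2/pi) * sum over n = 1..K of (-1)^(n+1) sin(nx)/n. The names SawtoothSketch and saw are hypothetical.

```java
// Sketch only: names are illustrative.
final class SawtoothSketch {
    /** Partial Fourier sum of a saw wave using k harmonics. */
    static double saw(double x, int k) {
        double sum = 0.0;
        for (int n = 1; n <= k; n++) {
            double sign = (n % 2 == 1) ? 1.0 : -1.0;   // (-1)^(n+1)
            sum += sign * Math.sin(n * x) / n;
        }
        return 2.0 / Math.PI * sum;
    }

    public static void main(String[] args) {
        // More harmonics give a closer approximation to the ideal ramp
        System.out.println("K=10:  " + saw(Math.PI / 2, 10));
        System.out.println("K=100: " + saw(Math.PI / 2, 100));
    }
}
```

With this scaling the ideal wave is x/pi on (-pi, pi), so the value at x = pi/2 approaches 0.5 as K grows.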
Example: a 4 voice synthesizer
• Design a program that can:
– Play sound
– Provide a GUI for determining the amplitudes
of up to 7 harmonics
– Enable the user to alter the frequency for the
fundamental tone.
– Enable the playing of 4 voices
– Enable the control of the overall volume.
Building an Oscillator in software
• // the period of the waveform, in seconds, is
•   lambda = 1 / frequency;
• // the number of samples per period is
•   samplesPerCycle = sampleRate * lambda;
• sampleRate = 8000 samples/second
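A minimal sine oscillator built from these two quantities might look like the following. It only fills an array of samples in the range -1..1. Playing them back would need an audio API and is left out. The names OscillatorSketch and sine are hypothetical.

```java
// Sketch only: names are illustrative.
final class OscillatorSketch {
    /** Generate numSamples of a sine wave at the given frequency (Hz). */
    static double[] sine(double frequency, double sampleRate, int numSamples) {
        double lambda = 1.0 / frequency;                 // period in seconds
        double samplesPerCycle = sampleRate * lambda;    // samples per period
        double[] out = new double[numSamples];
        for (int i = 0; i < numSamples; i++) {
            // One full cycle (2*pi radians) every samplesPerCycle samples
            out[i] = Math.sin(2.0 * Math.PI * i / samplesPerCycle);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] wave = sine(1000.0, 8000.0, 8);         // 8 samples per cycle
        for (double v : wave) {
            System.out.println(v);
        }
    }
}
```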
Fourier transform
$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt$$
$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df$$
How do you compute the Fourier
Coefficients?
• Use the Fourier transform!
$$v(t) = a_0 + (a_1 \cos t + b_1 \sin t) + (a_2 \cos 2t + b_2 \sin 2t) + \cdots$$
$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt$$
$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df$$
Recall Euler’s identity
• Complex numbers have a real and
imaginary part:
$$e^{i\theta} = \cos\theta + i \sin\theta$$
Another way to express a function
$$v(t) = a_0 + (a_1 \cos t + b_1 \sin t) + (a_2 \cos 2t + b_2 \sin 2t) + \cdots$$
$f_0$ = frequency
$n f_0$ = nth harmonic of $f_0$
Sine-Cosine Representation
$$x(t) = \sum_{n=0}^{\infty} a_n \cos(2\pi n f_0 t) + \sum_{n=1}^{\infty} b_n \sin(2\pi n f_0 t)$$
$f_0$ = frequency
$n f_0$ = nth harmonic of $f_0$
Correlation
• Fourier coefficients are found by
correlating the time-dependent function,
x(t), with the nth harmonic sine-cosine pair:
$$a_0 = \frac{1}{T} \int_{0}^{T} x(t)\, dt$$
$$a_n = \frac{2}{T} \int_{0}^{T} x(t) \cos(2\pi n f_0 t)\, dt$$
$$b_n = \frac{2}{T} \int_{0}^{T} x(t) \sin(2\pi n f_0 t)\, dt$$
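The correlation integrals can be approximated numerically. The sketch below assumes T = 1/f0 and uses a midpoint-rule sum over one period. It is an illustration, not the method from the course texts, and the names FourierCoefficientSketch, a and b are hypothetical.

```java
// Sketch only: names are illustrative; integrals use the midpoint rule.
final class FourierCoefficientSketch {
    /** a_n: correlate x(t) with cos(2*pi*n*f0*t) over one period T = 1/f0. */
    static double a(java.util.function.DoubleUnaryOperator x, int n, double f0, int steps) {
        double period = 1.0 / f0;
        double dt = period / steps;
        double sum = 0.0;
        for (int k = 0; k < steps; k++) {
            double t = (k + 0.5) * dt;                   // midpoint of step k
            sum += x.applyAsDouble(t) * Math.cos(2.0 * Math.PI * n * f0 * t) * dt;
        }
        // a_0 uses 1/T, every other a_n uses 2/T
        return (n == 0 ? 1.0 : 2.0) / period * sum;
    }

    /** b_n: correlate x(t) with sin(2*pi*n*f0*t) over one period T = 1/f0. */
    static double b(java.util.function.DoubleUnaryOperator x, int n, double f0, int steps) {
        double period = 1.0 / f0;
        double dt = period / steps;
        double sum = 0.0;
        for (int k = 0; k < steps; k++) {
            double t = (k + 0.5) * dt;
            sum += x.applyAsDouble(t) * Math.sin(2.0 * Math.PI * n * f0 * t) * dt;
        }
        return 2.0 / period * sum;
    }

    public static void main(String[] args) {
        double f0 = 5.0;
        java.util.function.DoubleUnaryOperator x =
                t -> 3.0 * Math.cos(2.0 * Math.PI * f0 * t);
        System.out.println("a1 = " + a(x, 1, f0, 1000));
        System.out.println("b1 = " + b(x, 1, f0, 1000));
    }
}
```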
amplitude-phase representation
$$x(t) = c_0 + \sum_{n=1}^{\infty} c_n \cos(2\pi n f_0 t + \phi_n)$$
$$c_0 = \frac{1}{T} \int_{0}^{T} x(t)\, dt$$
$$c_n = \sqrt{a_n^2 + b_n^2}$$
$$\phi_n = -\tan^{-1}\left(\frac{b_n}{a_n}\right)$$
Average Power
$$P = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} |x(t)|^2\, dt$$
$$P = \frac{1}{T} \int_{0}^{T} x^2(t)\, dt$$
Periodic signal avg power
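For sampled data the average-power integral becomes a mean of squared samples. The sketch below assumes uniformly spaced samples covering a whole number of periods. The names AveragePowerSketch and averagePower are hypothetical.

```java
// Sketch only: names are illustrative.
final class AveragePowerSketch {
    /** Mean of the squared samples, the discrete form of (1/T) * integral of x^2. */
    static double averagePower(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;
        }
        return sum / x.length;   // NaN for an empty array
    }

    public static void main(String[] args) {
        // One period of a sine with amplitude 2; the power should be A^2 / 2 = 2
        double[] x = new double[8];
        for (int i = 0; i < x.length; i++) {
            x[i] = 2.0 * Math.sin(2.0 * Math.PI * i / x.length);
        }
        System.out.println(averagePower(x));
    }
}
```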
PSD (Power Spectral Density)
• S(f) is the power at a
specific frequency, f.
Linear combinations in the time
domain become linear combinations
in the frequency domain
$$a_1 V_1(f) + a_2 V_2(f) = F[a_1 v_1(t) + a_2 v_2(t)]$$
Delay in the time domain causes a
phase shift in the frequency domain
$$V(f)\, e^{-2\pi i f t_d} = F(v(t - t_d))$$
Scale change in the time domain
causes a reciprocal scale change in
the frequency domain
$$\frac{1}{|\alpha|} V\left(\frac{f}{\alpha}\right) = F(v(\alpha t)), \quad \alpha \neq 0$$
convolution theorem: multiplication
in the time domain causes
convolution in the frequency domain
$$V * W(f) = F(v(t)\, w(t))$$
Convolution between two functions
of the same variable is defined by
$$V * W(f) = \int_{-\infty}^{\infty} V(\lambda)\, W(f - \lambda)\, d\lambda$$
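For sampled sequences the convolution integral becomes a sum. The sketch below is the direct discrete form for two finite, non-empty arrays. It is a plain O(n*m) loop, not an FFT-based method, and the names ConvolutionSketch and convolve are hypothetical.

```java
// Sketch only: names are illustrative; both inputs must be non-empty.
final class ConvolutionSketch {
    /** Discrete convolution: out[k] = sum over i of v[i] * w[k - i]. */
    static double[] convolve(double[] v, double[] w) {
        double[] out = new double[v.length + w.length - 1];
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < w.length; j++) {
                out[i + j] += v[i] * w[j];   // each pair contributes at lag i + j
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] result = convolve(new double[] {1, 2, 3}, new double[] {0, 1, 0.5});
        for (double r : result) {
            System.out.println(r);
        }
    }
}
```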
Various Codec Bandwidth Consumptions (Transmission Rate for Voice)
Encoding/Compression Standard | Resulting Bit Rate
G.711 PCM (A-Law/u-Law) | 64 kbps (DS0)
G.726 ADPCM | 16, 24, 32, 40 kbps
G.727 E-ADPCM | 16, 24, 32, 40 kbps
G.729 CS-ACELP | 8 kbps
G.728 LD-CELP | 16 kbps
G.723.1 CELP | 6.3/5.3 kbps (variable)
A means to improve SNR
• Compression uses a coder and a decoder.
• One CODEC is called U-Law.
• U-Law runs at 8 kHz sampling and 8 bits per
digitized sample.
• U-Law is meant for voice.
Voice grade audio-Application
• Voice over IP
• Voice ranges up to about 3.4 kHz
• Sample at 8 kHz; that should be plenty
• Quantize to 8 bits of data (about 48 dB
SNR)
• Improve the SNR with compression
Voice Quality of Service (QoS)
Requirements
Avoiding The 3 Main QoS Challenges
Loss
Delay
Delay Variation (Jitter)
The u-law codec
• X is a number whose range is 0..255
• Log to the base 2 of X is a number whose
range is about 0..8 (for X >= 1)
• U-law uses a scale factor (mu) that
multiplies the input before log is taken.
• Log (x), base 2 = Log(x)/Log(2)
• Mu-law takes the log to the base 1+mu.
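The log-with-scale-factor idea can be sketched with the continuous mu-law companding curve, y = sgn(x) * ln(1 + mu|x|) / ln(1 + mu), which is the log to the base 1 + mu of (1 + mu|x|). The sketch assumes mu = 255 and input normalized to -1..1. Real G.711 codecs use a piecewise-linear 8-bit approximation of this curve, which is not shown here. The names MuLawSketch, compress and expand are hypothetical.

```java
// Sketch only: continuous mu-law curve, not the G.711 bit-level encoding.
final class MuLawSketch {
    static final double MU = 255.0;

    /** Compress x in [-1, 1] to y in [-1, 1] using log base (1 + MU). */
    static double compress(double x) {
        return Math.signum(x) * Math.log1p(MU * Math.abs(x)) / Math.log1p(MU);
    }

    /** Inverse of compress: recover x from y. */
    static double expand(double y) {
        return Math.signum(y) * (Math.pow(1.0 + MU, Math.abs(y)) - 1.0) / MU;
    }

    public static void main(String[] args) {
        // Quiet samples are boosted, so they use more of the 8-bit range
        System.out.println("compress(0.01) = " + compress(0.01));
        System.out.println("compress(0.5)  = " + compress(0.5));
        System.out.println("round trip 0.3 = " + expand(compress(0.3)));
    }
}
```

Because small amplitudes get finer effective steps after compression, the quantization noise on quiet speech drops, which is how companding improves SNR at a fixed 8 bits.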