Voice and Signal Processing
• Professor Douglas Lyon
• Lyon@docjava.com
• Fairfield University
• http://www.docjava.com

Two Course Texts!
• Java for Programmers
• Available from:
– http://www.docjava.com

Java Digital Signal Processing
• Java for Programmers
• Available from:
– http://www.docjava.com

Grading
• Midterm: 1/3
• Homework: 1/3
• Final: 1/3
• Midterm and Final: take home!

Email
• Please send me an e-mail asking to be placed on the CR310 List
• E-mail: lyon@docjava.com

Pre-reqs
• You should have CS232 and MA 172
• OR permission of the instructor
• You need a working knowledge of Java!

What do I need to learn this?
• Basic multimedia programming
– It helps implement interesting programs
– It enables active learning
– It requires a good background in Java programming

Preliminary Java Topics
• exceptions (ch 11)
• nested reference data types (ch 12)
• threads (ch 13)

Preliminary IO Topics
• files (ch 14)
• streams (ch 15)
• readers (ch 16)
• writers (ch 17)

Preliminary GUI Topics
• Swing (ch 18)
• Events (ch 19)

What is Voice and Signal Processing?
• 1D data processing
– input sound
– output sound
– time-varying functions are used as both input and output.

What is Digital Signal Processing?
• A kind of data processing.
• Typically numeric data processing
– Look at the kind and DIMENSION of the data.
– 1D in, 1D out -> DSP
– 2D in, 2D out -> image processing
– 2D in, symbols out -> computer vision
– 3D in, 2D out -> computer graphics

What are some DSP examples?
• If the input is images and the output is images, we call it image processing.
• If the input is images and the output is symbols, we call it pattern recognition or machine vision.
• If the input is text and the output is voice, we call it voice synthesis.
• If the input is voice and the output is text, we call it voice recognition.
• If the input is images and geometry and the output is images, we call it image warping.

What are some 1D DSP applications?
• Analysis
– weak variables -> strong variables
• Synthesis
– strong variables -> weak variables

What are some kinds of 1D data?
• Any form of energy that can be digitized.
• Any source of data (a function in 1D).
– Voice data
– Sound data
– Temperature data
– Range, blood pressure, EEG (brain stuff), EKG (heart stuff), weight, age...

Non-physical phenomena and DSP
• Anything that can produce a digital stream of data is suitable for DSP
– e.g., financial data,
– statistical data,
– network traffic, etc.

What is Audio?
• A pressure wave that travels through air.
• Sensed by the human auditory system (ear).
• Audio is a sensation.

What is a signal?
• A signal is any sequence of values.
• Stock price: a function of time
• Image (2D signal)
• Movie: a 2D signal as a function of time
• Any collection of symbols or numbers

What is a continuous signal?
• A signal that can be represented as a function of a real-valued domain (e.g., time)

What is a discrete signal?
• A signal that can be represented by a function over a whole number domain.
• Sampling is the reduction of a continuous signal to a discrete signal.

What is quantization?
• Given a signal with a set of values, S, create a new signal with a set of values, S', such that card(S) > card(S').

Quantization
• One part of digitization
• input v(t)
• output Vq(t)
• Let N = the number of quantization levels.
• Suppose the minimum voltage is 0 vdc
• Suppose the maximum voltage is 1 vdc
• What is the minimum quantization step?

What is digitization?
• Sampling + quantization
• Converts a continuous signal into a discrete, quantized (digital) signal

What is Digital Signal Processing?
• Data processing with signals that are both sampled and quantized

Compute the quantization step
• (maximum voltage - minimum voltage) / total number of steps.
• For example, a CD uses 16-bit audio samples.
– N = 2**16 = 65536
– Voltage of quantization = 1/65536, about 0.000015
• For AU files, N = 2**8 = 256
– Voltage of quantization = 1/256, about 0.0039

How do I do digitization?
• A low-pass filter removes high frequencies
• The ADC samples the signal and quantizes it
• The parallel-to-serial converter is a shift register

Sampling and Quantization

What is the noise relative to the signal?
• SNR = signal-to-noise ratio
• log10(signal power / noise power) gives the ratio in bels.
• The bel is named after Alexander Graham Bell.
• Multiply by 10 to get decibels (dB): SNR = 10 log10(signal power / noise power).

SNR

Dynamic Range
• Log(2) is about 0.3 and, for amplitude (voltage) ratios, dB = 20 log10(ratio), so 0.3*20 = 6: each bit of resolution adds about 6 dB.

Sampling rate
• Nyquist–Shannon sampling theorem
• If a function f(t) contains no frequencies higher than W cps (cycles per second),
• it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart.
• If W = 10 Hz, then sample at 20 Hz or faster

Aliasing
• Sampling artifact that occurs when sampling below the Nyquist rate.
• High frequencies can be reconstructed as low frequencies.
• Images can have interference patterns

What is an anti-aliasing filter?
• A low-pass filter

What is oversampling?
• When you sample at higher than the Nyquist rate

General Analysis for the ADC

The role of the low-pass filter
• anti-aliasing filter
• Nyquist frequency = sample freq / 2
• only pass freqs below the Nyquist frequency

How do I reconstruct a signal?
[Figure: sample/reconstruction process; labels: R, amplifier, v(t), fs, low-pass filter, output]

Digitizing Voice: PCM Waveform Encoding
• Nyquist Theorem: sample at twice the highest frequency
– Voice frequency range: 300-3400 Hz
– Sampling frequency = 8000/sec (every 125 us)
– Bit rate: (2 x 4 kHz) x 8 bits per sample
– = 64,000 bits per second (DS-0)
• By far the most commonly used method

[Figure: CODEC; PCM = DS-0, 64 kbps]

In 1D, DSP Is…
• 1D Digital signal processing is a kind of data processing that operates on 1D PCM data.

O-scope

Harmonics
• The fundamental frequency of a sound is its lowest-frequency component; it is often, though not always, the component of strongest magnitude.
• Few sounds are just sine waves.
• The extra waves in a sound make up its harmonic content, or timbre.

Harmonic formula
• A harmonic is an integer multiple of the fundamental pitch.
• If 440 Hz is the 1st harmonic then
• 880 Hz is the 2nd harmonic
• Individual sine waves are called partials.

Harmonic Motion
The frequency of the oscillations is given by [equation missing in source]

How do I model Spectra?
• Suppose the continuous signal is v(t)
• Let the Fourier coefficients be denoted: a0, a1, b1, a2, b2, ...

$$v(t) = a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots$$

Sawtooth Wave Form K=10

Model of a Saw Wave

$$f(x) = 2\sum_{n=1}^{K}\frac{(-1)^{n+1}}{n}\sin(nx)$$

Sawwave k=100

Example: a 4 voice synthesizer
• Design a program that can:
– Play sound
– Provide a GUI for determining the amplitudes of up to 7 harmonics
– Enable the user to alter the frequency of the fundamental tone.
– Enable the playing of 4 voices
– Enable the control of the overall volume.

Building an Oscillator in software
• // the period of the wave form, in seconds, is
• lambda = 1 / frequency
• // the number of samples per period is
• samplesPerCycle = sampleRate * lambda;
• sampleRate = 8000 samples/second

Fourier transform

$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\,e^{-2\pi i f t}\,dt$$

$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\,e^{2\pi i f t}\,df$$

How do you compute the Fourier Coefficients?
• Use the Fourier transform!
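For a periodic signal, one numerical route to the coefficients is to correlate one period of the sampled signal with a cosine and a sine at each harmonic. The sketch below is an illustration, not the course's implementation: it assumes the saw-wave model above is read as f(x) = 2 Σ (-1)^(n+1) sin(nx)/n, which is the series for f(x) = x on (-pi, pi), and it uses a direct correlation sum rather than an FFT. The class and method names (SawCoefficients, samplePeriod, sineCoefficient, cosineCoefficient) are hypothetical.

```java
/** Sketch: estimate Fourier series coefficients of a sampled sawtooth by correlation. */
final class SawCoefficients {

    /** Sawtooth of period 2*pi with f(x) = x on [-pi, pi) (assumed reading of the saw model). */
    static double saw(double x) {
        double twoPi = 2.0 * Math.PI;
        // Wrap x into [-pi, pi).
        return x - twoPi * Math.floor((x + Math.PI) / twoPi);
    }

    /** Returns n uniformly spaced samples of one period: x[k] = saw(2*pi*k/n). */
    static double[] samplePeriod(int n) {
        double[] samples = new double[n];
        for (int k = 0; k < n; k++) {
            samples[k] = saw(2.0 * Math.PI * k / n);
        }
        return samples;
    }

    /** Discrete estimate of a_n (harmonic >= 1): correlate with a cosine at the given harmonic. */
    static double cosineCoefficient(double[] x, int harmonic) {
        double sum = 0.0;
        for (int k = 0; k < x.length; k++) {
            sum += x[k] * Math.cos(2.0 * Math.PI * harmonic * k / x.length);
        }
        return 2.0 * sum / x.length;
    }

    /** Discrete estimate of b_n (harmonic >= 1): correlate with a sine at the given harmonic. */
    static double sineCoefficient(double[] x, int harmonic) {
        double sum = 0.0;
        for (int k = 0; k < x.length; k++) {
            sum += x[k] * Math.sin(2.0 * Math.PI * harmonic * k / x.length);
        }
        return 2.0 * sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = samplePeriod(4096);
        for (int n = 1; n <= 5; n++) {
            // Coefficient predicted by the assumed saw model: 2 * (-1)^(n+1) / n.
            double model = 2.0 * (n % 2 == 1 ? 1.0 : -1.0) / n;
            System.out.printf("n=%d  a_n=%.4f  b_n=%.4f  model b_n=%.4f%n",
                    n, cosineCoefficient(x, n), sineCoefficient(x, n), model);
        }
    }
}
```

The sine coefficients should track the model closely, while the cosine coefficients stay near zero because the sawtooth is an odd function; the small residue comes from the sample that lands on the jump.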
$$v(t) = a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots$$

$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\,e^{-2\pi i f t}\,dt$$

$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\,e^{2\pi i f t}\,df$$

Recall Euler's identity
• Complex numbers have a real and an imaginary part:

$$e^{i\theta} = \cos\theta + i\sin\theta$$

Another way to express a function

$$v(t) = a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots$$

• f_0 = frequency
• n f_0 = nth harmonic of f_0

Sine-Cosine Representation

$$x(t) = \sum_{n=0}^{\infty} a_n\cos(2\pi n f_0 t) + \sum_{n=1}^{\infty} b_n\sin(2\pi n f_0 t)$$

• f_0 = frequency
• n f_0 = nth harmonic of f_0

Correlation
• Fourier coefficients are found by correlating the time-dependent function, x(t), with the nth harmonic sine-cosine pair, over one period T = 1/f_0:

$$a_0 = \frac{1}{T}\int_0^T x(t)\,dt$$

$$a_n = \frac{2}{T}\int_0^T x(t)\cos(2\pi n f_0 t)\,dt$$

$$b_n = \frac{2}{T}\int_0^T x(t)\sin(2\pi n f_0 t)\,dt$$

Amplitude-phase representation

$$x(t) = c_0 + \sum_{n=1}^{\infty} c_n\cos(2\pi n f_0 t + \theta_n)$$

$$c_0 = \frac{1}{T}\int_0^T x(t)\,dt$$

$$c_n = \sqrt{a_n^2 + b_n^2}$$

$$\theta_n = -\tan^{-1}\frac{b_n}{a_n}$$

Average Power

$$P = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} x^2(t)\,dt$$

$$P = \frac{1}{T}\int_0^T x^2(t)\,dt \quad \text{(periodic signal average power)}$$

PSD (Power Spectral Density)
• is the power at a specific frequency, f, written S(f)

Linear combinations in the time domain become linear combinations in the frequency domain

$$a_1 V_1(f) + a_2 V_2(f) = F[a_1 v_1(t) + a_2 v_2(t)]$$

Delay in the time domain causes a phase shift in the frequency domain

$$V(f)\,e^{-2\pi i f t_d} = F[v(t - t_d)]$$

Scale change in the time domain causes a reciprocal scale change in the frequency domain

$$\frac{1}{|\alpha|}\,V\!\left(\frac{f}{\alpha}\right) = F[v(\alpha t)], \quad \alpha \neq 0$$

Convolution theorem: multiplication in the time domain causes convolution in the frequency domain

$$V * W(f) = F[v(t)\,w(t)]$$

Convolution between two functions of the same variable is defined by

$$V * W(f) = \int_{-\infty}^{\infty} V(\lambda)\,W(f - \lambda)\,d\lambda$$

Various Codec Bandwidth Consumptions

Encoding/Compression         Resulting Bit Rate
G.711 PCM (A-Law/u-Law)      64 kbps (DS0), the standard transmission rate for voice
G.726 ADPCM                  16, 24, 32, 40 kbps
G.727 E-ADPCM                16, 24, 32, 40 kbps
G.729 CS-ACELP               8 kbps
G.728 LD-CELP                16 kbps
G.723.1 CELP                 6.3/5.3 kbps (variable)

A means to improve SNR
• Compression uses a coder and a decoder.
• One CODEC is called U-Law (mu-law).
• U-Law runs at 8 kHz sampling and 8 bits per digitized sample.
• U-Law is meant for voice.
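The 8 kHz, 8-bit figures above tie directly to the 64 kbps rate quoted for G.711. As a rough worked example, the sketch below computes the uncompressed PCM bit rate and the ideal dynamic range of a uniform quantizer, using the 20*log10(2) per-bit rule. The class and method names are hypothetical, and the CD figures (44.1 kHz, 16 bits, 2 channels) are an added assumption that does not come from these slides.

```java
/** Sketch: bit rate and ideal dynamic range for uncompressed PCM. */
final class PcmBudget {

    /** Uncompressed PCM bit rate in bits per second. */
    static long bitRate(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * bitsPerSample * channels;
    }

    /** Ideal dynamic range in dB of a uniform quantizer: 20 * log10(2^bits). */
    static double dynamicRangeDb(int bitsPerSample) {
        return 20.0 * bitsPerSample * Math.log10(2.0);
    }

    public static void main(String[] args) {
        // Voice-grade PCM from the slides: 8 kHz, 8 bits, one channel.
        System.out.println(bitRate(8000, 8, 1)); // 64000
        System.out.printf("8-bit dynamic range: %.2f dB%n", dynamicRangeDb(8));

        // Assumed CD parameters (not from the slides): 44.1 kHz, 16 bits, stereo.
        System.out.println(bitRate(44100, 16, 2));
        System.out.printf("16-bit dynamic range: %.2f dB%n", dynamicRangeDb(16));
    }
}
```

Uniform 8-bit quantization gives only about 48 dB, which is the motivation for companding: a logarithmic codec spends its 8 bits more evenly across quiet and loud passages.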
Voice grade audio-Application
• Voice over IP
• Voice ranges up to about 3.4 kHz
• Sample at 8 kHz, that should be plenty
• Quantize to 8 bits of data (about 48 dB SNR)
• Improve the SNR with compression

Voice Quality of Service (QoS) Requirements

Avoiding The 3 Main QoS Challenges
• Loss
• Delay
• Delay Variation (Jitter)

The u-law codec
• X is a number whose range is 0..255
• For X from 1 to 255, log to the base 2 of X is a number whose range is 0 to just under 8
• U-law uses a scale factor (mu) that multiplies the input before the log is taken.
• Log(x), base 2 = Log(x)/Log(2)
• Mu-law takes the log to the base 1+mu: for |x| <= 1, y = log(1 + mu*|x|) / log(1 + mu), with the sign of x restored afterwards.
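The bullets above can be sketched as a compress/expand pair. This is a minimal illustration of the continuous textbook mu-law curve with mu = 255, followed by uniform 8-bit quantization of the compressed value; it is not the bit-exact segmented encoding used by G.711. The class and method names are hypothetical, and the input is assumed to be normalized to [-1, 1].

```java
/** Sketch: continuous mu-law companding with mu = 255 (not bit-exact G.711). */
final class MuLaw {

    static final double MU = 255.0;

    /** Compress x in [-1, 1]: sign(x) * log(1 + mu*|x|) / log(1 + mu). */
    static double compress(double x) {
        return Math.signum(x) * Math.log1p(MU * Math.abs(x)) / Math.log1p(MU);
    }

    /** Inverse of compress: sign(y) * ((1 + mu)^|y| - 1) / mu. */
    static double expand(double y) {
        return Math.signum(y) * Math.expm1(Math.abs(y) * Math.log1p(MU)) / MU;
    }

    /** Map x in [-1, 1] to an 8-bit code 0..255 by quantizing the compressed value. */
    static int encode8(double x) {
        double clamped = Math.max(-1.0, Math.min(1.0, x));
        return (int) Math.round((compress(clamped) + 1.0) / 2.0 * 255.0);
    }

    /** Map an 8-bit code 0..255 back to an approximate sample in [-1, 1]. */
    static double decode8(int code) {
        return expand(code / 255.0 * 2.0 - 1.0);
    }

    public static void main(String[] args) {
        double[] inputs = {0.001, 0.01, 0.1, 0.5, 1.0};
        for (double x : inputs) {
            int code = encode8(x);
            System.out.printf("x=%.3f  code=%d  decoded=%.5f%n", x, code, decode8(code));
        }
    }
}
```

Because the curve is steep near zero, small inputs get many more codes than they would under uniform quantization, which is where the SNR improvement for quiet speech comes from.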