Reversing Multiplication and Reversing Convolution: Important Ill-Posed Operations
Les Atlas, Professor and Bloedel Scholar
This research funded by the Army Research Office, The Coulter Foundation, and the Office of Naval Research
June 25, 2015
atlas@ee.washington.edu

Overview
• Why convolution for speech, audio, and other signals:
  – Why is it so important?
• Why modulation for speech, audio, and other signals:
  – Why is it so important?
• How are convolution and modulation mathematical duals of each other?
  – Their duality is related to the discrete-time Fourier transform.
• How to “undo” convolution? Deconvolution, the original inspiration for cepstral coefficients.
• How to “undo” modulation? Demodulation, the original problem studied before deconvolution, but side-stepped since it was too hard in the 1960s.
• Why are deconvolution and demodulation hard?
  – They are likely not well-posed, in the sense of Hadamard.

Discrete-Time Systems’ Notation
• Discrete-time system: a discrete-time system maps an input sequence, $x[n]$, to a new sequence, the output sequence $y[n]$.

Discrete-Time Systems’ Convolution
• Convolution: a linear and time-invariant way to map an input sequence $x[n]$ to a new output sequence
$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$
$h[n]$, often called “the impulse response,” uniquely characterizes the system.
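The convolution sum can be evaluated directly for finite-length sequences. A minimal sketch, assuming both sequences start at $n = 0$ and are zero outside their stored samples, so the infinite sum has only finitely many nonzero terms; the function name `convolve` is illustrative, not from the talk.

```python
def convolve(x, h):
    """Evaluate y[n] = sum_k x[k] h[n - k] for finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)   # full output length
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):     # h[n - k] is zero outside its support
                y[n] += x[k] * h[n - k]
    return y


h = [0.5, 0.5]                        # two-tap moving average (example system)
print(convolve([1.0, 0.0, 0.0], h))   # impulse in, h out: [0.5, 0.5, 0.0, 0.0]
print(convolve([1.0, 2.0, 3.0], h))   # [0.5, 1.5, 2.5, 1.5]
```

Feeding in a unit impulse returns $h[n]$ itself, which is why $h[n]$ is called the impulse response. For longer signals, `numpy.convolve` computes the same full-length result far faster.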
• Convolution can be represented graphically (figure not reproduced).

Frequency Domain Representation of Linear Time-Invariant (LTI) Discrete-Time Systems
• Consider a simple complex exponential input $x[n] = e^{j\omega n}$, where, as per common notation in electrical engineering, $j^2 = -1$. Then
$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k] = \sum_{k=-\infty}^{\infty} x[n-k]\,h[k] = \sum_{k=-\infty}^{\infty} h[k]\,e^{j\omega(n-k)} = e^{j\omega n}\left(\sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}\right)$$
• Define $H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}$.
• Then for input $x[n] = e^{j\omega n}$ the output is $y[n] = H(e^{j\omega})\,e^{j\omega n}$: a complex exponential passes through an LTI system unchanged except for the complex scale factor $H(e^{j\omega})$.
• A figure (not reproduced) helps show the profound impact of this result.

We Now Have the Foundation Needed for Frequency Response!
• The frequency response of a linear time-invariant (LTI) system, i.e., the discrete-time Fourier transform of its impulse response, can now be defined as:
$$H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}$$
• Note: The FFT (fast Fourier transform) only approximates or estimates the discrete-time Fourier transform or spectrum, sometimes quite poorly.

Relationship between Convolution and the Discrete-Time Fourier Transform, and Duality
• Convolution in time corresponds to multiplication in frequency:
$$y[n] = x[n] * h[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(e^{j\omega})\,H(e^{j\omega})$$
$$X(e^{j\omega}) = \sum_{k=-\infty}^{\infty} x[k]\,e^{-j\omega k}, \qquad H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k} \quad \text{(the frequency response of the LTI system)}$$
• Duality: multiplication in time corresponds to convolution in frequency:
$$y[n] = x[n]\,w[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\,W(e^{j(\omega-\theta)})\,d\theta \triangleq X(e^{j\omega}) *_{\omega} W(e^{j\omega})$$
$$W(e^{j\omega}) = \sum_{k=-\infty}^{\infty} w[k]\,e^{-j\omega k} \quad \text{(e.g., the frequency response of a data window)}$$

Now for the Hard Problems: 1. Deconvolution
• How to undo convolution?
$$y[n] = x[n] * h[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(e^{j\omega})\,H(e^{j\omega})$$
• Deconvolution: given an observed signal (or signals) $y[n]$, find or estimate the input voicing $x[n]$ or the vocal tract filter $h[n]$. Usual approaches:
a) Cepstral analysis (e.g., mel-frequency cepstral analysis), commonly used for speech, although recent deep nets skip this step.
b) Model $h[n]$ with a small number of parameters and use, for example, linear predictive analysis. (This is what cell phones do for speech.)
• Yet deconvolution, which is demultiplication in the frequency domain, has to increase variance.

Now for the Hard Problems: 2. Demodulation or Demultiplication
• How to undo multiplication?
$$y[n] = x[n]\,e[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(e^{j\omega}) *_{\omega} E(e^{j\omega})$$
• Demodulation: given an observed signal (or signals) $y[n]$, find or estimate the input envelope $e[n]$. Usual approaches:
a) Hilbert envelope, or lowpass filtering of the magnitude or magnitude squared (the two approaches give the same or similar results). More detail follows.
• Yet demodulation, which is deconvolution in the frequency domain, has to increase variance.

Why are Deconvolution and Demodulation Hard?
• They are not well-posed, in the sense of Hadamard.
• Hadamard’s 3 conditions for a problem to be well-posed:
1. A solution exists.
2. The solution is unique.
3. The solution's behavior changes continuously with the initial condition.
• Problems that were not well-posed used to be considered impossible to solve correctly. Yet these routinely solved problems are ill-posed:
  – Adaptive filtering
  – Matrix factorization
  – Neural net and deep net training
• Convex optimization can help ensure that some problems are well-posed, at least with respect to Hadamard's first two conditions.
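Hadamard's third condition is where direct deconvolution breaks down in practice. A minimal sketch, assuming circular convolution and a hypothetical two-tap filter whose frequency response is nearly zero at $\omega = 0$: direct spectral division $Y/H$ amplifies a tiny perturbation by $1/|H(e^{j0})| = 100$, while a Tikhonov-regularized inverse, $H^{*}Y/(|H|^2 + \lambda)$, caps the amplification at $1/(2\sqrt{\lambda})$ at the cost of some bias. Regularization is a standard remedy chosen here for illustration, not a method presented in these slides; `deconvolve`, `lam`, and the test signals are hypothetical names and values.

```python
import numpy as np


def deconvolve(y, h, lam=0.0):
    """Estimate x from y = x circularly convolved with h (length len(y)).

    lam = 0 gives direct spectral division Y/H (requires H != 0);
    lam > 0 gives the Tikhonov-regularized estimate conj(H) Y / (|H|^2 + lam).
    """
    N = len(y)
    H = np.fft.fft(h, N)                  # zero-padded frequency response
    Y = np.fft.fft(y)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + lam)
    return np.real(np.fft.ifft(X))


N = 64
x = np.zeros(N)
x[::16] = 1.0                             # hypothetical impulse-train input
h = np.array([1.0, -0.99])                # hypothetical filter: |H(e^{j0})| = 0.01
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, N)))  # noiseless observation

delta = 1e-3 * np.ones(N)                 # tiny perturbation, entirely at omega = 0
gains = {}
for lam in (0.0, 1e-2):
    change = deconvolve(y + delta, h, lam) - deconvolve(y, h, lam)
    gains[lam] = np.linalg.norm(change) / np.linalg.norm(delta)
    print(f"lam={lam}: output change / input change = {gains[lam]:.2f}")
# Roughly 100 without regularization and roughly 0.99 with lam = 0.01.
```

With noiseless data and `lam = 0`, `deconvolve(y, h)` recovers `x` to within rounding, so a unique solution exists; yet its sensitivity to the data grows without bound as $|H|$ approaches zero, which is the practical face of Hadamard's third condition.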
Simple Motivational Example: A Metronome at 120 Beats per Minute (2 Hz)
[Figure: metronome waveform, amplitude versus time in seconds, 0 to 5 s]

A Standard (Welch's) Power Spectral Density Estimate for the Metronome Signal
[Figure: power spectral density in dB versus frequency, 0 to 20 000 Hz]
• Nothing at 2 Hz.

Zoom in on the Lowest 50 Hz
[Figure: power spectral density in dB versus frequency, 0 to 50 Hz]
• Nothing but noise at 2 Hz.

Wavelet Analysis of Metronome
[Figure: wavelet (scale) coefficient amplitude, scales 1 to 61, versus time in seconds]
• Nothing at scales corresponding to 2 Hz.
• Similar results for discrete wavelets.

Background Quote
• 1939: “…the basic nature of speech as composed of audible sound streams on which the intelligence content is impressed of the true message-bearing waves which, however, by themselves are inaudible.” – Homer Dudley [Dudley39]
• Translation: speech and other acoustic signals are actually low-bandwidth processes that modulate higher-bandwidth carriers.

Temporal Modulation in Speech
• Claim: speech signals encode information via low-frequency envelopes modulating high-frequency carriers.
[Figure: waveform of the utterance “Bird populations,” amplitude versus time, 0 to 1.2 s, with a zoomed view of 0.95 to 1.2 s]

Alternative Views of Envelope Demodulation
1. Simplest model (previous slide): rectification followed by a lowpass filter (LPF), giving $m_n(t)$, the modulation envelope for subband $n$.
2. Common computational model, used for speech experiments and cochlear implant processing: a Hilbert transform forms the analytic signal $m_n(t)\,e^{j\phi_n(t)}$, where $m_n(t)$ is the Hilbert (modulation) envelope for subband $n$, and the phase $\phi_n(t)$, or $\cos\{\phi_n(t)\}$, is the Hilbert phase (carrier) for subband $n$. Note: this is a “multiplicative” view.
3. Our speculative “additive” model: after rectification, a lowpass filter (LPF, cutoff $f_c$) gives $m_n(t)$, the modulation envelope for subband $n$, and a bandpass filter (BPF, from $f_c$ to $f_h$) gives $\hat{m}_n(t)$, the “fast envelope” (“carrier”) for subband $n$; in this view the two components add (+).

Key Point: Modulation Ambiguity
• Demodulation of the envelope is under-determined (infinitely many solutions!).
• Example:
[Figure: a signal, amplitude −1 to 1, samples 0 to 1000]
[Figure, Solution A: one envelope and carrier pair for this signal]
[Figure, Solution B: a different envelope and carrier pair for the same signal]

Coherent vs. Incoherent Demodulation
For the most recent theory details, see: P. Clark and L. Atlas, “Time-frequency coherent modulation filtering of non-stationary signals,” IEEE Trans. Sig. Process., in press.
• Incoherent (conventional rectification or Hilbert envelope): assume a nonnegative envelope.
  – Discontinuity in carrier → carrier not bandlimited!
• Coherent (our new contribution): assume a bandlimited carrier.
  – The temporal envelope will, in general, be complex.
[Figures: envelope and carrier for each approach, samples 0 to 1000]

Speech Modulation Model: Sum of Products
$$x(t) = \sum_{k=1}^{K} x_k(t) = \sum_{k=1}^{K} m_k(t)\,c_k(t)$$
• $k = 1, \ldots, K$: harmonic indices
• $m_k(t)$: modulators, usually complex
• $c_k(t)$: carriers (harmonic in the following demo)
[Figure: coherent demodulation of one speech harmonic; labels 20–40 Hz and 20 000 Hz frequency]

Speech Modulation Model: Sum of Products (Eight Harmonics)
• Initial unprocessed speech.
[Spectrogram: frequency 0 to 3000 Hz versus time 0.1 to 1.1 s; the following demo slides use the same axes]

Demo: Sum of Products (Fundamental Carrier)
• Fundamental carrier tone, as estimated by a time-varying harmonic model.

Demo: Sum of Products (One Harmonic)
• First modulated component.

Demo: Sum of Products (Two Harmonics)
• Two modulated components.

Demo: Sum of Products (Three Harmonics)
• Three modulated components.

Demo: Sum of Products (Four Harmonics)
• Four modulated components.

Demo: Sum of Products (Eight Harmonics)
• Eight modulated components.
• Original carrier.
• Reminder: this is synthesized and not the original.

Try Modification: Sum of Products, Coherent Temporal Fine Structure Only (Eight Carriers)
• Eight carriers (harmonics).
• Modulation envelope for all is set to 1.
• (All modulation information removed.)

Demo: Sum of Products, Coherent Modulators Only
• Eight modulated components.
• Fixed-pitch synthetic carriers.
• All changing pitch (FM) information removed.
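The sum-of-products model, and the difference between incoherent (Hilbert) and coherent demodulation, can be illustrated on synthetic data. A minimal sketch, assuming a fixed 200 Hz fundamental, eight harmonic carriers $c_k(t) = \cos(2\pi k f_0 t)$, and hypothetical real modulators $m_k(t) = \cos(2\pi k t)$ that change sign. The ideal DFT-mask bandpass filter and the known carrier are simplifications (in the demos, carriers were estimated by a time-varying harmonic model), and this is not the coherent modulation filtering method of Clark and Atlas. For one harmonic, the Hilbert envelope returns $|m_k(t)|$ rather than $m_k(t)$, and its phase carrier flips sign wherever $m_k(t)$ does, while product detection against the known carrier recovers $m_k(t)$.

```python
import numpy as np

fs = 8000                               # sample rate in Hz (assumed)
N = fs                                  # 1 s, so every integer-Hz tone is DFT-bin exact
t = np.arange(N) / fs
f0 = 200.0                              # hypothetical fixed fundamental in Hz
K = 8                                   # number of harmonics, as in the demos
k_idx = np.arange(1, K + 1)[:, None]    # harmonic indices k = 1..K, shape (K, 1)

m = np.cos(2 * np.pi * k_idx * t)       # hypothetical real modulators m_k(t), k Hz each
c = np.cos(2 * np.pi * k_idx * f0 * t)  # harmonic carriers c_k(t) = cos(2 pi k f0 t)
x = np.sum(m * c, axis=0)               # x(t) = sum_k m_k(t) c_k(t)


def subband_analytic(x, f_center, half_width, fs):
    """Analytic signal of the band |f - f_center| < half_width (f_center > 0),
    formed by keeping those positive-frequency DFT bins and doubling them."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1 / fs)
    mask = np.abs(f - f_center) < half_width
    return np.fft.ifft(2 * X * mask)


k = 3                                   # examine the third harmonic (600 Hz)
z = subband_analytic(x, k * f0, f0 / 2, fs)

env_hilbert = np.abs(z)                 # incoherent envelope: forced to be >= 0
carrier_hilbert = np.cos(np.angle(z))   # Hilbert phase ("TFS") carrier
m_coherent = np.real(z * np.exp(-2j * np.pi * k * f0 * t))  # assumes known carrier

print(np.max(np.abs(m_coherent - m[k - 1])))   # near 0: coherent recovers m_k(t)
print(np.max(np.abs(env_hilbert - m[k - 1])))  # near 2: Hilbert gives |m_k(t)|
```

Setting every row of `m` to 1 before summing gives a carriers-only signal, analogous to the “Coherent Temporal Fine Structure Only” modification.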
[Spectrogram for the Coherent Modulators Only demo: eight harmonics, frequency 0 to 3000 Hz versus time 0.1 to 1.1 s]
• Note: these demonstrations do not work correctly with conventional Hilbert, rectified, or other incoherent envelopes! Hilbert temporal fine structure (TFS) only introduces distortion, as the next slide demonstrates.

Demo: Sum of Products, Conventional Hilbert Carriers Only (Eight Incoherent Hilbert Carriers)
• Eight Hilbert phase carriers.
• Modulation = 1.
• Same colormap as the previous slides.
• For conventional real, nonnegative modulator approaches, undesired distortion overrules and dominates the desired processing!
[Spectrogram: frequency 0 to 3000 Hz versus time 0.1 to 1.1 s]

New Theory: Complementary Processing
• Scalar case: given a zero-mean scalar complex Gaussian random variable $x = u + jv$:
  – The standard (Hermitian) variance is $R^H_{xx} = E\{|x|^2\} = E\{x \cdot x^*\}$, where $^*$ denotes complex conjugation.
  – The new, complementary variance is $R^C_{xx} = E\{x^2\} = E\{x \cdot x\} = \rho\,R^H_{xx}$, with $|\rho| \le 1$. The difference between the two is very significant.
  – The complex correlation coefficient $\rho$ between $x$ and $x^*$ measures the degree of impropriety of $x$.
• Why? If $x$ is “proper,” $u$ and $v$ are uncorrelated and have identical variances, so
$$E\{x \cdot x\} = E\{(u + jv)(u + jv)\} = E\{u^2\} - E\{v^2\} + 2j\,E\{uv\} = 0 + 2j \cdot 0 = 0$$
• Thus, if $x$ is proper, the complementary variance $R^C_{xx}$ vanishes.
✓ But, as we now find for sonar and speech signals, after multi-band and PC-MLE processing, the complementary variance $R^C_{xx}$ is significant or very significant!
✓ Thus a better signal model can advantageously use our hypothesized complementary part.

Speech: Noncircularity Detected!
[Figure, top: impropriety GLR versus time (0 to 1 s), curves labeled 25 Hz, 12.5 Hz, and 6.25 Hz; higher values are more noncircular; the null rejection threshold for the weakest estimator (p-value = 0.05) is marked]
[Figure, bottom: signal spectrogram, frequency 0 to 6000 Hz versus time 0 to 1 s]
• Note: impropriety is most significant during voiced speech.
S. Wisdom, G. Okopal, L. Atlas, and J. Pitton, “Voice Activity Detection Using Subband Noncircularity,” Proc. IEEE ICASSP, Brisbane, Australia, April 2015. A more complete version is in press in IEEE Trans. ASLP.

Future Work and Opportunities
• Theory
  – Can advanced communications theory, developed for man-made transmitters and receivers and optimized for high-speed Wi-Fi and 4G Internet, be applied to the analysis of natural signals?
  – The cocktail-party problem:
    • Can the approach be generalized to noisy speech and other signals, as human perception does in auditory scene analysis?
  – Challenges: lack of time synchronization and an unknown transmit signal set; theory in progress.
  – New papers just coming out. Talk to Scott Wisdom, Brad Ekin, and Tommy Powers.
• Plenty of other possible applications
  – Such as large data sets, sonar, audio, and machine monitoring.
• For details, see the website: sites.google.com/a/uw.edu/isdl/
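Returning to the scalar case on the “New Theory: Complementary Processing” slide, both variances can be estimated by sample averages. A minimal sketch, assuming zero-mean data and using synthetic Gaussian samples rather than the sonar or speech data discussed above; the function name `hermitian_and_complementary` is illustrative. When $u$ and $v$ are uncorrelated, $\rho = (\sigma_u^2 - \sigma_v^2)/(\sigma_u^2 + \sigma_v^2)$, so the improper example, with $\sigma_u^2 = 1$ and $\sigma_v^2 = 0.25$, should give $|\rho| \approx 0.6$, while the proper example should give $|\rho| \approx 0$.

```python
import numpy as np


def hermitian_and_complementary(x):
    """Sample estimates of R^H = E{x x*} and R^C = E{x x}, assuming zero mean."""
    r_h = np.mean(x * np.conj(x)).real  # always real and nonnegative
    r_c = np.mean(x * x)                # complex in general
    return r_h, r_c


rng = np.random.default_rng(1)
n = 200_000
u = rng.standard_normal(n)
v = rng.standard_normal(n)

signals = {
    # Proper: u and v uncorrelated with identical variances.
    "proper": (u + 1j * v) / np.sqrt(2),
    # Improper (hypothetical): imaginary part has 1/4 the variance of the real part.
    "improper": u + 1j * 0.5 * v,
}

rho = {}
for name, x in signals.items():
    r_h, r_c = hermitian_and_complementary(x)
    rho[name] = r_c / r_h               # complex correlation coefficient of x and x*
    print(f"{name}: R^H = {r_h:.3f}, |rho| = {abs(rho[name]):.3f}")
```

Because $|\rho| \le 1$ by the Cauchy–Schwarz inequality, $|\rho|$ near 0 indicates a nearly proper signal and values near 1 a strongly improper one.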