Reversing Multiplication and Reversing Convolution: Important Ill-Posed Operations

Les Atlas
Professor and Bloedel Scholar
This research was funded by the Army Research Office, the Coulter
Foundation, and the Office of Naval Research
June 25, 2015
Overview
•  Why convolution for speech, audio, and other signals:
–  Why is it so important?
•  Why modulation for speech, audio, and other signals:
–  Why is it so important?
•  How are convolution and modulation mathematical duals of each other?
–  Their duality is related to the discrete-time Fourier transform.
•  How to “undo” convolution? Deconvolution, the original inspiration for cepstral coefficients.
•  How to “undo” modulation? Demodulation, the original problem studied before deconvolution, but side-stepped since it was too hard in the 1960s.
•  Why are deconvolution and demodulation hard?
–  They are likely not well-posed, in the sense of Hadamard.
atlas@ee.washington.edu
Discrete-Time Systems: Notation
•  Discrete-time system: A discrete-time system maps an input sequence, $x[n]$, to a new sequence, the output sequence $y[n]$.
Discrete-Time Systems: Convolution
•  Convolution: A linear and time-invariant way to map an input sequence, $x[n]$, to a new output sequence:
$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$
$h[n]$, often called “the impulse response,” uniquely characterizes the system.
•  Convolution can be graphically represented.
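The convolution sum can be evaluated directly. This minimal sketch assumes finite-length sequences that start at n = 0 and are zero elsewhere, so the infinite sum reduces to the overlapping terms; the function name `convolve` is our own, not something from the slides.

```python
def convolve(x, h):
    """Direct evaluation of y[n] = sum_k x[k] h[n - k].

    Assumes x and h are finite lists indexed from n = 0 and zero elsewhere,
    so the output has len(x) + len(h) - 1 nonzero samples.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):   # h[n - k] is zero outside its support
                y[n] += x[k] * h[n - k]
    return y


print(convolve([1, 2, 3], [1, 1]))   # [1.0, 3.0, 5.0, 3.0]
```

Convolving with the unit impulse h = [1] returns x unchanged, which is why h[n] alone characterizes the system.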
atlas@uw.edu
Frequency Domain Representation of Linear
Time-Invariant (LTI) Discrete-Time Systems
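•  Consider a simple complex exponential input $x[n] = e^{j\omega n}$, where, as per common notation in electrical engineering, $j^2 = -1$. Then
$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k] = \sum_{k=-\infty}^{\infty} x[n-k]\,h[k] = \sum_{k=-\infty}^{\infty} h[k]\,e^{j\omega(n-k)} = e^{j\omega n}\left(\sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}\right)$$
•  Define $H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}$.
•  Then for input $x[n] = e^{j\omega n}$, the output is $y[n] = H(e^{j\omega})\,e^{j\omega n}$.
•  This result, illustrated in a figure, has a profound impact: a complex exponential passes through any LTI system unchanged except for multiplication by the complex constant $H(e^{j\omega})$.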
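A quick numerical check of this eigenfunction property, as a minimal sketch: it assumes a short, arbitrary FIR impulse response `h` and frequency `w` (hypothetical values), and a complex exponential defined for every integer n, so there is no start-up transient.

```python
import cmath


def freq_response(h, w):
    """H(e^{jw}) = sum_k h[k] e^{-jwk} for an FIR h supported on k = 0..len(h)-1."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))


def lti_output(x, h, n):
    """y[n] = sum_k h[k] x[n - k], where x is a function defined for every integer n."""
    return sum(hk * x(n - k) for k, hk in enumerate(h))


h = [0.5, 0.3, -0.2]   # hypothetical impulse response
w = 0.7                # hypothetical frequency, radians per sample


def x(n):
    """Complex exponential input e^{jwn}."""
    return cmath.exp(1j * w * n)


H = freq_response(h, w)
for n in range(-3, 4):
    y = lti_output(x, h, n)
    # The output equals the input scaled by the single complex number H(e^{jw}).
    print(n, abs(y - H * x(n)))
```

The printed differences are at round-off level for every n.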
We Now Have the Foundation Needed for:
Frequency Response!
•  The frequency response of a linear time-invariant (LTI) system, the discrete-time Fourier transform of its impulse response $h[n]$, can now be defined as:
$$H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}$$
•  Note: The FFT (fast Fourier transform) only approximates or estimates the discrete-time Fourier transform or spectrum, sometimes quite poorly.
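To make the note concrete: for a finite-length sequence, the FFT (an efficient DFT) returns exact samples of the DTFT at the N frequencies ω = 2πk/N and says nothing in between. The approximation arises when a long or infinite signal must be truncated or windowed to N samples. A minimal NumPy sketch with a hypothetical 4-sample `h`:

```python
import numpy as np


def dtft(h, w):
    """Evaluate H(e^{jw}) = sum_k h[k] e^{-jwk} for finite h (k = 0..len(h)-1) at frequencies w."""
    k = np.arange(len(h))
    return np.exp(-1j * np.outer(w, k)) @ h


h = np.array([1.0, 0.5, 0.25, 0.125])   # hypothetical finite impulse response
N = 16                                   # FFT length; h is zero-padded from 4 to 16 samples
grid = 2 * np.pi * np.arange(N) / N      # the only frequencies the FFT evaluates
fft_vals = np.fft.fft(h, n=N)

# True: on this grid the FFT agrees with the DTFT sum
print(np.allclose(fft_vals, dtft(h, grid)))
```

Zero-padding (a larger `N`) only samples the same DTFT more densely. It cannot recover information lost by truncating a longer signal.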
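Relationship between Convolution and the
Discrete-Time Fourier Transform, and Duality
•  Convolution in time corresponds to multiplication in frequency:
$$y[n] = x[n] * h[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(e^{j\omega})\,H(e^{j\omega})$$
$$\mathcal{F}:\; X(e^{j\omega}) = \sum_{k=-\infty}^{\infty} x[k]\,e^{-j\omega k}, \qquad H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega k}$$
($H(e^{j\omega})$ is the frequency response of the LTI system.)
•  Duality: Multiplication in time corresponds to convolution in frequency:
$$y[n] = x[n]\,w[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\,W(e^{j(\omega-\theta)})\,d\theta \triangleq X(e^{j\omega}) *_{\omega} W(e^{j\omega})$$
$$\mathcal{F}:\; W(e^{j\omega}) = \sum_{k=-\infty}^{\infty} w[k]\,e^{-j\omega k}$$
(E.g., the frequency response of a data window.)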
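Both statements can be checked numerically with the DFT. One caveat applies: for length-N DFTs the convolutions become circular, and the 1/(2π) factor becomes 1/N. This is a sketch using random test signals (an assumption; any signals work).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
x = rng.standard_normal(N)   # arbitrary test signal
h = rng.standard_normal(N)   # arbitrary "impulse response"
w = rng.standard_normal(N)   # arbitrary "window"


def circ_conv(a, b):
    """Circular convolution: c[m] = sum_k a[k] b[(m - k) mod N]."""
    n = len(a)
    return np.array([sum(a[k] * b[(m - k) % n] for k in range(n)) for m in range(n)])


X, H, W = np.fft.fft(x), np.fft.fft(h), np.fft.fft(w)

# Convolution in time <-> multiplication in frequency
print(np.allclose(np.fft.fft(circ_conv(x, h)), X * H))
# Multiplication in time <-> convolution in frequency, scaled by 1/N
print(np.allclose(np.fft.fft(x * w), circ_conv(X, W) / N))
```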
Now for the Hard Problems:
1. Deconvolution
•  How to undo convolution?
$$y[n] = x[n] * h[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(e^{j\omega})\,H(e^{j\omega})$$
•  Deconvolution: Given an observed signal (or signals) $y[n]$, find or estimate the input voicing $x[n]$ or the vocal tract filter $h[n]$.
Usual approaches:
a)  Cepstral analysis (e.g., mel-frequency cepstral analysis, as applied to speech). Commonly used for speech, though recent deep nets skip this step.
b)  Model $h[n]$ with a small number of parameters and use, for example, linear predictive analysis. (This is what cell phones do for speech.)
•  Yet deconvolution in time is demultiplication (division) in frequency, which has to increase variance.
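A minimal sketch of why dividing by H increases variance. It assumes a white-noise stand-in for the input and a hypothetical two-tap filter with a near-zero at DC, and it uses naive inverse filtering, not cepstral or LPC analysis. Without noise the division recovers x exactly. With white observation noise of variance σ², the error variance becomes σ² times the mean of 1/|H|², because the noise is amplified most where |H| is smallest.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1024
x = rng.standard_normal(N)            # stand-in input signal (an assumption, not real voicing)
h = np.zeros(N)
h[:2] = [1.0, -0.95]                  # hypothetical filter; |H| = 0.05 at DC
H = np.fft.fft(h)

# Observed signal y = x (*) h, using circular convolution via the FFT
y = np.real(np.fft.ifft(np.fft.fft(x) * H))


def inverse_filter(obs):
    """Naive deconvolution: divide by H in frequency ("demultiply")."""
    return np.real(np.fft.ifft(np.fft.fft(obs) / H))


sigma = 0.01
noise = sigma * rng.standard_normal(N)
x_clean = inverse_filter(y)            # no noise: exact recovery up to round-off
x_noisy = inverse_filter(y + noise)    # small noise: amplified where |H| is small

gain = np.mean(1.0 / np.abs(H) ** 2)   # expected (error variance) / (noise variance), roughly 10 here
print(np.allclose(x_clean, x), gain)
print(np.var(noise), np.var(x_noisy - x))
```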
Now for the Hard Problems:
2. Demodulation or Demultiplication
•  How to undo multiplication?
$$y[n] = x[n] \cdot e[n] \;\overset{\mathcal{F}}{\longleftrightarrow}\; X(e^{j\omega}) *_{\omega} E(e^{j\omega})$$
•  Demodulation: Given an observed signal (or signals) $y[n]$, find or estimate the input envelope $e[n]$.
Usual approaches:
a)  Hilbert envelope, or the low-pass-filtered magnitude or squared magnitude. (These are the same, or similar.)
•  More detail coming shortly.
•  Yet demultiplication in time is deconvolution in frequency, which has to increase variance.
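A minimal sketch of the Hilbert-envelope approach. It assumes a synthetic AM tone with a strictly positive envelope (hypothetical parameters). The analytic signal is formed with the FFT by zeroing negative frequencies and doubling positive ones, the same construction used by library routines such as SciPy's `scipy.signal.hilbert`.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC (and Nyquist), double positive, zero negative frequencies."""
    N = len(x)
    X = np.fft.fft(x)
    g = np.zeros(N)
    g[0] = 1.0
    if N % 2 == 0:
        g[N // 2] = 1.0
        g[1:N // 2] = 2.0
    else:
        g[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(X * g)


fs = 8000
t = np.arange(fs) / fs                          # 1 second
env = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)     # hypothetical slow, positive envelope (4 Hz)
x = env * np.cos(2 * np.pi * 1000 * t)          # hypothetical 1 kHz carrier

hilbert_env = np.abs(analytic_signal(x))
print(np.max(np.abs(hilbert_env - env)))        # tiny: the Hilbert envelope recovers env here
```

This success depends on the envelope being positive. An envelope that changes sign breaks it, which is the ambiguity discussed below.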
Why are Deconvolution and Demodulation
Hard?
•  They are not well-posed, in the sense of Hadamard.
•  Hadamard’s 3 conditions for a problem being well-posed:
1.  A solution exists.
2.  The solution is unique.
3.  The solution’s behavior changes continuously with the initial conditions.
•  Problems which were not well-posed used to be considered impossible to solve correctly. But… these problems are ill-posed:
•  Adaptive filtering
•  Matrix factorization
•  Neural net and deep net training
Convex optimization can help ensure some problems are well-posed, at least with respect to the first 2 of Hadamard’s conditions.
Simple Motivational Example:
A Metronome at 120 beats per minute (2 Hz)
[Figure: metronome waveform, amplitude vs. time in seconds (0 to 5 s)]
A Standard (Welch's) Power Spectral Density
Estimate for the Metronome Signal
[Figure: Welch PSD estimate in dB (0 to 140) vs. frequency in Hertz (0 to 20000)]
Nothing at 2 Hertz.
Zoom in on the Lowest 50 Hertz
[Figure: PSD in dB (-40 to 0) vs. frequency in Hz (0 to 50)]
Nothing but noise at 2 Hertz.
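The same effect can be reproduced with a synthetic stand-in for the metronome. This sketch assumes 10 ms bursts of a 3 kHz tone every 0.5 s, sampled at 8 kHz; it is not the recording used on the slides. The raw spectrum has almost nothing at 2 Hz. The spectrum of the squared signal, a crude envelope detector, has a strong 2 Hz component. The 2 Hz rhythm lives in the modulation, not in the signal's own spectrum.

```python
import numpy as np

fs = 8000                                   # sample rate in Hz (assumed)
n = np.arange(5 * fs)                       # 5 seconds of samples
t = n / fs
# Hypothetical metronome: a 10 ms burst of a 3 kHz tone every 0.5 s (120 beats per minute)
clicks = (n % (fs // 2)) < (fs // 100)
x = clicks * np.sin(2 * np.pi * 3000 * t)

freqs = np.fft.rfftfreq(len(x), d=1 / fs)   # 0.2 Hz resolution for 5 s of data
k2 = np.argmin(np.abs(freqs - 2.0))         # index of the 2 Hz bin

spec_x = np.abs(np.fft.rfft(x))             # spectrum of the signal itself
spec_env = np.abs(np.fft.rfft(x ** 2))      # spectrum after squaring (crude envelope detection)
print(f"|FFT(x)| at 2 Hz: {spec_x[k2]:.3g}   |FFT(x^2)| at 2 Hz: {spec_env[k2]:.3g}")
```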
Wavelet Analysis of Metronome
[Figure: wavelet (scale) coefficient vs. time in seconds, shown with the metronome waveform amplitude vs. time]
Nothing at scales corresponding to 2 Hertz. Similar results for discrete wavelets.
Background Quote
•  1939
…the basic nature of speech as composed of audible sound
streams on which the intelligence content is impressed of
the true message-bearing waves which, however, by
themselves are inaudible. – Homer Dudley [Dudley39]
Translation
•  Speech and other acoustic signals are actually low bandwidth
processes which modulate higher bandwidth carriers.
Temporal Modulation in Speech
•  Claim: Speech signals encode information via low-frequency
envelopes modulating high-frequency carriers
[Figure: waveform of “Bird populations,” amplitude vs. time (0 to 1.2 s), with a zoomed view from 0.95 to 1.2 s]
Alternative Views of Envelope Demodulation
1.  Simplest (last slide) model:
Rectification → lowpass filter (LPF) → $m_n(t)$, modulation envelope for subband n.
2.  Common computational model, for speech experiments and cochlear implant processing:
Hilbert transform to form the analytic signal $m_n(t) \cdot e^{j\phi_n(t)}$, giving
–  $m_n(t)$, Hilbert (modulation) envelope for subband n;
–  $\angle\phi_n(t)$ or $\cos\{\phi_n(t)\}$, Hilbert phase (carrier) for subband n.
Note: this is a “multiplicative” view.
3.  Our speculative “additive” model:
Rectification, then a lowpass filter (LPF) and a bandpass filter (BPF), with frequencies $f_c$ and $f_h$, combined additively (+):
–  $m_n(t)$, modulation envelope for subband n;
–  $\hat{m}_n(t)$, “fast envelope” (“carrier”) for subband n.
Key Point: Modulation Ambiguity
•  Demodulation of the envelope is under-determined
(infinitely many solutions!)
•  Example:
[Figure: example signal, amplitude (-1 to 1) vs. sample index (200 to 1000)]
Key Point: Modulation Ambiguity
•  Demodulation is under-determined (infinitely many
solutions!)
•  Example:
[Figure: two decompositions of the same signal. Solution A and Solution B each show an envelope and a carrier, amplitude (-1 to 1) vs. sample index (200 to 1000).]
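A minimal sketch of the ambiguity, assuming a hypothetical envelope, carrier, and "transfer" factor `g`. Any nonzero g can be moved from the carrier into the envelope, and both signs can be flipped. Every such pair multiplies back to exactly the same signal, so the product alone cannot pick one.

```python
import numpy as np

n = np.arange(1000)
env = 1.0 + 0.5 * np.cos(2 * np.pi * n / 500)   # hypothetical slow envelope
car = np.cos(2 * np.pi * n / 20)                 # hypothetical fast carrier
x = env * car                                    # the observed signal

g = 2.0 + np.sin(2 * np.pi * n / 250)            # arbitrary nonzero factor (g >= 1 here)
solutions = {
    "A": (env, car),              # the intended factorization
    "B": (env * g, car / g),      # move g from the carrier into the envelope
    "C": (-env, -car),            # flip both signs
}
for name, (m, c) in solutions.items():
    print(name, np.allclose(m * c, x))   # every solution reproduces x
```

Extra assumptions, such as a nonnegative envelope or a bandlimited carrier, are what select one solution. That choice is the distinction drawn next.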
Coherent vs. Incoherent Demodulation
For the most recent theory details, see: P. Clark and L. Atlas, “Time-frequency coherent modulation filtering of non-stationary signals,” IEEE Trans. Signal Process., in press.
•  Incoherent (conventional rectification or Hilbert envelope): assume a nonnegative envelope.
[Figure: envelope and carrier, amplitude (-1 to 1) vs. sample index (200 to 1000)]
Discontinuity in carrier: → Carrier not bandlimited!
•  Coherent (our new contribution): assume a bandlimited carrier.
[Figure: envelope and carrier, amplitude (-1 to 1) vs. sample index (200 to 1000)]
Temporal envelope will, in general, be complex.
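A minimal sketch of the difference. It assumes a synthetic signal whose envelope changes sign. For the coherent case, it also assumes the carrier is already known; in practice the carrier must be estimated, and this is not the Clark and Atlas algorithm. The Hilbert envelope returns |m| and pushes the sign flips into the carrier as phase jumps of π. Dividing the analytic signal by the known bandlimited carrier returns the signed envelope.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC (and Nyquist), double positive, zero negative frequencies."""
    N = len(x)
    X = np.fft.fft(x)
    g = np.zeros(N)
    g[0] = 1.0
    if N % 2 == 0:
        g[N // 2] = 1.0
        g[1:N // 2] = 2.0
    else:
        g[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(X * g)


fs = 8000
t = np.arange(fs) / fs
m = np.cos(2 * np.pi * 3 * t)                  # hypothetical envelope that changes sign
carrier = np.exp(1j * 2 * np.pi * 1000 * t)    # known, bandlimited complex carrier (assumed)
x = m * np.real(carrier)

z = analytic_signal(x)
incoherent_env = np.abs(z)                     # Hilbert envelope: nonnegative, equals |m|
hilbert_carrier = np.cos(np.angle(z))          # absorbs the sign flips, so it is discontinuous
coherent_env = z * np.conj(carrier)            # complex in general; real and signed in this example

print(np.allclose(incoherent_env, np.abs(m)), np.allclose(coherent_env.real, m))
```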
Speech Modulation Model:
Sum of Products
$$x(t) = \sum_{k=1}^{K} x_k(t) = \sum_{k=1}^{K} m_k(t)\,c_k(t)$$
where $k = 1, \dots, K$ are the harmonic indices, the modulators $m_k(t)$ are usually complex, and the carriers $c_k(t)$ are harmonic in the next demo.
Coherent demodulation of one speech harmonic:
[Figure: frequency axis marked at 20 to 40 Hz and at 20 000 Hz]
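A minimal synthesis sketch of the sum-of-products model. It assumes harmonic carriers c_k(t) = exp(j k φ0(t)) built from a fundamental-frequency track, takes the real part as the audible signal, and uses a hypothetical pitch glide and modulators rather than estimates from speech. Replacing the modulators with 1, or the pitch track with a constant, mimics the “fine structure only” and “modulators only” modifications in the demos that follow.

```python
import numpy as np


def sum_of_products(modulators, f0, fs):
    """Synthesize x(t) = Re{sum_k m_k(t) c_k(t)} with harmonic carriers c_k(t) = exp(j k phi0(t)).

    modulators: complex array, shape (K, T); row k-1 holds m_k(t)
    f0:         fundamental-frequency track in Hz, shape (T,)
    Returns the real signal and the complex components x_k(t) = m_k(t) c_k(t).
    """
    phi0 = 2 * np.pi * np.cumsum(f0) / fs                # fundamental phase: running integral of f0
    k = np.arange(1, modulators.shape[0] + 1)[:, None]    # harmonic indices 1..K
    carriers = np.exp(1j * k * phi0[None, :])
    components = modulators * carriers
    return np.real(components.sum(axis=0)), components


fs, T, K = 8000, 8000, 8
t = np.arange(T) / fs
f0 = 100 + 50 * t                                         # hypothetical pitch glide, 100 to 150 Hz
m = (1 + 0.5 * np.cos(2 * np.pi * 4 * t))[None, :] / np.arange(1, K + 1)[:, None]
x, comps = sum_of_products(m.astype(complex), f0, fs)

# Modifications in the spirit of the demos:
tfs_only, _ = sum_of_products(np.ones((K, T), dtype=complex), f0, fs)      # all modulators set to 1
mods_only, _ = sum_of_products(m.astype(complex), np.full(T, 120.0), fs)   # fixed-pitch carriers
```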
Speech Modulation Model:
Sum of Products
•  Initial unprocessed speech
[Spectrogram: eight harmonics, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
•  Fundamental carrier tone, as estimated by a time-varying harmonic model
[Spectrogram: fundamental carrier, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
•  First modulated component
[Spectrogram: one harmonic, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
•  Two modulated components
[Spectrogram: two harmonics, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
•  Three modulated components
[Spectrogram: three harmonics, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
•  Four modulated components
[Spectrogram: four harmonics, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
•  Eight modulated components
•  Original carrier
•  Reminder: this is synthesized and not the original.
[Spectrogram: eight harmonics, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Try Modification: Sum of Products
Coherent Temporal Fine Structure Only
•  Eight carriers (harmonics)
•  Modulation envelope for all is set to 1.
•  (All modulation information removed.)
[Spectrogram: eight carriers, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Demo: Sum of Products
Coherent Modulators Only
•  Eight modulated components
•  Fixed-pitch synthetic carriers
•  All changing pitch (FM) information removed.
[Spectrogram: eight harmonics, frequency (0 to 3000) vs. time (0.1 to 1.1)]
Note: These demonstrations do not work correctly with conventional Hilbert, rectified, or other incoherent envelopes! They (Hilbert TFS) only introduce distortion. Let’s demonstrate that on the next slide…
Demo: Sum of Products
Conventional Hilbert Carriers Only
•  Eight Hilbert phase carriers
•  Modulation = 1.
•  Same colormap as last slides
•  For conventional real non-negative modulator approaches:
–  Undesired distortion overrules and dominates desired processing!
[Spectrogram: eight carriers (incoherent Hilbert), frequency (0 to 3000) vs. time (0.1 to 1.1)]
New Theory: Complementary Processing
•  Scalar case: Given a zero-mean scalar Gaussian complex random variable $x = u + jv$:
–  The standard (Hermitian) variance is $R_{xx}^{H} = E\{|x|^2\} = E\{x \cdot x^{*}\}$, where $*$ is complex conjugation.
–  The new, complementary, variance is $R_{xx}^{C} = E\{x^2\} = E\{x \cdot x\} = \rho\,R_{xx}^{H}$, with $|\rho| \le 1$. The difference is very significant.
–  The complex correlation coefficient $\rho$ between $x$ and $x^{*}$ is a measure of the degree of impropriety of $x$. Why?
•  If $x$ is “proper,” $u$ and $v$ are uncorrelated and have identical variances, so
$$E\{x \cdot x\} = E\{(u + jv)(u + jv)\} = E\{u^2\} + E\{-v^2\} + 2j\,E\{u \cdot v\} = E\{u^2\} - E\{v^2\} + 2j \cdot 0 = 0 + 0 = 0$$
•  Thus, if $x$ is proper, the complementary variance $R_{xx}^{C}$ vanishes.
✓  But, as we now find for sonar and speech signals, after multi-band and PC-MLE processing, the complementary variance $R_{xx}^{C}$ is significant or very significant!
✓  Thus a better signal model can advantageously use our hypothesized complementary part.
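A minimal Monte Carlo sketch of the two variances using only the standard library. It assumes two hypothetical zero-mean Gaussian sources. The proper one has independent u and v with equal variances. The improper one has equal variances but v correlated with u (E{uv} = 0.8), which gives $R_{xx}^{C} = 2j \cdot 0.8$ and $|\rho| = 0.8$.

```python
import random


def variances(samples):
    """Sample Hermitian variance E{x x*} and complementary variance E{x x} (data assumed zero-mean)."""
    n = len(samples)
    r_h = sum(z * z.conjugate() for z in samples).real / n
    r_c = sum(z * z for z in samples) / n
    return r_h, r_c


random.seed(0)
N = 100_000

# Proper: u and v independent with identical variances
proper = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

# Improper (hypothetical): unit variances, but v correlated with u
improper = []
for _ in range(N):
    u = random.gauss(0, 1)
    v = 0.8 * u + 0.6 * random.gauss(0, 1)   # E{v^2} = 0.64 + 0.36 = 1, E{uv} = 0.8
    improper.append(complex(u, v))

for name, data in [("proper", proper), ("improper", improper)]:
    r_h, r_c = variances(data)
    # Expect |rho| near 0 for the proper source and near 0.8 for the improper one
    print(f"{name}: R_H = {r_h:.3f}, |R_C| = {abs(r_c):.3f}, |rho| = {abs(r_c / r_h):.3f}")
```

Both sources have the same Hermitian variance (about 2), so only the complementary variance tells them apart.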
Speech: Noncircularity Detected!
[Figure, top: impropriety GLR (0 to 1) vs. time (0 to 1 s) for 25 Hz, 12.5 Hz, and 6.25 Hz; higher values are more noncircular; the null rejection threshold for the weakest estimator (p-value = 0.05) is marked. Bottom: signal spectrogram, frequency (0 to 6000 Hz) vs. time (0 to 1 s).]
S. Wisdom, G. Okopal, L. Atlas, and J. Pitton, “Voice Activity Detection Using Subband Noncircularity,” Proc. IEEE ICASSP, Brisbane, Australia, April 2015. A more complete version is in press in IEEE Trans. ASLP.
Note: Impropriety is most significant during voiced speech.
Future Work and Opportunities
•  Theory
–  Can advanced communications theory, developed for man-made transmitters and receivers and optimized for high-speed Wi-Fi and 4G Internet, be applied to the analysis of natural signals?
–  The cocktail-party problem:
•  Can the approach be generalized to noisy speech and other signals, as human perception does in auditory scene analysis?
–  Challenges: lack of time synchronization and an unknown transmit signal set. Theory in progress.
–  New papers just coming out. Talk to Scott Wisdom, Brad Ekin, and Tommy Powers.
•  Plenty of other possible applications
–  Such as large sets of data, sonar, audio, and machine
monitoring.
•  For details, see website link at:
sites.google.com/a/uw.edu/isdl/