Lab ROSA Mapping Meetings:

advertisement
Mapping Meetings:
Columbia’s plans
Dan Ellis
http://labrosa.ee.columbia.edu/
Outline
1
Columbia participants
2
Mapping Meetings: Perspectives
3
Techniques
4
Summary
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 1 of 24
Lab
ROSA
Columbia participants
1
•
Dan Ellis
Laboratory for Recognition and Organization of
Speech and Audio (LabROSA)
- signal processing, abstraction and indexing
- speech recognition
- computational auditory scene analysis
•
Kathy McKeown
Nautral Language Processing (NLP) group
- text analysis, information extraction, generation
•
“Mapping meetings” support
- 2 students, 3 academic years
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 2 of 24
Lab
ROSA
LabROSA
DOMAINS
http://labrosa.ee.columbia.edu/
• Meetings
• Personal recordings
• Location monitoring
• Broadcast
• Movies
• Lectures
ROSA
• Object-based structure discovery & learning
APPLICATIONS
• Speech recognition
• Speech characterization
• Nonspeech recognition
•
•
•
•
•
• Scene analysis
• Audio-visual integration
• Music analysis
Structuring
Search
Summarization
Awareness
Understanding
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 3 of 24
Lab
ROSA
Natural Language Processing Group
•
4 faculty & senior researchers
13 Ph.D. students + MS ...
•
Summarization
- Sports News task: max info-per-word
•
Shallow parsing
- phrase-based (not sentences)
- statistical
•
Intersection of similar documents
- common phrases → summary
- identify contradictions etc.
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 4 of 24
Lab
ROSA
Outline
1
Columbia Participants
2
Mapping meetings: Perspectives
- project themes
- broad plans
3
Techniques
4
Summary
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 5 of 24
Lab
ROSA
Mapping meetings: Project themes
•
Discourse analysis
- interaction patterns: predicted, emergent
- individuals .. pairs .. groups
- words, speech style, timing, ... (speaker-relative)
•
Summarization & abstraction
- topic content
- “meeting acts”
- participant roles (...)
•
Browsing & visualization
- access methods / dimensions
- scale (time, detail); pov orientation
- data classes & types; rendering
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 6 of 24
Lab
ROSA
Information flow
Words
ASR
per-speaker
Speaker ID
prosody
close-talking
x-cancel
Topic tracking/
segmentation
text sources
Acts/episodes
between-chan
Speaker activity
motion
Audio
per-chan
multi-pitch
nonspeech
Speaker profiles
missing-data
masks
de-reverb
background
laughter
tabletop mics
between-chan
azimuth, triangulation
separation (CASA, ICA)
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 7 of 24
Lab
ROSA
Signal organization plans
•
Speaker turns from channel energy envelopes
- mixing matrix inversion
•
Extracting sources from mixed signals
- source ID from spatial cues, pitch
- feature/mask extraction → missing-data recog.
•
Distant signal recognition
- tandem modeling
- channel compensation
- multi-source recognition
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 8 of 24
Lab
ROSA
NLP plans
(Kathy McKeown)
•
Argumentative Summaries
- special case of summarization
- same theme, different perspectives
•
Outcomes
- identify points of disagreement
- identify resulting consensus
•
Depends on
- information extraction:
words, discourse, prosody
•
Manpower
- one student starting in January
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 9 of 24
Lab
ROSA
Outline
1
Columbia Participants
2
Mapping Meetings: Perspectives
3
Techniques
- speaker turn detection
- per-speaker signal extraction
- speech recognition
4
Summary
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 10 of 24
Lab
ROSA
Crosstalk cancellation
•
Baseline speaker activity detection is hard:
Spkr A
speaker
active
speaker B
cedes floor
Spkr B
Spkr C
interruptions
Spkr D
breath
noise
Spkr E
crosstalk
level/dB
40
Table
top
20
0
120
125
130
135
140
145
150
155
time / secs
•
Noisy crosstalk model: m = C ⋅ s + n
•
Estimate column CxA from A’s peak energy
- ... including pure delay (10 ms frames)
- ... then linear inversion
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 11 of 24
Lab
ROSA
Passive motion detection
•
Cross-correlation recovers impulse response
•
Coupling to each mic gives distances;
can infer which are moving
S xy = H xy ⋅ S xx
participant movement
mr-2000-11-02-1440
Spkr C
freq / chan
20
15
10
Coupling
C→A
delay / samples
5
20
15
10
5
Coupling
C → Tabletop
delay / samples
20
15
10
5
1020
1020.5
1021
Mapping meetings: Kickoff - Dan Ellis
1021.5
1022
1022.5
1023
1023.5
1024
1024.5
2001-10-13 - 12 of 24
time / sec
Lab
ROSA
PDA-based speaker change detection
•
Goal: small conference-tabletop device
•
Speaker turns from PDA mock-up signals?
pda.aif: excerpt with 512-pt xcorr, 80% max thresh
8000
6000
4000
2000
0
Morgan
31
32
what's this one here this this one has a couple
Adam
Dan
33
34
35
cheesy electrets oh
36
37
38
that's connected too or
39
yeah that's
40
that's connected twoyep
that's the dummy dummy P.D.A.
both
yeah both channels
yeah it is
1.5
lag/ms
1
0.5
0
-0.5
-1
-1.5
31
32
33
34
35
36
37
38
39
40
31
32
33
34
35
time/s
36
37
38
39
40
0.5
0
-0.5
•
SCD algo on spectral + interaural features
- average spectral + per-channel ITD, ∆φ
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 13 of 24
Lab
ROSA
Computational Auditory Scene Analysis
(CASA)
•
input
mixture
Implement psychoacoustic theory? (Brown’92)
Front end
signal
features
(maps)
Element
formation
discrete
elements
Grouping
rules
Source
groups
freq
onset
time
period
frq.mod
- what are the features? how are they used?
•
Need top-down constraints:
hypotheses
Noise
components
Hypothesis
management
prediction
errors
input
mixture
Front end
signal
features
Compare
& reconcile
Mapping meetings: Kickoff - Dan Ellis
Periodic
components
Predict
& combine
predicted
features
2001-10-13 - 14 of 24
Lab
ROSA
Pitch-based monaural separation
•
frq/Hz
Bottom-up
brn1h.aif
frq/Hz
3000
3000
2000
1500
2000
1500
1000
1000
600
600
400
300
400
300
200
150
200
150
100
brn1h.fi.aif
100
0.2
0.4
0.6
0.8
1.0
time/s
0.2
0.4
0.6
0.8
1.0
time/s
- time-frequency cells tagged with fundamental
- delete cells not dominated by target f0
•
Model-based (wefts)
f/Hz
Weft1
4000
2000
1000
400
200
Voices
f/Hz
1000
4000
2000
1000
400
200
-40
1000
-50
400
200
100
50
0.0
400
200
100
50
-30
f/Hz
-60
dB
0.2
0.4
0.6
0.8
1.0
1.2
1.4
Wefts2-5
4000
2000
1000
400
200
1000
400
200
100
50
0.0
0.2
0.4
0.6
0.8
1.0
- estimate multiple periodicities per cell
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 15 of 24
1.2
1.4 time/s
Lab
ROSA
Tandem speech recognition
(with Manuel Reyes)
•
Neural net estimates phone posteriors;
but Gaussian mixtures model finer detail
•
Combine them!
Hybrid Connectionist-HMM ASR
Feature
calculation
Conventional ASR (HTK)
Noway
decoder
Neural net
classifier
Feature
calculation
HTK
decoder
Gauss mix
models
h#
pcl
bcl
tcl
dcl
C0
C1
C2
s
Ck
tn
ah
t
s
ah
t
tn+w
Input
sound
Words
Phone
probabilities
Speech
features
Input
sound
Subword
likelihoods
Speech
features
Tandem modeling
Feature
calculation
Neural net
classifier
HTK
decoder
Gauss mix
models
h#
pcl
bcl
tcl
dcl
C0
C1
C2
s
Ck
tn
ah
t
tn+w
Input
sound
•
Speech
features
Phone
probabilities
Subword
likelihoods
Words
Train net, then train GMM on net output
- GMM is ignorant of net output ‘meaning’
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 16 of 24
Lab
ROSA
Words
Tandem system results: Aurora digits
WER as a function of SNR for various Aurora99 systems
100
Average WER ratio to baseline:
HTK GMM:
100%
Hybrid:
84.6%
Tandem:
64.5%
Tandem + PC: 47.2%
WER / % (log scale)
50
20
10
5
HTK GMM baseline
Hybrid connectionist
Tandem
Tandem + PC
2
1
-5
0
5
10
15
SNR / dB (averaged over 4 noises)
20
clean
Aurora 2 Eurospeech 2001 Evaluation
Columbia
Philips
Rel improvement % - Multicondition
60
UPC Barcelona
Bell Labs
IBM
50
40
Motorola 1
Motorola 2
Nijmegen
ICSI/OGI/Qualcomm
30
ATR/Griffith
AT&T
Alcatel
Siemens
20
10
0
Avg. rel. improvement
-10
Mapping meetings: Kickoff - Dan Ellis
UCLA
Microsoft
Slovenia
Granada
2001-10-13 - 17 of 24
Lab
ROSA
Missing data recognition
(with Cooke, Green, Barker @ Sheffield)
•
Noisy training seems to miss the point
- rather have single ‘clean’ models
•
Use missing feature theory...
- integrate over missing data dimensions xm
p ( q xo ) =
∫ p(q
x o, x m ) p ( x m x o ) dx m
- trick is finding good/bad data mask
- soft classification improves
"1754" + noise
90
AURORA 2000 - Test Set A
80
Missing-data
Recognition
WER
70
MD Discrete SNR
MD Soft SNR
HTK clean training
HTK multi-condition
60
50
40
"1754"
Mask based on
stationary noise estimate
30
20
10
0
-5
Mapping meetings: Kickoff - Dan Ellis
0
5
10
2001-10-13 - 18 of 24
15
Lab
ROSA
20
Clean
Multi-source decoding
(Jon Barker @ Sheffield)
•
Search of sound-fragment interpretations
"1754" + noise
Mask based on
stationary noise estimate
Mask split into subbands
`Grouping' applied within bands:
Spectro-Temporal Proximity
Common Onset/Offset
Multisource
Decoder
"1754"
•
Comparing different masks
- evaluate p(M,K|O) = p(M|K,O)·p(K|O)
•
CASA for masks/fragments
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 19 of 24
Lab
ROSA
Outline
1
Columbia Participants
2
Mapping Meetings: Perspectives
3
Techniques
4
Summary
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 20 of 24
Lab
ROSA
Summary
•
Columbia:
Audio Organization,
Language Processing
•
Meetings:
Raw information sources,
Multiple analyses
•
Audio signals (close talk / tabletop):
Turns
Isolated speech
Other information
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 21 of 24
Lab
ROSA
Random ideas ...
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 22 of 24
Lab
ROSA
Speaker turn (H)MM
•
Markov model for speaker changes
- optimal path for ambiguous cues
A
AD
BC
AC
C
B
AB
BD
CD
D
•
Transition matrix represents...
- turn durations (self-loops)
- response patterns
•
ML choice between alternate trans. matrices
- detect different meeting modes:
presentation, debate, conflict...
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 23 of 24
Lab
ROSA
Marginalizing turn features
•
Each turn may have features
- pitch range, duration, rate, fluency
•
Features depend on speaker and
discussion mode
- marginalize two ways
- speaker-relative features indicate mode
meetings
A
B
participants
C
D
E
F
mr1
mr2
mr3
ed1
ed2
au1
G
H
meeting
profiles
participant profiles
Mapping meetings: Kickoff - Dan Ellis
2001-10-13 - 24 of 24
Lab
ROSA
Download