Searching and Summarizing Speech Julia Hirschberg CS 6998

7/15/2016
Today
Speech browsing and search
Speech summarization: 2 views
Hori et al
Barzilay et al
Speech data mining
Searching Audio Data
Today, large amounts of audio data available: on the web, in company archives, in our homes
But what can we do with it?
We have tools supporting random access to text, but for audio we’re limited to serial search
Goal: tools to search audio as easily as text
Why?
Searching online news and archives
Searching a/v archives, movies
Searching trial recordings and legislative sessions
Browsing meetings, customer care exchanges, focus groups
Telephone calls and voicemail
Audio Browsing/Retrieval for Voicemail
Motivated by interviews, surveys and usage logs of heavy users:
Hard to scan new msgs to find those you need to deal with quickly
Hard to find msg you want in archive
Hard to locate information you want in any msg
How could we help?
SCANMail Architecture
[Figure: SCANMail architecture diagram; labels: Caller, SCANMail, Subscriber]
Corpus Collection
Recordings collected from 138 AT&T Labs employees’ mailboxes
100 hours; 10K msgs; 2500 speakers
Gender balanced; 12% non-native speakers
Mean message duration 36.4 secs, median 30.0 secs
Hand-transcribed and annotated with caller id, gender, age, entity demarcation (names, dates, telnos)
Transcription and Bracketing
[ Greeting: hi R ] [ CallerID: it's me ] give me a
call [ um ] right away cos there's [ .hn ] I guess
there's some [ .hn ] change [ Date: tomorrow ]
with the nursery school and they [ um ] [ .hn ]
anyway they had this idea [ cos ] since I think
J's the only one staying [ Date: tomorrow ] for
play club so they wanted to they suggested that
[ .hn ] well J2 actually offered to take J home
with her and then would she
would meet you back at the synagogue at [ Time:
five thirty ] to pick her up [ .hn ] [ uh ] so I don't
know how you feel about that otherwise M_ and
one other teacher would stay and take care of
her till [ Date: five thirty tomorrow ] but if you
[ .hn ] I wanted to know how you feel before I tell
her one way or the other so call me [ .hn ] right
away cos I have to get back to her in about an
hour so [ .hn ] okay [ Closing: bye [ .nhn ] [ .onhk ]
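The bracketed annotation in the transcript above can be read back mechanically. The following is a minimal sketch, assuming the label set seen in the example (Greeting, CallerID, Date, Time, Closing) and a hypothetical helper name `extract_entities`; it is not the SCANMail annotation tooling.

```python
import re

# Labelled spans look like "[ Date: tomorrow ]". The label list is taken from
# the example transcript; a real annotation scheme may define more labels.
LABELLED = re.compile(
    r"\[\s*(Greeting|CallerID|Date|Time|Closing):\s*([^\[\]]+?)\s*\]"
)

def extract_entities(transcript):
    """Return (label, text) pairs for every well-formed labelled bracket."""
    return [(m.group(1), m.group(2)) for m in LABELLED.finditer(transcript)]

sample = (
    "[ Greeting: hi R ] [ CallerID: it's me ] give me a call [ um ] "
    "right away [ Date: tomorrow ] at [ Time: five thirty ]"
)
print(extract_entities(sample))
```

Unlabelled brackets such as [ um ] and [ .hn ] are ignored, and a labelled bracket that is never closed before the next opening bracket is skipped rather than guessed at.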
SCANMail Demo
http://www.fancentral.org/~isenhour/scanmail/demo.html
Audix extension: 8380
Audix password: (null)
Information Extraction from Speech
Jansche & Abney ‘02
Speech Summarization: Extraction Techniques
Hori et al ‘02
Inoue et al ‘04
Domain Specific Summarization (Barzilay et al ‘00)
Motivation: lab experiments show little facilitation of speech summarization by techniques that do improve search
Domain: Broadcast News
Idea: knowing what type of speaker (anchor, reporter, interviewee) is speaking provides structural clues that can “outline” the newscast since programs are predictable
SCAN: Spoken Content-based Audio Navigator
TREC SDR corpus of Broadcast News
Segment speech ‘documents’ into audio ‘paratones’ acoustically
Segmentation module trained on hand-labeled discourse structure annotation in another domain
Classify recording conditions, e.g. music, telephone bandwidth, wide-band
Run ASR with appropriate acoustic models (~70% word accuracy)
Index (errorful) transcripts using SMART IR
 Results in WYSIAWY (“What you see is almost what
you hear”) GUI
Transcript prosodically formatted
Overview provides abstract structure
[Figure: SCAN architecture diagram; components: Broadcast News corpus, Paratone Detector, Acoustic Condition Classification, Recognition, Information Retrieval, SCAN db, GUI]
[Figure: SCAN GUI screenshot; panels: Search, Overview, Transcript]
Patterns in Newscasts
Anchors present headlines and introduce stories
Most frequent speakers
Anchor/reporter turn alternation
Reporter/guest turn-taking during stories
Data
35 broadcasts of “All Things Considered”
Human and ASR transcripts (without commercials but with turn boundaries)
Features to predict speaker role
Lexical: ngrams 1-5, explicit introductions (current and prior segment)
Contextual: labels and features of prior turns
Durational: turn length (absolute and relative to previous)
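The three feature families listed above can be sketched as a per-turn feature dictionary. This is a minimal illustration, not the feature extractor used in the study: the turn representation (a dict with `text` and `duration`), the unit of seconds, and the feature names are assumptions, and explicit-introduction features are left out.

```python
def ngrams(tokens, max_n=5):
    """All word n-grams of length 1..max_n, joined with spaces."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def turn_features(turns, i, prev_label=None):
    """Features for turn i; turns is a list of {'text': str, 'duration': float}."""
    tokens = turns[i]["text"].lower().split()
    # Lexical: binary indicator for each n-gram in the current turn.
    feats = {"ng=" + g: 1.0 for g in ngrams(tokens)}
    # Durational: absolute length, and length relative to the previous turn.
    feats["dur_abs"] = turns[i]["duration"]
    if i > 0 and turns[i - 1]["duration"] > 0:
        feats["dur_rel_prev"] = turns[i]["duration"] / turns[i - 1]["duration"]
    # Contextual: role label assigned to the previous turn, if known.
    if prev_label is not None:
        feats["prev_label=" + prev_label] = 1.0
    return feats

# Hypothetical turns; durations are assumed to be in seconds.
turns = [
    {"text": "this is the news", "duration": 4.0},
    {"text": "thanks very much", "duration": 10.0},
]
features = turn_features(turns, 1, prev_label="anchor")
print(sorted(features))
```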
Methods and Results
Boosting and maximum entropy --> simple weighted rules to predict speaker role
Baseline: guess anchor (35.4%)
Result on human transcripts:
BoostTexter 79%
MaxEnt 80.5%
Result on ASR transcripts:
BoostTexter 72.8%
MaxEnt 77%
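The idea of simple weighted rules and a guess-the-most-frequent-role baseline can be sketched as follows. The rules, weights, and feature names are hypothetical and chosen only for illustration; they are not the rules learned by BoostTexter or the maximum entropy model.

```python
from collections import Counter

# Hypothetical rules: (feature that must be present, role voted for, weight).
RULES = [
    ("ng=i'm", "anchor", 1.2),
    ("prev_label=anchor", "reporter", 0.8),
    ("ng=reports", "anchor", 0.5),
]

def predict_role(features, rules=RULES, default="anchor"):
    """Sum the weights of the rules that fire and return the top-scoring role."""
    votes = Counter()
    for feature, role, weight in rules:
        if feature in features:
            votes[role] += weight
    return votes.most_common(1)[0][0] if votes else default

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always guessing the most frequent label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

print(predict_role({"prev_label=anchor", "ng=reports"}))
print(majority_baseline_accuracy(["anchor", "anchor", "reporter", "guest"]))
```

When no rule fires, the sketch falls back to the default role, which mirrors the baseline of always guessing anchor.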
Speech Data Mining
How does it differ from text data mining?
Maskey et al ‘04