Recognizing Structure: Sentence, Speaker, and Topic Segmentation
Julia Hirschberg
CS 4706
7/15/2016
Today
• Recognizing structural information in speech
• Learning from generation
• Learning from text segmentation
• Types of structural information
• Segmentation in spoken corpora
Recall: Discourse Structure for Speech Generation
• Theoretical accounts (e.g. Grosz & Sidner '86)
• Empirical studies
  – Text vs. speech
• How can they help in recognition?
  – Features to test
    • Acoustic/prosodic features
    • Lexical features
Indicators of Structure in Text
• Cue phrases: now, well, first
• Pronominal reference
• Orthography and formatting (in text)
• Lexical information (Hearst '94, Reynar '98, Beeferman et al '99):
  – Domain dependent
  – Domain independent
Methods of Text Segmentation
• Lexical cohesion methods vs. multiple-source methods
• Lexical cohesion: vocabulary similarity indicates topic cohesion
  – Intuition from Halliday & Hasan '76
  – Features:
    • Stem repetition
    • Entity repetition
    • Word frequency
    • Context vectors
    • Semantic similarity
    • Word distance
  – Methods:
    • Sliding window
    • Lexical chains
    • Clustering
• Multiple sources: combine lexical cohesion with other cues
  – Features:
    • Cue phrases
    • Reference (e.g. pronouns)
    • Syntactic features
  – Methods:
    • Machine learning from labeled corpora
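The sliding-window idea above can be sketched in a few lines. This is a minimal, hypothetical illustration of the lexical-cohesion intuition (in the spirit of Hearst's TextTiling, not her exact algorithm): compare the word counts on either side of each candidate gap, and treat a dip in similarity as a likely topic boundary. All names and the toy data are invented for illustration.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def gap_scores(sentences, window=3):
    """For each gap between sentences, compare the word counts in the
    `window` sentences before vs. after the gap.  Low similarity at a
    gap suggests a topic boundary there."""
    scores = []
    for i in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, i - window):i] for w in s)
        right = Counter(w for s in sentences[i:i + window] for w in s)
        scores.append(cosine(left, right))
    return scores

# Toy document: two sentences about cats, then two about stocks.
sents = [["cats", "purr"], ["cats", "sleep"],
         ["stocks", "fell"], ["stocks", "rose"]]
scores = gap_scores(sents, window=2)
boundary = scores.index(min(scores)) + 1  # gap with lowest cohesion
# → boundary == 2: the topic shift between sentences 2 and 3
```

The window size trades off resolution against noise: small windows catch short topics but make the similarity estimates noisier.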
Choi 2000: Text Segmentation
• Implements leading methods and compares a new algorithm to them on a corpus of 700 concatenated documents
• Comparison algorithms:
  – Baselines:
    • No boundaries
    • All boundaries
    • Regular partition
    • Random number of random partitions
    • Actual number of random partitions
  – TextTiling algorithm (Hearst '94)
  – DotPlot algorithms (Reynar '98)
  – Segmenter (Kan et al '98)
  – Choi '00 proposal
• Cosine similarity measure:

  sim(x, y) = Σⱼ f_{x,j} · f_{y,j} / √( Σⱼ f²_{x,j} · Σⱼ f²_{y,j} )

  where f_{x,j} is the frequency of word j in sentence x
• Identical sentences: 1; no word overlap: 0
• Similarity matrix → rank matrix
  – Minimizes the effect of outliers
  – How likely is this sentence to be a boundary, compared to other sentences?
• Divisive clustering based on inside density
  – D(n) = sum of rank values s_{i,j} of segment n / inside area of segment n (j−i+1), for i, j the sentences at the beginning and end of segment n
• Keep dividing the corpus
  – until the gain δD(n) = D(n) − D(n−1) shows little change
• Choi's algorithm has the best performance (9–12% error)
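The rank transform and the inside-density score can be sketched as follows. This is an illustrative simplification of Choi's method, not his exact masking scheme; the function names and toy matrix are invented:

```python
def rank_transform(S, radius=1):
    """Replace each similarity value by the fraction of its neighbours
    (within `radius` cells) that it exceeds.  Only the relative order
    of values matters afterwards, which suppresses outliers."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hood = [S[a][b]
                    for a in range(max(0, i - radius), min(n, i + radius + 1))
                    for b in range(max(0, j - radius), min(n, j + radius + 1))
                    if (a, b) != (i, j)]
            if hood:
                R[i][j] = sum(1 for v in hood if v < S[i][j]) / len(hood)
    return R

def inside_density(R, i, j):
    """D for a segment spanning sentences i..j: sum of the rank values
    inside the segment's square region divided by its area."""
    vals = [R[a][b] for a in range(i, j + 1) for b in range(i, j + 1)]
    return sum(vals) / len(vals)

# Toy 3x3 similarity matrix with one strong value in the middle.
S = [[0.1, 0.2, 0.1],
     [0.2, 0.9, 0.2],
     [0.1, 0.2, 0.1]]
R = rank_transform(S)
# S[1][1] exceeds all 8 of its neighbours, so R[1][1] == 1.0
```

Divisive clustering then repeatedly picks the split that maximizes the summed inside density of the resulting segments, stopping when the gain δD(n) levels off.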
Utiyama & Isahara '02: What if we have no labeled data for our domain?
Types of Discourse Structure in Spoken Corpora
• Domain independent
  – Sentence/utterance boundaries
  – Speaker turn segmentation
  – Topic segmentation
• Domain dependent
  – Broadcast news
  – Meetings
  – Telephone conversations
Spoken Cues to Discourse Structure
• Pitch range: Lehiste '75, Brown et al '83, Silverman '86, Avesani & Vayra '88, Ayers '92, Swerts et al '92, Grosz & Hirschberg '92, Swerts & Ostendorf '95, Hirschberg & Nakatani '96
• Preceding pause: Lehiste '79, Chafe '80, Brown et al '83, Silverman '86, Woodbury '87, Avesani & Vayra '88, Grosz & Hirschberg '92, Passonneau & Litman '93, Hirschberg & Nakatani '96
• Rate: Butterworth '75, Lehiste '80, Grosz & Hirschberg '92, Hirschberg & Nakatani '96
• Amplitude: Brown et al '83, Grosz & Hirschberg '92, Hirschberg & Nakatani '96
• Contour: Brown et al '83, Woodbury '87, Swerts et al '92
Finding Sentence and Topic Boundaries
• Statistical, machine learning approaches with large segmented corpora
• Features:
  – Lexical cues
    • Domain dependent
    • Sensitive to ASR performance
  – Acoustic/prosodic cues
    • Domain independent
    • Sensitive to speaker identity
Shriberg et al '00: Prosodic Cues
• Prosodic cues perform as well as or better than text-based cues at sentence and topic segmentation, and may generalize better
• Goal: identify sentence and topic boundaries at ASR-defined word boundaries
  – CART decision trees provided boundary predictions
  – An HMM combined these with lexical boundary predictions from a language model (LM)
Features
• For each potential boundary location:
  – Pause at boundary (raw and normalized by speaker)
  – Pause at the word before the boundary (is this a new 'turn' or part of a continuous speech segment?)
  – Phone and rhyme duration, normalized by inherent duration (phrase-final lengthening?)
  – F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
  – Voice quality (halving/doubling estimates as correlates of creak or glottalization)
  – Speaker change, time from start of turn, number of turns in conversation, and gender
• Trained/tested on Switchboard and Broadcast News
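Speaker normalization of the pause feature can be sketched as a per-speaker z-score: the same 300 ms pause is more boundary-like for a fast, terse speaker than for a slow one. A minimal sketch; the function name and toy data are invented for illustration:

```python
from statistics import mean, stdev

def normalized_pauses(pauses_ms, speaker_ids):
    """Return per-speaker z-scores of pause durations, i.e. the
    'normalized by speaker' variant of the raw pause feature."""
    by_speaker = {}
    for p, s in zip(pauses_ms, speaker_ids):
        by_speaker.setdefault(s, []).append(p)
    stats = {s: (mean(v), stdev(v) if len(v) > 1 else 1.0)
             for s, v in by_speaker.items()}
    return [(p - stats[s][0]) / (stats[s][1] or 1.0)
            for p, s in zip(pauses_ms, speaker_ids)]

# Speaker A pauses briefly except once; speaker B pauses longer overall.
feats = normalized_pauses([100, 120, 800, 300, 310, 900],
                          ["A", "A", "A", "B", "B", "B"])
# the 800 ms pause for A and the 900 ms pause for B both stand out
# with z-scores above 1, despite very different raw durations
```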
Sentence Segmentation Results
• Prosodic features:
  – Better than the LM for Broadcast News (BN)
  – Worse than the LM on human transcriptions, and the same on ASR transcripts, for Switchboard (SB)
  – All better than chance
• Useful features for BN:
  – Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
• Useful features for SB:
  – Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn
Topic Segmentation Results (BN only)
• Useful features:
  – Pause at boundary, F0 range, turn/no turn, gender, time in turn
• Prosody alone better than the LM
• Combined model improves significantly
Next Class
• Identifying Speech Acts
• Reading:
  – This chapter of J&M is a beta version
  – Please keep a diary of:
    • Any typos
    • Any passages you think are hard to follow
    • Any suggestions
• HW 3a due by class (2:40pm)