Recognizing Structure: Sentence, Speaker, and Topic Segmentation
Julia Hirschberg
CS 4706
7/15/2016

Today
• Recognizing structural information in speech
• Learning from generation
• Learning from text segmentation
• Types of structural information
• Segmentation in spoken corpora

Recall: Discourse Structure for Speech Generation
• Theoretical accounts (e.g. Grosz & Sidner ’86)
• Empirical studies
  – Text vs. speech
• How can they help in recognition?
  – Features to test
    • Acoustic/prosodic features
    • Lexical features

Indicators of Structure in Text
• Cue phrases: now, well, first
• Pronominal reference
• Orthography and formatting (in text)
• Lexical information (Hearst ’94, Reynar ’98, Beeferman et al ’99):
  – Domain dependent
  – Domain independent

Methods of Text Segmentation
• Lexical cohesion methods vs. multiple source
  – Vocabulary similarity indicates topic cohesion
    • Intuition from Halliday & Hasan ’76
    • Features:
      – Stem repetition
      – Entity repetition
      – Word frequency
      – Context vectors
      – Semantic similarity
      – Word distance
    • Methods:
      – Sliding window
      – Lexical chains
      – Clustering
  – Combine lexical cohesion with other cues
    • Features
      – Cue phrases
      – Reference (e.g. pronouns)
      – Syntactic features
    • Methods
      – Machine learning from labeled corpora

Choi 2000: Text Segmentation
• Implements leading methods and compares a new algorithm to them on a corpus of 700 concatenated documents
• Comparison algorithms:
  – Baselines:
    • No boundaries
    • All boundaries
    • Regular partition
    • Random # of random partitions
    • Actual # of random partitions
  – TextTiling algorithm (Hearst ’94)
  – DotPlot algorithms (Reynar ’98)
  – Segmenter (Kan et al ’98)
  – Choi ’00 proposal
• Cosine similarity measure between sentences x and y, where f_{x,j} is the frequency of word j in sentence x:

$$
sim(x, y) = \frac{\sum_j f_{x,j} \, f_{y,j}}{\sqrt{\sum_j f_{x,j}^2 \cdot \sum_j f_{y,j}^2}}
$$

  – Identical sentences: 1; no overlap: 0
• Similarity matrix → rank matrix
  – Minimize effect of outliers
  – How likely is this sentence to be a boundary, compared to other sentences?
• Divisive clustering based on
  – D(n) = sum of rank values s_{i,j} of segment n / inside area of segment n, (j − i + 1)^2
  – for i, j the sentences at the beginning and end of segment n
• Keep dividing the corpus
  – until δD(n) = D(n) − D(n−1) shows little change
  – Choi’s algorithm has best performance (9-12% error)

Utiyama & Isahara ’02: What if we have no labeled data for our domain?
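
The cosine similarity, rank transform, and density-driven splitting from the Choi ’00 slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Choi's implementation: the function names (cosine, similarity_matrix, rank_matrix, inside_density, best_split) are made up for this sketch, tokenization is plain whitespace splitting with no stemming or stop-word removal, and the ranking neighbourhood (radius=1) is an arbitrary small value chosen for readability.

```python
from collections import Counter
from math import sqrt


def cosine(fx, fy):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(fx[w] * fy[w] for w in fx.keys() & fy.keys())
    norm = sqrt(sum(v * v for v in fx.values()) * sum(v * v for v in fy.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Pairwise cosine similarity (assumption: whitespace tokens, lowercased)."""
    freqs = [Counter(s.lower().split()) for s in sentences]
    return [[cosine(a, b) for b in freqs] for a in freqs]


def rank_matrix(sim, radius=1):
    """Replace each value by the fraction of its neighbours that are strictly lower.

    Assumption: a square neighbourhood of the given radius, clipped at the edges.
    """
    n = len(sim)
    rank = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = seen = 0
            for a in range(max(0, i - radius), min(n, i + radius + 1)):
                for b in range(max(0, j - radius), min(n, j + radius + 1)):
                    if (a, b) != (i, j):
                        seen += 1
                        lower += sim[a][b] < sim[i][j]
            rank[i][j] = lower / seen if seen else 0.0
    return rank


def inside_density(rank, segments):
    """Sum of rank values inside all segments divided by their total inside area.

    `segments` is a list of inclusive (i, j) sentence-index pairs.
    """
    total = sum(rank[a][b]
                for i, j in segments
                for a in range(i, j + 1)
                for b in range(i, j + 1))
    area = sum((j - i + 1) ** 2 for i, j in segments)
    return total / area


def best_split(rank, segments):
    """One divisive step: try every cut point and keep the densest segmentation.

    Returns (density, new_segments), or None if no segment can be split.
    """
    best = None
    for k, (i, j) in enumerate(segments):
        for cut in range(i, j):
            candidate = segments[:k] + [(i, cut), (cut + 1, j)] + segments[k + 1:]
            d = inside_density(rank, candidate)
            if best is None or d > best[0]:
                best = (d, candidate)
    return best


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "stocks fell on the market",
        "the market rallied as stocks rose",
    ]
    rank = rank_matrix(similarity_matrix(sentences))
    print(best_split(rank, [(0, len(sentences) - 1)]))
```

Calling best_split repeatedly and stopping once the gain in density between successive steps becomes small mirrors the "keep dividing until δD(n) shows little change" rule; the stopping threshold itself is left out here because the slides do not specify it.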

Types of Discourse Structure in Spoken Corpora
• Domain independent
  – Sentence/utterance boundaries
  – Speaker turn segmentation
  – Topic segmentation
• Domain dependent
  – Broadcast news
  – Meetings
  – Telephone conversations

Spoken Cues to Discourse Structure
• Pitch range: Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
• Preceding pause: Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
• Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
• Amplitude: Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
• Contour: Brown et al ’83, Woodbury ’87, Swerts et al ’92

Finding Sentence and Topic Boundaries
• Statistical, machine learning approaches with large segmented corpora
• Features:
  – Lexical cues
    • Domain dependent
    • Sensitive to ASR performance
  – Acoustic/prosodic cues
    • Domain independent
    • Sensitive to speaker identity

Shriberg et al ’00: Prosodic Cues
• Prosodic cues perform as well as or better than text-based cues at sentence and topic segmentation (and generalize better?)
• Goal: identify sentence and topic boundaries at ASR-defined word boundaries
  – CART decision trees provided boundary predictions
  – HMM combined these with lexical boundary predictions from a language model (LM)

Features
  – For each potential boundary location:
    • Pause at boundary (raw and normalized by speaker)
    • Pause at word before boundary (is this a new ‘turn’ or part of a continuous speech segment?)
    • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?)
    • F0 (smoothed and stylized): reset, range (topline, baseline), slope, and continuity
    • Voice quality (halving/doubling estimates as correlates of creak or glottalization)
    • Speaker change, time from start of turn, # turns in conversation, and gender
  – Trained/tested on Switchboard (SB) and Broadcast News (BN)

Sentence segmentation results
• Prosodic features
  – Better than LM for BN
  – For SB: worse than LM on transcriptions, the same on ASR transcripts
  – All better than chance
• Useful features for BN
  – Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
• Useful features for SB
  – Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic segmentation results (BN only)
• Useful features
  – Pause at boundary, F0 range, turn/no turn, gender, time in turn
• Prosody alone better than LM
• Combined model improves significantly

Next Class
• Identifying Speech Acts
• Reading:
  – This chapter of J&M is a beta version
  – Please keep a diary of:
    • Any typos
    • Any passages you think are hard to follow
    • Any suggestions
• HW 3a due by class (2:40pm)
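
Returning to the boundary features listed for Shriberg et al ’00: the pause-related ones (pause at boundary, raw and speaker-normalized; pause at the preceding word boundary; speaker change) can be computed directly from word timings. The sketch below is only an illustration under assumptions that are not from the slides: the input is a hypothetical time-ordered list of (speaker, start_sec, end_sec) tuples, the function name boundary_features is made up, and "normalized by speaker" is approximated here as a z-score over that speaker's own pauses.

```python
from collections import defaultdict
from statistics import mean, pstdev


def boundary_features(words):
    """Pause features at each word boundary.

    `words` is a time-ordered list of (speaker, start_sec, end_sec) tuples
    (a hypothetical input format). Boundary k lies between words[k] and
    words[k + 1].
    """
    # Raw pause: silence between the end of one word and the start of the next.
    raw = [max(0.0, words[k + 1][1] - words[k][2]) for k in range(len(words) - 1)]

    # Per-speaker pause statistics, attributed to the speaker before the boundary.
    by_speaker = defaultdict(list)
    for k, pause in enumerate(raw):
        by_speaker[words[k][0]].append(pause)
    stats = {s: (mean(p), pstdev(p)) for s, p in by_speaker.items()}

    feats = []
    for k, pause in enumerate(raw):
        mu, sigma = stats[words[k][0]]
        feats.append({
            "pause": pause,
            # Assumption: z-score as the speaker normalization.
            "pause_norm": (pause - mu) / sigma if sigma else 0.0,
            "prev_pause": raw[k - 1] if k > 0 else 0.0,
            "speaker_change": words[k][0] != words[k + 1][0],
        })
    return feats


if __name__ == "__main__":
    words = [("A", 0.0, 0.5), ("A", 0.6, 1.0), ("A", 1.9, 2.3), ("B", 2.4, 2.8)]
    for f in boundary_features(words):
        print(f)
```

Feature dictionaries like these would then be the input to a boundary classifier such as the decision trees mentioned above; duration, F0, and voice-quality features need acoustic processing and are not covered by this sketch.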