Improving Automatic Meeting Understanding by Leveraging Meeting Participant Behavior
Satanjeev Banerjee. Thesis Proposal, April 21, 2008.

Using Human Knowledge

- The knowledge of human experts is used to build systems that function under uncertainty
  - Often captured through in-lab data labeling
- Another source of knowledge: the users of the system
  - They can provide subjective knowledge
  - The system can adapt to the users and their information needs
  - Reduces the amount of data that must be labeled in the lab
- Technical goal: improve system performance by automatically extracting knowledge from users

Domain: Meetings

- Problem (Romano & Nunamaker, 2001):
  - Large parts of meetings contain unimportant information
  - Some small parts contain important information
  - How to retrieve the important information?
- Impact goal: help humans get information from meetings
- What information do people need from meetings?

Understanding Information Needs

- Survey of 12 CMU faculty members (Banerjee, Rosé & Rudnicky, 2005)
- How often do you need information from past meetings?
  - On average, 1 missed-meeting and 1.5 attended-meeting information needs a month
- What information do you need?
  - Missed meeting: "What was discussed about topic X?" (Task 1: detect the agenda item being discussed)
  - Attended meeting: detail questions such as "What was the accuracy?" (Task 2: identify utterances to include in notes)
- How do you get the information?
  - From notes, if available (high satisfaction)
  - If the meeting was missed: ask face-to-face

Existing Approaches to Accessing Meeting Information

- Meeting recording and browsing (Cutler et al., 2002; Ionescu et al., 2002; Ehlen et al., 2007; Waibel et al., 1998): meeting participants are used after the meeting
- Automatic meeting understanding, via classic supervised or unsupervised learning:
  - Meeting transcription (Stolcke et al., 2004; Huggins-Daines et al., 2007)
  - Meeting topic segmentation (Galley et al., 2003; Purver et al., 2006)
  - Activity recognition through vision (Rybski & Veloso, 2004)
  - Action item detection (Ehlen et al., 2007)
- Our goal: extract high quality supervision
  - ...from meeting participants (the best judges of noteworthy information)
  - ...during the meeting (when participants are most available)

Challenges for Supervision Extraction During the Meeting

- Giving feedback costs the user time and effort
- It creates a distraction from the user's main task: participating in the meeting
- Our high-level approach:
  - Develop supervision extraction mechanisms that help meeting participants do their task
  - Interpret participants' responses as labeled data

Thesis Statement

Develop approaches to extract high quality supervision from system users by designing extraction mechanisms that help them do their own task, and by interpreting their actions as labeled data.

Roadmap for the Rest of this Talk

- Review of past strategies for supervision extraction
- Approach:
  - Passive supervision extraction for agenda item labeling
  - Active supervision extraction to identify noteworthy utterances
- Success criteria, contribution and timeline

Past Strategies for Extracting Supervision from Humans

- Two types of strategies: passive and active
- Passive: the system does not choose which data points the user will label
  - E.g., improving ASR from user corrections (Burke et al., 2006)
- Active: the system chooses which data points the user will label
  - E.g., having the user label traffic images as risky or not (Saunier et al., 2004)

Research Issue 1: How to Ask Users for Labels?
- Categorical labels:
  - Associate desktop documents with a task label (Shen et al., 2007)
  - Label images of safe roads for robot navigation (Fails & Olsen, 2003)
- Item scores / ranks:
  - Rank report items for inclusion in a summary (Garera et al., 2007)
  - Pick the best schedule from system-provided choices (Weber et al., 2007)
- Feedback on features:
  - Tag movies with new text features (Garden et al., 2005)
  - Identify terms that signify document similarity (Godbole et al., 2004)

Research Issue 2: How to Interpret User Actions as Feedback?

- Depends on the similarity between the user's behavior and the target system behavior
- Interpretation is simple when the behaviors are similar
  - E.g., email classification (Cohen, 1996)
- Interpretation may be difficult when the user's behavior and the target system behavior are starkly different
  - E.g., user corrections of ASR output (Burke et al., 2006)

Research Issue 3: How to Select Data Points for Label Query (Active Strategy)?

- Typical active learning approach:
  - Goal: minimize the number of labels sought to reach a target error
  - Approach: choose the data points most likely to improve the learner
  - E.g., pick the data points closest to the decision boundary (Monteleoni et al., 2007)
- Typical assumption: the human's task is labeling
- But a system user's task is usually not the same as labeling data

Our Overall Approach to Extracting Data from System Users

- Goal: extract high quality subjective labeled data from system users
- Passive approach: design the interface to ease interpretation of user actions as feedback
  - Task: label meeting segments with the agenda item being discussed
- Active approach: develop label query mechanisms that
  - query for labels while helping the user do his task, and
  - extract labeled data from the user's actions
  - Task: identify noteworthy utterances in meetings

Passive Supervision: General Approach

- Goal: design the interface to enable interpretation of user actions as feedback
- Recipe:
  1. Identify the kind of labeled data needed
  2. Target a user task
  3. Find a relationship between the user task and the data needed
  4. Build an interface for the user task that captures the relationship

Supervision for Agenda Item Detection

- Goal: automatically detect the agenda item being discussed
- Labeled data needed: meeting segments labeled with their agenda item
- User task: note taking during meetings
- Relationship:
  1. Most notes refer to discussions in the preceding segment
  2. A note and its related segment belong to the same agenda item
- Note taking interface:
  1. Time stamp the speech and the notes
  2. Enable participants to label notes with the agenda item

[Screenshot: the note taking interface, with an "Insert Agenda" control, agenda item boxes ("Speech recognition research status", "Topic detection research status"), a shared note taking area, and a personal notes area that is not shared.]

Getting Segmentation from Notes

[Figure: time-stamped notes (e.g., at 100, 200, 300, 400, 600, 700, 800 and 950 seconds), each inside its agenda item box ("Speech recognition research status" or "Topic detection research status"). Segment boundaries are inferred between consecutive notes that belong to different agenda items; a sketch of this step follows below.]
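To make the figure concrete, here is a minimal sketch, in Python, of turning time-stamped, agenda-labeled notes into labeled segments. It is not the actual SmartNotes code: the function name is illustrative, and placing each boundary at the midpoint between two notes with different agenda items is one simple choice among several, under the two assumptions stated above (a note refers to the preceding discussion, and a note shares its segment's agenda item).

    # Sketch: recover agenda-item-labeled segments from time-stamped notes.
    # Assumes notes are sorted by time and each note refers to the preceding
    # discussion, so a change of agenda item between consecutive notes implies
    # a segment boundary somewhere between them (here: at the midpoint).

    def segments_from_notes(notes, meeting_end):
        """notes: list of (time_in_seconds, agenda_item) pairs, sorted by time.
        Returns (start, end, agenda_item) triples covering [0, meeting_end]."""
        segments, start = [], 0.0
        for i, (t, item) in enumerate(notes):
            is_last = i + 1 == len(notes)
            if is_last or item != notes[i + 1][1]:
                boundary = meeting_end if is_last else (t + notes[i + 1][0]) / 2.0
                segments.append((start, boundary, item))
                start = boundary
        return segments

    # Toy run with note times loosely following the figure above.
    notes = [(100, "speech recognition"), (200, "speech recognition"),
             (600, "topic detection"), (800, "topic detection"),
             (950, "speech recognition")]
    print(segments_from_notes(notes, meeting_end=1200))
    # -> [(0.0, 400.0, 'speech recognition'), (400.0, 875.0, 'topic detection'),
    #     (875.0, 1200, 'speech recognition')]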
Evaluate the Segmentation

- How accurate is the extracted segmentation?
  - Compare it to a human annotator
  - Also compare it to standard topic segmentation algorithms
- Evaluation metric: Pk. For every pair of time points k seconds apart, ask:
  - Are the two points in the same segment in the reference?
  - Are the two points in the same segment in the hypothesis?
- Pk = (# of time point pairs where the hypothesis and the reference disagree) / (total # of time point pairs in the meeting); a computational sketch follows below
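The Pk definition above translates directly into code. In the sketch below, sampling the time points at one-second steps and the value of k (conventionally set to half the mean reference segment length) are assumptions, since the slide does not fix them.

    # Sketch of Pk over time points: slide a window of width k seconds across
    # the meeting and count how often reference and hypothesis disagree on
    # whether the two endpoints fall in the same segment.

    def same_segment(boundaries, t1, t2):
        """boundaries: sorted list of segment boundary times (seconds)."""
        return not any(t1 < b <= t2 for b in boundaries)

    def pk(ref_boundaries, hyp_boundaries, meeting_length, k, step=1):
        disagreements, total = 0, 0
        for t in range(0, int(meeting_length - k) + 1, step):
            ref_same = same_segment(ref_boundaries, t, t + k)
            hyp_same = same_segment(hyp_boundaries, t, t + k)
            disagreements += (ref_same != hyp_same)
            total += 1
        return disagreements / total

    # Toy example: hypothesis boundaries slightly offset from the reference.
    print(pk([400, 875], [450, 850], meeting_length=1200, k=180))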
SmartNotes Deployment in Real Meetings

- SmartNotes has been used in 75 real meetings, with 16 unique participants overall
- 4 sequences of meetings (a sequence is 3 or more longitudinal meetings):

    Sequence      Num meetings so far
    1 (ongoing)   30
    2             27
    3 (ongoing)   8
    4 (ongoing)   4
    Remaining     6

Data for Evaluation

- Data: 10 consecutive related meetings
  - Avg meeting length: 31 minutes
  - Avg # of agenda items per meeting: 4.1
  - Avg # of participants per meeting: 3.75 (2 to 5)
  - Avg # of notes per agenda item: 5.9
  - Avg # of notes per meeting: 25
- Reference segmentation: meetings segmented into agenda items by two different annotators
- Inter-annotator agreement: Pk = 0.062

Results

    Segmentation method                                Pk
    Unsupervised baseline: TextTiling (Hearst, 1997)   0.39
    Segmentation from SmartNotes data                  0.21
    State of the art (Purver et al., 2006)             0.26
    Inter-annotator agreement                          0.06

- The difference between the SmartNotes segmentation and (Purver et al., 2006) is not statistically significant

Does Agenda Item Labeling Help Retrieve Information Faster?

- 2 ten-minute meetings, manually labeled with agenda items
- 5 questions prepared for each meeting, without access to the agenda items
- 16 subjects, none of whom were participants of the test meetings
- Within-subjects user study
- Experimental manipulation: access to the segmentation versus no segmentation

Minutes to Complete the Task

- With agenda item labels: 7.5 minutes on average
- Without agenda item labels: 10 minutes on average

Shown So Far

- A method of extracting meeting segments labeled with agenda items from note taking
- The resulting data produces a high quality segmentation
- The labels are likely to help participants retrieve information faster
- Next: learn to label meetings that don't have notes

Proposed Task: Learn to Label Related Meetings that Don't Have Notes

- Plan: implement language model based detection similar to (Spitters & Kraaij, 2001):
  1. Train agenda-item-specific language models on the automatically extracted labeled meeting segments
  2. Perform segmentation similar to (Purver et al., 2006)
  3. Label each new meeting segment with the agenda item whose LM has the lowest perplexity (a sketch of this step follows below)
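As a concreteness check, here is a minimal sketch of the labeling step: score each segment against per-agenda-item language models and pick the lowest perplexity. Unigram models with add-one smoothing are an assumption made for the sketch; the proposal leaves the LM details open.

    # Sketch: label a segment with the agenda item whose language model gives
    # the lowest perplexity. Unigram LMs with add-one smoothing stand in for
    # whatever LM the final system uses.
    import math
    from collections import Counter

    def train_lm(segment_texts):
        counts = Counter(w for text in segment_texts for w in text.split())
        return counts, sum(counts.values())

    def perplexity(lm, text, vocab_size):
        counts, total = lm
        words = text.split()
        logp = sum(math.log((counts[w] + 1.0) / (total + vocab_size)) for w in words)
        return math.exp(-logp / len(words))

    def label_segment(text, lms, vocab_size):
        return min(lms, key=lambda item: perplexity(lms[item], text, vocab_size))

    # Toy LMs trained on previously extracted labeled segments.
    lms = {"speech recognition": train_lm(["the word error rate went down"]),
           "topic detection": train_lm(["the pk segmentation score went down"])}
    vocab_size = len({w for counts, _ in lms.values() for w in counts})
    print(label_segment("what is the new word error rate", lms, vocab_size))
    # -> speech recognition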
Proposed Evaluation

- Evaluate agenda item labeling of meetings with no notes
  - 3 real meeting sequences with 10 meetings each
  - For each meeting i in each sequence: train the agenda item labeler on the automatically extracted labeled data from the previous meetings in the same sequence, and compute labeling accuracy against manual labels
  - Show improvement in accuracy from meeting to meeting
  - Baseline: unsupervised segmentation plus text matching between the speech and the agenda item label text
- Evaluate the effect on retrieving information
  - Ask users to answer questions from each meeting, with the agenda item labeling output by the improved labeler versus the labeling output by the baseline labeler

Active Supervision

- System goal: select data points and query the user for their labels
- In active learning, the human's task is to provide the labels
- But the system user's task may be very different from labeling data
- General approach: design query mechanisms such that
  1. each label query also helps the user do his own task, and
  2. the user's response to the query can be interpreted as a label
- Choose data points to query by balancing the estimated benefit of the query to the user against the estimated benefit of the label to the learner

Task: Noteworthy Utterance Detection

- Goal: identify noteworthy utterances, i.e., utterances that participants would include in notes
- Labeled data needed: utterances labeled as either "noteworthy" or "not noteworthy"

Extracting Labeled Data

- Label query mechanism: notes assistance, i.e., suggest utterances for inclusion in notes during the meeting (completed)
  - Helps participants take notes
  - Interpret participants' acceptances and rejections as "noteworthy" and "not noteworthy" labels
- Noteworthy utterance detector (proposed)
- Method of choosing utterances for suggestion (proposed), balancing the benefit to the user's note taking against the benefit to the learner (the detector) from the user's acceptance or rejection

Proposed: Noteworthy Utterance Detector

- Binary classification of utterances as noteworthy or not
- Support Vector Machine classifier, with features:
  - Lexical: keywords, tf-idf, named entities, numbers
  - Prosodic: speaking rate, f0 max/min
  - The agenda item being discussed
  - Structural: speaker identity, number of utterances since the last accepted suggestion
- Similar to the meeting summarization work of (Zhu & Penn, 2006); a sketch follows below
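A minimal sketch of such a detector using scikit-learn's SVM follows. The featurize function and the feature values are placeholders: the real features would come from the ASR output, the prosody front end, and the meeting structure, as listed above.

    # Sketch of the proposed binary noteworthy-utterance classifier. Real
    # features would come from ASR output (tf-idf, named entities), the
    # prosody front end (speaking rate, f0), and meeting structure.
    from sklearn.svm import SVC

    def featurize(utt):
        return [utt["tfidf_sum"],                 # lexical
                utt["num_entities_and_numbers"],  # lexical
                utt["speaking_rate"],             # prosodic
                utt["f0_max"] - utt["f0_min"],    # prosodic: pitch range
                utt["utts_since_last_accepted"]]  # structural

    # Toy training data: labels come from accepted/rejected note suggestions.
    utterances = [
        {"tfidf_sum": 3.1, "num_entities_and_numbers": 2, "speaking_rate": 3.9,
         "f0_max": 240.0, "f0_min": 120.0, "utts_since_last_accepted": 14,
         "noteworthy": True},
        {"tfidf_sum": 0.3, "num_entities_and_numbers": 0, "speaking_rate": 5.2,
         "f0_max": 170.0, "f0_min": 140.0, "utts_since_last_accepted": 1,
         "noteworthy": False},
    ]
    X = [featurize(u) for u in utterances]
    y = [int(u["noteworthy"]) for u in utterances]
    detector = SVC(kernel="rbf").fit(X, y)
    print(detector.predict(X))  # sanity check on the training data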
Mechanism 1: Direct Suggestion

[Screenshot: a single suggested note ("Fix the problem with emailing") that the participant can accept into the notes or reject.]

Mechanism 2: "Sushi Boat"

[Screenshot: a stream of suggested notes scrolling past ("pilot testing has been successful", "most participants took twenty minutes", "ron took much longer to finish tasks", "there was no crash"); the participant picks out the ones worth keeping.]

Differences between the Mechanisms

- Direct suggestion:
  - The user can provide both accept and reject labels
  - Higher cost for the user if the suggestion is not noteworthy
- Sushi boat suggestion:
  - The user only provides accept labels
  - Lower cost for the user

Will Participants Accept Suggestions?

- Wizard of Oz study: the wizard listened to the audio and suggested text
- 6 meetings: 2 with the direct mechanism, 4 with the sushi boat mechanism

    Mechanism          Num offered  Offered/min  Num accepted  Accepted/min  % accepted
    Direct suggestion  50           0.6          17            0.2           34.0
    Sushi boat         273          1.8          85            0.6           31.0

Percentage of Notes from the Sushi Boat

    Meeting    Num lines of notes  Num lines from sushi boat  % lines from sushi boat
    1          7                   6                          86%
    2          24                  20                         83%
    3          32                  29                         91%
    4          32                  30                         94%
    Total/Avg  95                  85                         89%

Method of Choosing Utterances for Suggestion

- One idea: pick utterances that have either a high benefit for the detector or a high benefit for the user
  - Most beneficial for the detector: the least confident utterances
  - Most beneficial for the user: utterances the detector considers noteworthy with high confidence
  - But this does not take into account the user's past acceptance pattern
- Our approach:
  - Estimate and track the user's likelihood of acceptance
  - Pick utterances that either have a high detector benefit or are very likely to be accepted

Estimating the Likelihood of Acceptance

- Features:
  - The estimated user benefit of the suggested utterance:
      Benefit(utt) = T(utt) - R(utt), if utt is noteworthy according to the detector
      Benefit(utt) = -R(utt), if utt is not noteworthy according to the detector
    where T(utt) is the time to type the utterance and R(utt) is the time to read it
  - The number of suggestions, acceptances and rejections in this and previous meetings
  - The amount of speech in a preceding window of time
  - The time since the last suggestion
- Combine the features using logistic regression
- Learn the weights per participant from past acceptances and rejections

Overall Algorithm for Choosing Utterances for Direct Suggestion

- Given: an utterance and a participant
- Decision to make: suggest the utterance to the participant?
  1. Estimate the benefit of the utterance's label to the detector
  2. Estimate the likelihood of acceptance
  3. Combine the two estimates; if the combination exceeds a threshold, suggest the utterance to the participant, otherwise don't (a sketch follows below)
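To make the combination step concrete, here is a sketch under stated assumptions: detector benefit is modeled as classifier uncertainty, acceptance likelihood as a logistic regression over the features listed above, and the two are combined linearly. The linear form, alpha and threshold are assumptions for illustration; they correspond to the parameters the next section proposes to learn from the Wizard of Oz data.

    # Sketch of the suggestion decision. Assumptions: detector benefit is
    # modeled as uncertainty (highest near the decision boundary), acceptance
    # likelihood as a logistic regression over the slide's features, and the
    # two are combined linearly. alpha and threshold are the parameters to be
    # tuned on the Wizard of Oz data.
    import math

    def user_benefit(time_to_type, time_to_read, predicted_noteworthy):
        # Benefit(utt) = T(utt) - R(utt) if predicted noteworthy, else -R(utt).
        return time_to_type - time_to_read if predicted_noteworthy else -time_to_read

    def acceptance_likelihood(features, weights, bias):
        z = bias + sum(w * f for w, f in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def should_suggest(p_noteworthy, features, weights, bias,
                       alpha=0.5, threshold=0.6):
        detector_benefit = 1.0 - abs(2.0 * p_noteworthy - 1.0)  # uncertainty
        p_accept = acceptance_likelihood(features, weights, bias)
        return alpha * detector_benefit + (1.0 - alpha) * p_accept > threshold

    # Toy call: features = [user benefit, #past acceptances, #past rejections,
    # recent speech (sec), time since last suggestion (sec)].
    feats = [user_benefit(20.0, 4.0, True), 12, 3, 45.0, 120.0]
    print(should_suggest(0.55, feats, weights=[0.05, 0.1, -0.2, 0.01, 0.005],
                         bias=-1.5))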
Learning the Threshold and Combination Weights

- Train on the Wizard of Oz data: split the meetings into a development set and a test set
- For each parameter setting:
  - Select utterances for suggestion to the user in the development set
  - Compute the acceptance rate by comparing against the utterances the user actually accepted in the meeting
  - Of those shown, use the acceptances and rejections to retrain the utterance detector
  - Evaluate the utterance detector on the test set
- Pick the parameter setting with an acceptable tradeoff between the utterance detector's error rate and the acceptance rate

Proposed Evaluation

- Evaluate the improvement in noteworthy utterance detection
  - 3 real meeting sequences with 15 meetings each
  - Train an initial noteworthy utterance detector on prior data
  - Retrain it over the first 10 meetings by suggesting notes; test it over the next 5
  - Evaluation: after each test meeting, ask the participants to grade the automatically identified noteworthy utterances
  - Baseline: grade the utterances identified by the prior-trained detector
- Evaluate the effect on retrieving information
  - Ask users to answer questions from the test meetings, with the utterances identified by the detector trained on 10 meetings versus those identified by the prior-trained detector

Thesis Success Criteria

- Show that agenda item labeling improves with labeled data automatically extracted from notes
  - Show that participants can retrieve information faster
- Show that noteworthy utterance detection improves with actively extracted labeled data
  - Show that participants retrieve information faster

Expected Technical Contribution

- A framework for actively acquiring data labels from end users
- Learning to identify noteworthy utterances by suggesting notes to meeting participants
- Improving topic labeling of meetings by acquiring labeled data from note taking

Summary: Tasks Completed/Proposed

- Agenda item detection through passive supervision:
  - Design an interface to acquire labeled data: completed
  - Evaluate the interface and the labeled data obtained: completed
  - Implement the agenda item detection algorithm: proposed
  - Evaluate the agenda item detection algorithm: proposed
- Important utterance detection through active learning:
  - Implement the notes suggestion interface: completed
  - Implement the SVM classifier: proposed
  - Evaluate the summarization: proposed

Proposal Timeline

    Time frame   Scheduled task
    Apr-Jun 08   Iteratively: continue running Wizard of Oz studies in real meetings to fine-tune the label query mechanisms, and analyze the WoZ data to identify features for the automatic summarizer. In parallel, implement a baseline meeting summarizer.
    Jul-Aug 08   Deploy the online summarization and notes suggestion system in real meetings, and iterate on its development based on feedback
    Sep-Oct 08   Upon stabilization, perform the summarization user study on test meeting groups
    Nov 08       Implement the agenda detection algorithm
    Dec 08       Perform the agenda detection user study
    Jan-Mar 09   Write the dissertation
    Apr 09       Defend the thesis

Thank you!

Backup: Acceptances Per Participant

    Participant  Sushi boat lines accepted  % of acceptances  % of notes in 5 prior meetings
    1            87                         82.9%             90%
    2            4                          3.8%              10%
    3            6                          5.7%              0%
    4            8                          7.6%              (did not attend)
    Total        105