What Works, What Doesn’t - And What Needs to Work
Lynette Hirschman
Information Technology Center
The MITRE Corporation
USA
© 2001 The MITRE Corporation. ALL RIGHTS RESERVED.

What Works...
• If we believe Eric Brill, we should just collect and annotate data...
  - Since data collection seems to work better than looking for new algorithms
• What this really means is that data collection is more cost-effective than funding research
  - Similarly, we might conclude that waiting for the next chip is more cost-effective than creating faster algorithms
So we should all stop doing research and look for data and wait...

Conversational Interaction: A Case Study
• Speech researchers have always said, "there’s no data like more data"
• Many speech problems are, by definition, data-constrained:
  - Conversational interfaces require real(istic) data on what people will say to machines in the context of a specific application
  - Such application-specific data tends to be difficult (expensive) to collect
  - It requires simulation of interaction with a system, or a running system to collect data to build the system…
• How do we collect millions of sentences of application-specific data?

How to Collect Real Data Cheaply
• Lesson from Victor Zue’s MIT Jupiter system:
  - Put something out there that people want to use: on-line weather information
  - This can be done by bootstrapping from a primitive system, using the collected data
  - MIT has been very successful in collecting data from real users; methodology now used by the DARPA Communicator program
So to collect data to build a system, we need a system that works well enough for people to use it

Real Systems To Collect Real Data...
• Building a usable system requires integration of multiple technologies:
  - We need ways to interface to real data sources
  - We need language understanding
  - We need intelligible generation and synthesis
  - We need dialogue management
  - We need ways to apply the techniques to a different problem domain (application portability) because otherwise, we have to do all this again for the next application
• So collection of real data raises basic research issues

Error Rate Over Time in ATIS (Air Travel Info System)
Understanding easier than transcription
Limiting factor: understanding, not word error
[Figure: error rate (log scale) versus time in months (0 to 53); curves: Sentence Transcription, SL Error, NL Error, Word Error]

Conclusion: What Needs to Work
• So we can’t just wait for data -- we need to collect it
• And to collect data, we need systems that work so that real users will use them; they must be:
  - Scalable to handle large amounts of data
  - Robust so they keep working
  - Fast, so people can stand to use them
  - Interactive and engaging, so people want to use them
• And while we are at it, it would be nice if the systems not only supported data collection, but were able to learn interactively…