What Works, What Doesn’t -- And What Needs to Work
Lynette Hirschman
Information Technology Center
The MITRE Corporation
USA
© 2001 The MITRE Corporation. ALL RIGHTS RESERVED.
MITRE
What Works...
• If we believe Eric Brill, we should just collect and annotate data...
  - Since data collection seems to work better than looking for new algorithms
• What this really means is that data collection is more cost-effective than funding research
  - Similarly, we might conclude that waiting for the next chip is more cost-effective than creating faster algorithms
So we should all stop doing research and look for data and wait...
Conversational Interaction: A Case Study
• Speech researchers have always said, "There’s no data like more data"
• Many speech problems are, by definition, data-constrained:
  - Conversational interfaces require real(istic) data on what people will say to machines in the context of a specific application
  - Such application-specific data tends to be difficult (expensive) to collect
  - It requires simulation of interaction with a system, or a running system to collect data to build the system…
• How do we collect millions of sentences of application-specific data?
How to Collect Real Data Cheaply
• Lesson from Victor Zue’s MIT Jupiter system:
  - Put something out there that people want to use: on-line weather information
  - This can be done by bootstrapping from a primitive system, using the collected data
  - MIT has been very successful in collecting data from real users; methodology now used by the DARPA Communicator program
So to collect data to build a system, we need a system that works well enough for people to use it
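
The bootstrapping idea on this slide can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the Jupiter or Communicator methodology: the function names (`train`, `coverage`, `bootstrap`) are hypothetical, the "model" is just the set of words seen so far, and "coverage" is the fraction of a user's words the current model already knows.

```python
def train(utterances):
    """Toy stand-in for training: the 'model' is the vocabulary seen so far."""
    vocab = set()
    for utterance in utterances:
        vocab.update(utterance.lower().split())
    return vocab


def coverage(vocab, utterance):
    """Fraction of the words in one utterance that the model already knows."""
    words = utterance.lower().split()
    if not words:
        return 1.0
    return sum(word in vocab for word in words) / len(words)


def bootstrap(seed_utterances, user_sessions):
    """Deploy the current model, log what users say, retrain on everything.

    Returns the final model and the average coverage measured in each round
    before that round's data was added to the corpus.
    """
    corpus = list(seed_utterances)
    history = []
    for session in user_sessions:
        if not session:
            continue  # nothing was collected in this round
        model = train(corpus)
        history.append(sum(coverage(model, u) for u in session) / len(session))
        corpus.extend(session)  # the collected data feeds the next system
    return train(corpus), history


if __name__ == "__main__":
    seed = ["what is the weather in boston"]
    sessions = [
        ["will it rain in boston tomorrow"],
        ["will it rain in seattle tomorrow"],
    ]
    model, history = bootstrap(seed, sessions)
    print(history)  # coverage per round, measured before retraining
```

In this toy run the second round scores higher than the first, because the words logged from the first round's users are already in the model when the second round starts.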
Real Systems To Collect Real Data...
• Building a usable system requires integration of multiple technologies:
  - We need ways to interface to real data sources
  - We need language understanding
  - We need intelligible generation and synthesis
  - We need dialogue management
  - We need ways to apply the techniques to a different problem domain (application portability) because otherwise, we have to do all this again for the next application
• So collection of real data raises basic research issues
Error Rate Over Time in ATIS (Air Travel Info System)
[Figure: error rate (log scale) versus time in months (ticks at 0, 8, 20, 31, 45, 53); curves labeled Sentence Transcription, SL Error, NL Error, Word Error]
• Understanding easier than transcription
• Limiting factor: understanding, not word error
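
Word error, one of the curves in the ATIS figure, is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the example sentences are illustrative assumptions, not material from the ATIS evaluations.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words (rolling one-row dynamic program).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = cur[j - 1] + 1   # extra word in the output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word ("me") and one substituted word ("boston" -> "austin")
    # against a five-word reference.
    print(word_error_rate("show me flights to boston", "show flights to austin"))
```

Sentence-level transcription error is stricter: a sentence counts as wrong if any word in it is wrong, which is one reason that curve sits above the word error curve.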
Conclusion: What Needs to Work
• So we can’t just wait for data -- we need to collect it
• And to collect data, we need systems that work so that real users will use them; they must be:
  - Scalable to handle large amounts of data
  - Robust so they keep working
  - Fast, so people can stand to use them
  - Interactive and engaging, so people want to use them
• And while we are at it, it would be nice if the systems not only supported data collection, but were able to learn interactively…