Grounding

In the common conception, grounding is a method that comes into play occasionally, when a
novel term or object is introduced into an environment that people need to talk about. The method
underlying grounded communication consists of transferring some features of concepts and
associating related terms. On this view, grounding is an important process that takes place on
occasion, but it is not pervasive in most communications or dialogs.
I think it is safe to say that grounding is not pervasive in human communication because, in adult
humans, it normally occurs in the context of languages that are already large and congruent. The
size and congruence of humans’ languages are due to:
1) a lifetime of learning
2) common social contexts
3) almost identical senses
a. rich high-quality senses
b. sight, sound, touch, temperature, and taste with tight sensory integration
c. ability to extract complex salient features with specialized brain centers
4) possibly similar representations (at least with a similar wetware substrate)
a. complex representations
b. robust representations
5) specialized language hardware (the language centers of the brain)
Robots involved in human-robot communication, by contrast, have languages that are small and
incongruent (with humans’ languages) due to:
1) very little learning time
2) different social contexts (robots don’t go to school, ride the bus, eat, play football, etc.)
3) nonhuman senses
a. sometimes rather poor quality senses
b. multi-modality and its integration are still young research topics
c. extracting salient features from the environment is still a challenge
4) very different representations (at least with a very different substrate)
a. too simple
b. too brittle
5) generic hardware
Given the difficulties of the HRI environment, human-robot grounding is going to be different from
human-human grounding, and it is going to fill a large and important role. Rather than looking at
the surface-level phenomena that give rise to the common conception of grounding, we will need
to go back to the root of what grounding is in order to succeed in human-robot grounding. And at
the root, grounding is nothing less than the process that modifies an agent’s language so that its
communications are more likely to have the intended meaning (or effect). I am inclined to view it
as the _only_ major process that modifies language in this way, and, as such, I am inclined to
view language learning as a long coordinated process of grounding.
A general language learner is not intended to be the product of my research; that is a task too
large in scope. But I do believe that a language learner will need to be built on top of a solid
understanding of grounding. I hope that my research in grounding could serve as a foundation for
language learning, and not just as a representation of the common conception of grounding.
My research focus will be on grounding robot and human concepts through dialog, but I hope to
leave it compatible with grounding that occurs without dialog and even without intentional
communication. Dialog arguably represents both the most efficient and the most flexible method
for grounding. I would also argue that dialog is the preferred mode for grounding when grounding
is motivated and consciously planned. I expect to conduct experiments with both typed and
spoken dialog. Spoken dialog will be the more useful mode in the end, but difficulties with spoken
dialog will make it more expedient to conduct some preliminary research with typed dialog, and
typed grounding dialogs will have some usefulness in any case. My focus will be on grounding at
the semantic level, although in working with spoken dialog through automatic speech recognition,
I hope to also shed some light on channel grounding. Also, since I plan to conduct experiments
with real task-oriented spoken dialog systems, there is the possibility of some pragmatic-level
grounding.
Any communicative act can be the subject of grounding, and it is an open question how
grounding techniques will vary from one act to another. It is certainly possible that some aspects
of grounding are generic enough that they might apply to any concept, while other aspects of
grounding may depend on where the communicative act lies in some ontology. To answer this
question, I will experiment with grounding of several kinds of communicative acts as well as
several kinds of communicated concepts.
Another open question lies in how to ground different features of language. Grounding the
meanings of nouns will certainly be investigated, but I will also investigate the meanings of other
parts of speech, as well as the grounding of grammatical relations. “Man bites dog” is a concept
that can be grounded and distinguished from the concept “Dog bites man”, as the sketch below
illustrates.
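To make the grammatical-relations point concrete, here is a minimal sketch of a predicate-argument representation in which the two sentences above become distinct concepts. The class and role names are hypothetical illustrations of one possible representation, not part of any planned system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """A relational concept: a predicate with labeled semantic roles."""
    predicate: str  # e.g. "bite"
    agent: str      # the doer of the action
    patient: str    # the one the action is done to

# The same three words ground to two distinct concepts because the
# grammatical relations (subject/object -> agent/patient) differ.
man_bites_dog = Event(predicate="bite", agent="man", patient="dog")
dog_bites_man = Event(predicate="bite", agent="dog", patient="man")

assert man_bites_dog != dog_bites_man  # word order carries meaning
```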
Robots have deficiencies and incongruities (w.r.t. humans) that cause difficulties in
communication and grounding, but they also have some potential advantages:
1) basically perfect memories
2) within the confines of their representations, planning and learning are often very fast
3) they can be pre-programmed with some rather high-level knowledge
a. linguistic, e.g. grammars, dictionaries, language models, etc.
b. world, e.g. encyclopedias, rules, Cyc-like databases, etc.
4) they have the potential to communicate experiences amongst themselves with perfect
fidelity and high speed (a concept learned by one robot is potentially a concept learned
by all robots)
5) they have access to extremely large stores of electronic information, e.g. the world-wide web and its semantic children
I will of course take advantage of these abilities to the extent feasible.
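As an illustration of advantage (4), here is a minimal sketch of what lossless concept sharing between robots might look like, assuming (purely hypothetically) that a learned concept can be represented as a label plus a feature dictionary.

```python
import json

# A learned concept, represented (purely for illustration) as a label
# plus a feature dictionary; real representations would be far richer.
learned_concept = {
    "label": "mug",
    "features": {"graspable": True, "holds_liquid": True, "height_cm": 10},
}

# Serialize and "transmit"; the receiving robot recovers exactly the
# same structure, so the transfer has perfect fidelity.
wire_format = json.dumps(learned_concept, sort_keys=True)
received = json.loads(wire_format)
assert received == learned_concept  # lossless robot-to-robot sharing
```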
Early experiments will elicit grounding dialog from human-human conversations. I hope that data
and knowledge gleaned from these experiments will allow me to form hypotheses about how
grounding dialog takes place. Once I have these hypotheses, I will encode them into working
robots and test that code in further experiments.
The early human-human experiments will consist of short games between people operating in a
shared virtual space. The virtual space will most likely be an off-the-shelf 3-d game engine
environment. The participants will be given tasks that they will need to perform together in order
to motivate communication. More specifically, the participants in the game will need to impart
information to their co-participants in order to complete their own tasks. This will be accomplished
by giving them different abilities. Although they will be operating in a shared environment, they
will have different perceptions. The worlds that they operate in will be consistent with each other
but different. They will be allowed to speak to each other (through text messages, probably)
freely, but their language will be garbled by the system. The garbling will make the
communications only partly intelligible at first. The garbling will be done in such a way that
a) it is lossless
b) it is consistent
With these two criteria, the garbled languages should be learnable through a process of
grounding. Hopefully this will be a good way to explore semantic-level grounding. I will also
introduce (separately or together) a semi-random garbling of tokens, which should help explore
channel-level grounding. Skantze has done something like this already, and Bohus has looked
into the effectiveness of various methods in spoken dialog systems.
The garbler makes use of a password-generating program that produces pronounceable
passwords to serve as nonsense words; a sketch of this scheme follows.
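Here is a minimal sketch of a garbler satisfying both criteria, with an optional noise rate for the semi-random token garbling. The nonsense-word generator stands in for the password program; all names and parameters are my own illustrative choices.

```python
import random

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"

def nonsense_word(rng: random.Random, syllables: int = 2) -> str:
    """Stand-in for the password program: pronounceable nonsense words."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables))

class Garbler:
    """Word-level garbler meeting the two criteria above.

    Consistent: the same input word always maps to the same nonsense word.
    Lossless: the mapping is kept one-to-one, so the original utterance
    is in principle recoverable from the garbled one.
    """
    def __init__(self, seed: int = 0, noise: float = 0.0):
        self.rng = random.Random(seed)
        self.mapping: dict[str, str] = {}
        self.used: set[str] = set()
        self.noise = noise  # probability of semi-random token corruption

    def garble_word(self, word: str) -> str:
        if word not in self.mapping:
            candidate = nonsense_word(self.rng)
            while candidate in self.used:  # keep the mapping injective
                candidate = nonsense_word(self.rng)
            self.mapping[word] = candidate
            self.used.add(candidate)
        out = self.mapping[word]
        if self.rng.random() < self.noise:  # semi-random channel noise
            out = nonsense_word(self.rng)   # inconsistent, lossy token
        return out

    def garble(self, utterance: str) -> str:
        return " ".join(self.garble_word(w) for w in utterance.lower().split())

g = Garbler(seed=42)
print(g.garble("move the red block"))   # e.g. "zofa piru kela mano"
print(g.garble("move the blue block"))  # shared words garble identically
```

With noise left at zero the substitution is deterministic and invertible, targeting semantic-level grounding; a positive noise rate corrupts tokens inconsistently, which corresponds to the channel-level case.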
Scratch Schedule:
Fall 2005
1) Start pilot experiments with human-human grounding in a constructed environment that
disrupts previously grounded concepts.
2) Revise experiment until it is deemed ready for recruited subjects.
3) Do experiments. Collect data.
4) Analyze data. Determine:
a. What terms should be known a priori by the system?
b. What strategies can a learner employ to deal with unknown or ambiguous terms?
c. What perceptions and world knowledge are necessary?
Spring 2006
5) Using this analysis, build a working model: a program that can function in place
of the human learner (akin to the blocks world, but without much a priori domain knowledge).
(Caution: this step may be a major undertaking, and it continues to deviate from the
Boeing project; being able to pick up where Winograd left off would be pretty cool,
however.) Find the limits of this model in further experiments.
6) Propose.
Summer/Fall 2006
7) The initial experiments use a text-based chat client. Speech would be much
preferred, but unknown words make spoken dialog quite troublesome. Should I deal with this
issue or finesse it?
8) Move the experiments/analysis/models closer to the real world. Go to virtual 3-d first-person worlds or actual field experiments.
9) Broaden the domain from 2-d geometric picture drawing to search tasks and other robot
service tasks. Could some principles be generic enough that they could be plugged into
any dialog system?
Spring 2007
10) Distill experiences into coherent arguments; connect the research to related work and
show how it relates to other kinds of learning and other kinds of dialog systems.
11) Write, write, write.
Summer 2007
12) Defend.