A Computational Model of Human Affective Memory
and Its Application to Mindreading
Hugo Liu
MIT Media Laboratory
20 Ames Street #320D
Cambridge, MA 02139, USA
+1 (617) 253-5334
hugo@media.mit.edu
ABSTRACT
The cognitive science and artificial intelligence communities are
both interested in the problem of how humans infer the mental
states of others, known as mindreading. Whereas cognitive science is interested in a deeper understanding of how humans mindread, artificial intelligence is interested in imparting mindreading
capabilities to social computers. Current AI approaches to mindreading are weak, however. Techniques such as user profiling
and collaborative filtering try to predict user preferences and actions, but do so very weakly. In this paper, we propose a deeper
model of a person in terms of their system of attitudes, and implement the system called PERSONA. Grounded in the episodic
and reflexive memories of a person, PERSONA uses saliency-mediated associative learning to automatically acquire a human
affective memory model from a corpus of personal text, such as a
weblog. Applying this model, PERSONA performs affective
mindreading to predict a person’s likely affective response given a
new situation or event. In addition to memory-based prediction
alone, the system also analyzes the attitudes of a person’s Minskian imprimers and performs conceptual analogy to make predictions more robust. An evaluation of PERSONA indicates that it is
a promising approach, comfortably outperforming baselines; however, because affective communication is fairly fail-hard, more
refinement would be needed before this system can be applied to
socialize a computer.
1. WHY IS MINDREADING INTERESTING?
Recently there has been much ado in the cognitive science community about the human faculty for Theory of Mind (ToM), otherwise known as mindreading. And no, it does not refer to psychic powers as one might guess. ToM and mindreading refer to
an animal’s capability for reflecting on its own mental states –
attitudes, beliefs, and desires – and modeling the mental states of
others. It is believed that humans evolved specialized mindreading abilities absent in other primates (Povinelli and Preuss, 1995),
and that the human mindreading faculty makes human social
learning uniquely powerful – inter alia, the rapid learning of
words (Bloom, 2002), and the learning of goals and values (Minsky, forthcoming). Cognitive scientists have gone about the study
of mindreading in many ways, including: by evolutionary comparison, e.g. (Call and Tomasello, 1996); by examining linked phenomena like imitation (Meltzoff and Gopnik, 1993); by studying
deficits of ToM in autistic children; by speculating on potential
neural substrates for ToM such as mirror neurons (Gallese and
Goldman, 1998); and by debating how it works, i.e. Simulation
Theory of ToM versus Theory Theory of ToM.
Across the divide, artificial intelligence researchers are also thinking about mindreading. However, being on the whole more
pragmatic, this community is more interested in imparting mindreading capabilities to computers and robots to create more sociable human-computer interaction (Nass et al., 1994). While
some results from the cognitive science literature are interesting
to the AI community, such as the recent discovery of special action
recognition neurons called mirror neurons in macaque monkeys
(Gallese et al., 1996), we think it is fair to say that behavioral and
bottom-up approaches to mindreading are still far away from producing a compelling and predictive cognitive model that could
empower social computers.
Despite lacking a complete cognitive model of mindreading, the
AI community has been working on weaker forms of mindreading
for many years. User modeling, for example, attempts to model a
human user’s preferences and mental context in hopes of creating
more natural and personal interactions between human and computer. One common approach in user modeling is user profiling,
whereby users are modeled by their demographic information,
usually obtained via explicit questionnaires. Applying a small set
of rules, these user demographics can be mapped into predicted
user preferences. Another common approach in user modeling is
collaborative filtering, in which patterns of user actions are modeled against those of a whole user community. While these forms
of user modeling have enjoyed some success, particularly in product recommendation (Resnick and Varian, 1997), these approaches are too weak to be useful for socializing computers. User profiling oversimplifies people as obeying demographic lines, while
collaborative filtering is a purely statistical approach offering little
insight into a user’s beliefs or preferences; thus, user profiling and
collaborative filtering are weak mindreaders.
In this work, we explore how mindreading can be deepened in a
novel way: by considering knowledge of a person’s life experiences
over a long period of time, and applying this knowledge to predict
how a person might respond in new situations.
In order to create a more complete and more intimate model of a
person, we would necessarily need a corpus of knowledge about
that person’s beliefs, desires, goals, and experiences. While it
may be possible to acquire this directly through interactions with
the user, building a sufficiently rich model of a person might require cumbersome interactions; thus the approach taken by this
work is to try to infer such a model automatically from personal
texts such as a journal, or a transcript of a person’s beliefs and
ideas, as might be manifested in an interview. Through our initial
experience, we realized that specific beliefs and goals would be
too hard to accurately infer from unconstrained natural language
text, but we did not want to sacrifice breadth of knowledge for
specificity, so instead, we decided to try to infer just the emotions,
attitudes and dispositions associated with these beliefs and goals.
Using this body of knowledge, we construct a mechanism to predict the affective context of a person in reaction to a topic, situation or event. We dub this task affective mindreading, with affect
referring to emotions, dispositions, and attitudes. If successful,
we believe that this type of mechanism can have great implications for sociable computers.
Our approach can be summarized as follows. From text, we wish
to infer a person’s emotions, attitudes, and dispositions toward
particular people, topics, events, and situations, at different times
in their lives, and to record these into a model of a person’s affective memory. By interpolating and extrapolating from this affective memory, a computer can perform affective mindreading – that
is to say, given a new topic, event, or situation, the system will try
to predict a person’s affective response. To implement this approach, we built PERSONA, a system that creates a model of a
person’s affective memory from personal texts, and exploits this
model for affective mindreading.
The rest of this paper is organized as follows. First, we present a
computational model of human affective memory, a model of
saliency-mediated associative learning from personal texts, and
discuss the implementation of the PERSONA model learner.
Second, we explore how the affective memory model is used in
conjunction with conceptual analogy to perform affective mindreading.
Third, we describe an experiment to evaluate
PERSONA in an affective mindreading task. Fourth, we reconnect with the literature and address how affective mindreading
aids social learning tasks in humans and computers.
2. A MODEL OF AFFECTIVE MEMORY
In the previous section, we motivated the development of a computational model of human affective memory by suggesting that
such a model would allow for more advanced mindreading by
computers than can be achieved through typical knowledge-impoverished user modeling techniques such as profiling. We
begin this section with the caveat that the computational model
described here is not claimed or intended to be cognitively motivated. We attempt to model human affective memory only insofar
as it is feasible to infer from personal texts, and only insofar as it
is useful to the task of affective mindreading – predicting a person’s attitudes and dispositions toward a particular subject. In
this section, we first propose the two-part episode-reflex model of
human affective memory and connect it to the literature. Second,
we introduce saliency-mediated associative learning as a strategy
for automatic model acquisition from personal texts. Third, we
discuss how such a model has been implemented in PERSONA.
2.1 The Episode-Reflex Model
Of Human Affective Memory
Of the different types of human memory that have been studied,
two are of great interest to us as tools for modeling affective
memory: long-term episodic memory, and reflexive memory. In
PERSONA, we combine the strengths of these two memories to form
the episode-reflex model.
2.1.1 Affective long-term episodic memory
Long-term episodic memory (LTEM) is a relatively stable
memory based on experiences and events in context. An episode
can be thought of as a coherent packet of events with a time sequence. Episodes are generally content-addressable, meaning
that they can be retrieved through a variety of cues based on the
sensory, affective, or semantic content of the episode, such as a
sight, sound, emotion, or location. LTEM can be very powerful
because even events which happen only once can become salient
memories and serve to recurrently influence a person’s future
thinking. If we hope to accurately predict a person’s affective
response to a future situation, we must account for the influence
of these one-time salient episodes. Even though our aim is to
model only the affective aspect of human memory, we cannot, in
the case of LTEM, completely disregard the non-affective aspects
because they may serve as cues for retrieval. Consequently, our
affective LTEM model represents episodes with some semantic
structure and several types of context. In PERSONA, an affective
LTEM episode has the following components:
• A collection of the subevents of an episode that are salient to the evocation of the overall affect of the episode, sequentially ordered.
• If possible, the perceived root cause of the affective response in that episode.
• Possibly salient contexts: the date, the location, the topic.
• An affect valence score associated with the episode.
• A salience score for the episode, measuring the perceived importance of the memory.
Extracting only the salient subevents and the perceived root cause
of the episode makes learning more precise, as discussed further in the next subsection. In addition to describing the thematic structure of the episode, we also
encode several other types of contextual cues with the episode.
As suggested by Tulving’s encoding specificity hypothesis
(1983), retrieval of an episode is more likely when current conditions match the encoding conditions; thus it is important to remember the salient contexts surrounding an episode as completely
as possible. Finally, because our main focus is on being able to
recall the attitudes experienced during an episode, we associate an
affect valence score with each episode; this score is described in a later subsection.
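As a rough illustration of the episode components listed above, the following Python sketch stores them in a record and retrieves episodes by cue. The name EpisodeFrame is borrowed from the architecture description later in the paper; the field names, the substring-matching `retrieve` helper, and the ordering by salience are assumptions made for illustration, not a description of PERSONA's actual data structures.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# (pleasure, arousal, dominance); the PAD representation is introduced in Section 2.3.2.
PAD = Tuple[float, float, float]


@dataclass
class EpisodeFrame:
    subevents: List[str]              # salient subevents, in temporal order
    valence: PAD                      # affect valence score of the episode
    salience: float                   # perceived importance of the memory
    root_cause: Optional[str] = None  # perceived root cause, if one was found
    when: Optional[date] = None       # salient contexts: date, location, topic
    location: Optional[str] = None
    topic: Optional[str] = None

    def cues(self) -> List[str]:
        """All textual content that may serve as a retrieval cue."""
        parts = list(self.subevents) + [self.root_cause, self.location, self.topic]
        return [p.lower() for p in parts if p]


def retrieve(memory: List[EpisodeFrame], cue: str) -> List[EpisodeFrame]:
    """Hypothetical content-addressable lookup: substring match, most salient first."""
    cue = cue.lower()
    hits = [ep for ep in memory if any(cue in c for c in ep.cues())]
    return sorted(hits, key=lambda ep: ep.salience, reverse=True)
```

Substring matching is only a stand-in for the pattern matching the model calls for; the point of the sketch is that subevents, causes, and contexts are all kept because any of them may act as a retrieval cue.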
2.1.2 Affective reflexive memory
While long-term episodic memory deals in salient, one-time
events and must generally be consciously recalled, reflexive
memory is full of automatic, instant, almost instinctive associations. Whereas LTEM is content-addressable and requires pattern-matching the current situation with that of the episode, reflexive memory is like a simple hash-table that directly associates
a cue with a reaction, thereby abstracting away the content.
Tulving equates LTEM with “remembering” and reflexive
memory with “knowing” (Tulving, 1983).
In humans, reflexive memories are generally formed through repeated exposures rather than one-time events, though subsequent
exposures may simply be recalls of a particularly strong primary
exposure (Locke, 1689). In addition to frequency of exposures,
the strength of an experience is also considered. Complementing
the event-specific Affective LTEM with an event-independent
affective reflexive memory makes sense because there may not
always be an appropriate distinct episode which shapes our appraisal of a situation; often, we react reflexively, our present
attitudes deriving from an amalgamation of our past experiences
now collapsed into something instinctive.
Because humans undergo forgetting, belief revision, and theory
change, update policies for human reflexive memory may actually
be quite complex. In PERSONA, we adopt a more simplistic
representation and update policy that is not cognitively motivated,
but instead, exploits the ability of a computer system to compute
an affect valence at runtime. An entry in the memory is as follows:
• The key to the entry is one of two types:
  o 1) A simple conceptual cue whose semantic type belongs to the following ontology: a person, an action, an object, an activity, or a named event; or
  o 2) A simple conceptual cue, Bayesian-conditioned on the presence of a discourse topic.
• The value of the entry is a list of exposures.
• An exposure X is the following triple:
  o date of exposure;
  o affect valence score of exposure, V;
  o saliency of exposure, S.
To read off the current valence associated with a conceptual cue,
the formula given in Eq. (1) is applied.

$$\frac{1}{n}\,\log_b \max(n, b) \sum_{t=\mathrm{startdate}}^{\mathrm{enddate}} X(E_t)\,V(E_t) \qquad (1)$$

where n = the number of exposures of the concept.
This formula returns the valence of a conceptual cue averaged
over a particular time period. The term $\log_b \max(n, b)$ rewards frequency of exposures, while the term $X(E_t)$ rewards
the saliency of an exposure. In this simple model of an affective
reflexive memory, we do not consider phenomena such as belief
revision, reflexes conditioned over contexts, or forgetting.
In summary, we have motivated and characterized two components to our computational model of human affective memory: an
episodic component emphasizing the affect of one-time salient
memories, and a reflexive component, emphasizing instinctive
reactions to conceptual cues that are conditioned over time. In the
following subsection, we propose how this two-part model of
human affective memory can be acquired from personal texts via
saliency-mediated associative learning.
2.2 Saliency-Mediated Associative Learning
With origins in Aristotle, classical associative learning was popularized as an explanation of many brain processes beginning in the
17th century by several British philosophers, including John Locke
and James Mill. However, after the rise and fall of popularity of
Pavlovian classical conditioning, many in the cognitive science
community now dismiss associative learning as inadequate. In the
study of word learning in children, Paul Bloom reported that contrary to Locke’s assertion that repetition is necessary to associate
words with sights and sounds, children actually learn word meanings error-free, and without repetition, in a process dubbed fast mapping (Bloom, 2002). While it may seem that associative
learning is being debunked as a plausible theory of cognitive
learning, we suggest that associative learning can, in many cases,
be salvaged, given that it is appropriately structured. In Bloom’s
research on word learning in children, for example, error-free fast
mapping is possible because the child uses the teacher’s mental
and intentional context to disambiguate reference, and once disambiguated with sufficient confidence, the association between
word and meaning can then be made with greater confidence.
With a similar sentiment against weak associationism, Marvin
Minsky argues that simply remembering everything is not equivalent to learning. The defining criterion for learning is knowing precisely what is learned (Minsky, forthcoming). Or, formulated
another way, learning involves credit assignment (Sutton, 1984).
The lesson to be learned from this (pun intended) is that associative learning is not useful unless it is precise. In other words, our
mechanism of learning should not involve solely semantically
weak statistical methods, but should instead incorporate some
external knowledge and heuristics to gain additional precision. In
particular, we see the identification of saliency and salient events
as a mechanism to focus associations. We dub this saliency-mediated associative learning (SMAL). SMAL is similar to credit
assignment, except that salience is a heuristically generated score
rather than an assertion, thus making it amenable to statistical
learning methods.
The learning mechanism of each of the two parts of the proposed
affective memory model incorporates saliency to focus learning.
In the affective long-term episodic memory model, affect is associated with particularly salient subevents rather than the whole of
the episode. In addition, the perceived root cause of the affective
response in the episode is extracted or inferred when possible.
Finally, a saliency score is given to the whole of the episode, to
rate its importance and impact to the person being modeled. These
three features together focus the associative learning mechanism,
and help to answer the question, “what should be learned.” Of
course, identifying saliency, being a flavor of the credit assignment problem, is not an easy task, especially over domain-unconstrained texts. In the next subsection, we explain the role that a
large common sense knowledge base plays in this important subtask.
In the affective reflexive memory model, associations are not
made at the word-level, which would tend to conflate the affect of
too many different senses of a word into the same entry, but rather, conceptual cues are those first-order or second-order phrases
which follow the ontology: a person, an action, an object, an activity, or a named event. The choice of ontology reflects the types
of salient concepts that we believe people typically form stable
attitudes about. In addition, to embrace the possibility that concepts may have different affect valences under different contexts,
an entry in the affective reflexive model may be keyed on a concept Bayesian-conditioned over a particular discourse context.
The difficulty in identifying the contexts which dictate a conceptual cue’s interpretation is discussed further in the evaluation of
PERSONA. Finally, each exposure is associated with a saliency
score, and conceptual cues with more entries are assigned more
salient valence scores. By putting constraints on the types of concepts that can learn affective associations, by considering contexts
in learning affect associations, and by evaluating the saliency of the
strength and frequency of exposures, the reflexive memory model
seeks to incorporate as much precision as possible in its associative learning.
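To make the reflexive memory entry and the read-off of Eq. (1) concrete, here is a minimal Python sketch. Several choices are assumptions rather than details given in the text: the valence V is treated as a single scalar (for example, one PAD dimension), the logarithm base b defaults to 2, n is counted over the exposures that fall inside the requested date window, and an empty window returns 0.0. The class and method names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Exposure:
    when: date       # date of exposure
    valence: float   # V: affect valence of the exposure (scalar here, by assumption)
    saliency: float  # saliency weight of the exposure, written X(E_t) in Eq. (1)


class ReflexiveMemory:
    def __init__(self, base: float = 2.0):
        self.base = base                  # b in Eq. (1); the value 2 is an assumption
        self.entries = defaultdict(list)  # (cue, topic) -> list of exposures

    def add(self, cue: str, exposure: Exposure, topic: Optional[str] = None) -> None:
        # topic=None is the plain conceptual cue; otherwise the cue is
        # conditioned on a discourse topic, as in the second key type.
        self.entries[(cue, topic)].append(exposure)

    def valence(self, cue: str, start: date, end: date,
                topic: Optional[str] = None) -> float:
        """Eq. (1): (1/n) * log_b(max(n, b)) * sum of X(E_t) * V(E_t)."""
        window = [e for e in self.entries.get((cue, topic), [])
                  if start <= e.when <= end]
        n = len(window)
        if n == 0:
            return 0.0  # no exposures in the window (assumed neutral)
        frequency_reward = math.log(max(n, self.base), self.base)
        weighted_sum = sum(e.saliency * e.valence for e in window)
        return frequency_reward / n * weighted_sum
```

Because max(n, b) keeps the logarithm at 1 or above, a cue with a single exposure simply returns that exposure's saliency-weighted valence, while cues with many exposures are scaled up by the frequency term.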
Having proposed the episode-reflex model of human affective
memory and saliency-mediated associative learning as a mechanism for model acquisition, we discuss in the next section how
the model and learning mechanism were implemented in
PERSONA.
2.3 Model Implementation in PERSONA
In proposing the model and learning mechanism, several subtasks
were implied but not addressed explicitly, such as 1) having a
source of personal texts meeting certain suitability criteria, 2) a
model for measuring affect valence, 3) a mechanism for judging
the affect of episodes and text in general, and 4) methods for determining saliency. These implementation issues are discussed in
the ensuing subsections, followed by a start-to-finish architectural walkthrough of PERSONA’s model learner.
2.3.1 Suitability Criteria for Personal Texts
The suitability of texts for model generation is subject to the following criteria. First, texts should be first-person, subjective,
autobiographical narratives. Personal emotions and attitudes are
either not easily accessible, or not sufficiently detailed in third-person texts or objective writing. Second, texts should explore a
breadth of topics because an insufficiently broad model gives a
poor and disproportional sampling of a person and would be difficult to use to perform affective mindreading. Third, texts should
cover everyday events, situations, and topics, because that is the
optimal discourse domain of recognition of the mechanism with
which we will judge the affect of text. Fourth, texts should be
organized into episodes, occurring over a substantial period of
time relative to the length of a person’s life.
With these criteria in mind, an ideal source of personal text is a
personal journal. While private journals would be preferred for
their candor, publicly viewable journals in the form of recently
popular weblogs are still a good source of personal texts, though
the generated model may exhibit a bias toward the “public” personality of the person being modeled. Interview transcripts are a less ideal but workable
source of personal texts. Interviews satisfy most of the criteria for personal text selection with the exception that interviews are not reliably organized around episodes,
and may represent a disproportionally narrow set of topics. Still,
they are suitable substrates for model generation, provided that
the limitations of the resulting model are realized.
2.3.2 Representing Affect using the PAD Model
Affect valence pervading the proposed models can take one of
two potential representations. They can take the form of an ontology of basic canonical emotions, represented prominently by
Paul Ekman’s six basic emotions (happy, sad, angry, scared, disgusted, and surprised) (1993). Or, they can take the form of a
dimensional model, represented prominently by Albert Mehrabian’s Pleasure-Arousal-Dominance (PAD) model (1995). In this
model, the three nearly independent dimensions are Pleasure-Displeasure, Arousal-Nonarousal, and Dominance-Submissiveness. Each dimension can assume values from –100%
to +100%, and a PAD valence score is a 3-tuple of these values
(e.g. [-.51, .59, .25] might represent anger).
We chose the dimensional PAD model over the discrete canonical
emotion model because PAD represents a sub-symbolic, continuous account of affect, where different symbolic affects can be
unified along one of the three dimensions. This model has robustness implications for the affective classification of text. For
example, in the affective reflexive memory, a conceptual cue may
be variously associated with anger, fear, and surprise, which can
be unified along the Arousal dimension of the PAD model, thus
enabling the affect association to be coherent and focused.
2.3.3 Affective Appraisal of Personal Text
Judging the affect of a personal text has three chief considerations. First, the mechanism for judging the affect should be robust and comprehensive enough to correctly appraise the affect of
a breadth of concepts. Second, to aid in the determination of
saliency, the mechanism must be able to appraise the affect of
very little text, such as on the sentence-level. Third, the mechanism should recognize specific emotions rather than convolving
affect onto any single dimension.
Several common approaches fail to meet the criteria. The naïve
keyword spotting approach looks for surface language features
like keywords. However, this approach is not acceptably robust
on its own because affect is often conveyed without mood keywords. Statistical affect classification using statistical learning
models such as latent semantic analysis (LSA) generally requires
large inputs for acceptable accuracy because it is a semantically
weak method. Hand-crafted models and rules are not broad
enough to analyze the desired breadth of phenomena.
To analyze personal text with the desired robustness, granularity,
and specificity, we employ a model of textual affect sensing using
real-world knowledge, proposed by Hugo Liu et al. (2003). In
this model, defeasible knowledge of everyday people, things,
places, events, and situations is leveraged to sense the affect of a
text by evaluating the affective implications of each event or situation. For example, to evaluate the affect of “I got fired today,”
this model evaluates the consequences of this situation and characterizes it using negative emotions such as fear, sadness, and
anger. This model, coupled with a naïve keyword spotting approach, provides rather comprehensive and robust affective classification. Since the model uses knowledge rather than word statistics, it is semantically strong enough to evaluate text on the sentence level, classifying each sentence into a six-tuple of valences
(ranging from a value of 0.0 to 1.0) for each of the six basic Ekman emotions. These emotions are then mapped to the PAD
model.
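The text does not say how the six Ekman valences are mapped to PAD, so the following Python sketch shows only one plausible scheme: an intensity-weighted average of per-emotion PAD anchor points. The function name `ekman_to_pad` and the averaging rule are assumptions. In the example table, only the anger anchor [-.51, .59, .25] comes from the paper; the "happy" anchor is a made-up placeholder included to show the mechanics.

```python
from typing import Dict, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance)


def ekman_to_pad(scores: Dict[str, float], anchors: Dict[str, PAD]) -> PAD:
    """Blend PAD anchor points, weighting each emotion by its 0.0-1.0 valence.

    This weighting scheme is an illustrative assumption, not PERSONA's mapping.
    """
    total = sum(scores.get(emotion, 0.0) for emotion in anchors)
    if total == 0:
        return (0.0, 0.0, 0.0)  # no emotion detected: treat as neutral
    return tuple(
        sum(scores.get(emotion, 0.0) * anchors[emotion][i] for emotion in anchors) / total
        for i in range(3)
    )


EXAMPLE_ANCHORS = {
    "angry": (-0.51, 0.59, 0.25),  # the example given in the text for anger
    "happy": (0.5, 0.3, 0.2),      # PLACEHOLDER value, not from the paper
}
```

A sentence scored as purely angry maps to the anger anchor itself, while mixed scores land between anchors, which is the kind of continuous unification the dimensional model is chosen for.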
One point of potential paradox should be addressed. The real-world knowledge-based model of affect sensing is based on defeasible commonsense knowledge from the Open Mind Commonsense corpus (Singh et al., 2002), which is, in turn, gathered
from a web community of some 11,000 teachers. Therefore, the
affective assessment of text made by such a model represents the
judgment of a typical person. However, sometimes a personal
judgment of affect is contradicted by the typical judgment. Thus,
it would seem paradoxical to attempt to learn that a situation has a
personally negative affect when the typical person judges the situation as positive. To overcome this difficulty, we implement, in
parallel, a mood keyword-spotting affect sensing mechanism to
confirm or contradict the assessment of the primary model. In
addition, we make the assumption that although a personal affect
judgment may sometimes deviate from that of a typical person, it
will agree with it most of the time. The implication of this is that on a
slightly larger granularity than a sentence, the affective appraisal
is likely to be accurate. To assess the affect of a sentence, we
factor in the affective assessment of not only the sentence itself,
but also of the paragraph, section, and whole journal entry or
episode.
Another way to view this is that the learning of personal attitudes
and dispositions can be bootstrapped by commonsense attitudes
and dispositions.
2.3.4 Determining Saliency
The key to saliency-mediated associative learning is, of course,
being able to judge the saliency or importance of an episode,
cause of episode, subevent, context, or exposure.
Saliency of episode. Episodic saliency is judged by the frequency
with which salient subevents of that episode are recalled in later
episodes, and by the detection of affect valence of episodes whose
fulcrum is the episode in question.
Salient subevents. Within the analysis of an episode, saliency of
subevents is determined by two components: relative contribution of valence to the overall valence, and alignment with key
events of everyday story scripts. First, the main verbs and arguments are extracted from the sentences, constituting a candidate
list of subevents. The affective valence of each of these subevents
is compared against the overall valence of the episode, and those
that contribute most to, and align best with, the overall valence are
given higher saliency. Second, using a small collection of pithy
everyday stories from the Open Mind Commonsense (OMCS)
story corpus (Singh et al., 2002), an alignment procedure tries to
map the current episode to the corpus of stories. If a match exists,
then the episode’s key events can be identified and their saliencies
boosted.
Salient contexts. In identifying salient within-episode contexts,
the semantic recognition of possible types of contexts such as
date, time, location, and social circles first takes place. Contexts
which occur with the greatest number of repetitions and anaphoric
references are judged salient.
Salient cause of episode. This is perhaps the most important step
in learning. Episodes can unfold in multiple steps but a person
will ultimately attribute an affective response to a single root
cause. There are three heuristic processes for attempting to identify the perceived cause of a salient episode. First, a heuristic information extractor tries to use regular expression patterns and
syntactic cues to identify the explicitly stated perceived cause of
the affective response in the episode. Second, if not found in the
text, the alignment procedure between stories in the OMCS story
corpus and the episode may also produce a cause, because there may be
a cause or moral explicitly associated with the story. Third, we
use OMCSNet (Liu and Singh, 2003), a semantic network representation of 80,000 nodes and 200,000 edges generated from
OMCS, to reason abductively about the cause. In this representation, nodes are first- and second-order concepts like people, places, things, events, activities and actions, while edges are labeled
with one of 25 relational predicates. Salient events of the episode
are mapped onto nodes in OMCS, then edges with the causal
predicates “effectOf” and “consequenceOf” are followed in reverse. If the search for causes converges on a common node, then
that node is chosen as a cause.
Saliency of exposure. Exposure is judged as the membership of a
conceptual cue within an episode whose affect valence is strong.
Naturally, this is a weak association, and can be strengthened if
particular conceptual cues are given salient exposure. This includes frequency counting and anaphoric reference counting within the episode. If a conceptual cue occurs with high frequency or
reference, then it is likely a topic of the episode.
2.3.5 The PERSONA Model Learner Architecture
The architecture of the PERSONA Model Learner is given in Fig.
1. The text inputter scrapes a weblog URL or other personal text
corpus for date-annotated episodes. In the linguistic processing
suite, raw episodes are syntactically and semantically processed,
meeting the needs of the two associative learners. The affective
text analyzer combines a real-world knowledge-based analyzer
(Emotus Ponens (Liu et al., 2003)) with a back-off mood keyword
spotter. The sentences within each episode are annotated with a
PAD valence triplet.
[Figure 1: block diagram of the PERSONA Model Learner. Components shown: Text Inputter (weblog extraction, episode builder; input: weblog URL; output: raw episodes); Linguistic Processing Suite (semantic type recognition, temporal phrase recognition, tagging and chunking, SVO identification, anaphora resolution; output: semantically processed episodes); Affective Text Analyzer (mood keyword spotting, real-world knowledge based sensing with Emotus Ponens, Ekman to PAD model mapping; output: affectively annotated episodes); Affective Reflexive Memory Learner (conceptual cue gathering, salient exposure assessor), which updates the Affective Reflexive Memory; Affective Long-term Episodic Memory Learner (salient subevent assessor, salient contexts assessor, salient cause assessor, episode salience calculation, with OMCSNet and the OMCS Story Corpus as resources), which updates the Affective Long-term Episodic Memory.]
Fig 1. PERSONA’s Model Learner Architecture
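The third heuristic for identifying a salient cause in Section 2.3.4, following causal edges in reverse from each salient event until the searches converge on a common node, might be sketched in Python as below. The edge layout (cause, predicate, effect), the depth limit, and the tie-breaking rule (smallest total distance, then alphabetical) are assumptions for illustration, and the toy edges in the usage note are invented, not taken from OMCSNet.

```python
from collections import defaultdict, deque

CAUSAL_PREDICATES = {"effectOf", "consequenceOf"}


def build_reverse_index(edges):
    """Map each effect node to its direct causes, keeping only causal predicates."""
    reverse = defaultdict(set)
    for cause, predicate, effect in edges:
        if predicate in CAUSAL_PREDICATES:
            reverse[effect].add(cause)
    return reverse


def candidate_causes(reverse, event, max_depth=3):
    """Breadth-first walk backwards from an event; returns {cause: distance}."""
    distance = {}
    seen = {event}
    queue = deque([(event, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for cause in reverse.get(node, ()):
            if cause not in seen:
                seen.add(cause)
                distance[cause] = depth + 1
                queue.append((cause, depth + 1))
    return distance


def common_cause(edges, events, max_depth=3):
    """Return the nearest node reachable backwards from every event, or None."""
    reverse = build_reverse_index(edges)
    reach = [candidate_causes(reverse, e, max_depth) for e in events]
    if not reach:
        return None
    shared = set(reach[0]).intersection(*reach[1:])
    if not shared:
        return None
    return min(sorted(shared), key=lambda node: sum(r[node] for r in reach))
```

For instance, with invented edges stating that oversleeping leads to missing the bus and to skipping breakfast, the events "arrive late" and "feel hungry" would both trace back to "oversleep", which would then be proposed as the perceived cause.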
In the affective reflexive memory learner, conceptual cues previously identified in the linguistic processing suite are judged for
salient exposure by assessing their topicalization in the episode.
Then the affective reflexive memory is updated.
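A minimal Python sketch of the salient exposure assessment follows, assuming that direct mentions and resolved anaphoric references of each conceptual cue are already available from the linguistic processing suite. The scoring rule (the episode's valence strength scaled by each cue's share of the most-referenced cue's count) is a hypothetical stand-in; the text specifies only that frequency and anaphoric reference counts are used.

```python
from collections import Counter
from typing import Dict, List


def exposure_saliency(mentions: List[str],
                      anaphoric_refs: List[str],
                      valence_strength: float) -> Dict[str, float]:
    """Score each cue by how strongly it is topicalized in one episode.

    mentions:         conceptual cues as they appear, one item per occurrence
    anaphoric_refs:   cues that pronouns and other anaphora were resolved to
    valence_strength: magnitude of the episode's affect valence, in [0, 1]
    """
    counts = Counter(mentions)
    counts.update(anaphoric_refs)  # anaphoric references count like mentions
    if not counts:
        return {}
    top = max(counts.values())
    return {cue: valence_strength * count / top for cue, count in counts.items()}
```

Under this rule the most-referenced cue, which is the likely topic of the episode, inherits the episode's full valence strength, and incidental cues receive proportionally less.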
In the affective long-term memory learner, salient subevents, contexts, and causes are gathered together into an EpisodeFrame.
The salience of the episode is assessed by analyzing all later episodes for reference. Note that the salience of an episode may be
updated in the memory, as new episodes may refer to past episodes, thereby increasing their saliency. In personal theory
change, episodes may be more than recalled; they may actually be
relived, with different affective assessments. In such a case, the
original experience should be forgotten. In our current model of
PERSONA, we do not account for this. The EpisodeFrame is
associated with an episode affect valence, and stored in affective
long-term episodic memory.
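The bookkeeping described above can be illustrated with a minimal sketch. It is not PERSONA's implementation: the field names, the `update_salience` helper, the shared-subevent test for "recall", and the `boost` increment are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EpisodeFrame:
    """Hypothetical container for one episode in long-term episodic memory."""
    subevents: set[str]
    contexts: set[str]
    root_cause: str
    valence: tuple[float, float, float]  # (pleasure, arousal, dominance)
    salience: float = 1.0


def update_salience(memory: list[EpisodeFrame], new_frame: EpisodeFrame,
                    boost: float = 0.5) -> None:
    """Raise the salience of stored episodes that the new episode recalls.

    Assumption: a stored episode counts as recalled when the new episode
    shares at least one subevent with it; `boost` is an illustrative value.
    """
    for frame in memory:
        if frame.subevents & new_frame.subevents:
            frame.salience += boost
    memory.append(new_frame)
```

Because salience is stored on the frame, an old episode can keep gaining weight as later entries refer back to it, which matches the note that salience may be updated after the fact.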
To summarize, in this section we proposed the episode-reflex
computational model of human affective memory, with one memory based
around episodes and one based on reflex. We presented saliency-mediated associative learning as a focused heuristic/statistical
learning mechanism to acquire the proposed model from personal
texts such as weblogs. In the next section, we focus on how these
models can be used for affective mindreading – predicting a person’s attitude toward a familiar or new subject by reasoning with
their affective memory model.
3. AFFECTIVE MINDREADING
In this section, we discuss how we have applied the PERSONA
affective memory model to predict the affective context of a person in reaction to a person, thing, topic, situation, or event. We
have dubbed this task affective mindreading because it is similar
to the harder parent problem of mindreading – trying to predict a
person’s actions given knowledge of their beliefs and desires.
Dan Dennett’s intentional stance describes one mindreading strategy commonly used by people to understand other people (1987).
Given knowledge of a person’s beliefs and desires, you can expect
that person to act rationally in such a way as to further their
goals.
In our problem domain, the PERSONA mindreader is given
knowledge of a person’s affective memories and present attitudes
toward people, things, topics, situations, and events. The
PERSONA mindreader is given some new text embodying people,
things, topics, situations, and events. Rather than predicting the
person’s actions, in affective mindreading, the system’s task is to
predict the person’s affective response to this text. To do this, we
will leverage both the episodic and reflexive memories to perform
interpolative prediction and extrapolative prediction. In interpolative prediction, the current episode activates known elements of
the affective memory. In extrapolative prediction, the current
episode does not contain known elements of the affective
memory, but rather, to make a prediction, we must reason by
analogy to connect it to known elements. So in essence our mindreader’s strategy is to believe that a person’s affective response
to a new situation will be consistent with attitudes to past occurrences of that situation or an analogous situation. Later in this
section, we augment this strategy by predicting that a person’s
affective response to a new situation will also be influenced by
that person’s internal imprimers (Minsky, forthcoming).
The next two subsections describe how the episodic memory and
reflexive memory are applied in affective mindreading.
3.1 Exploiting Episodes in Mindreading
Recall that episodes are kept separate from reflexive memories
because they are affectively powerful one-time occurrences rather
than frequency conditioned. Because they are such powerful singular examples which must be consciously recalled, their triggering typically involves several activated conceptual and contextual
cues. Episodic recollection can be triggered by a subevent or root
cause, with contextual cues as supporting triggers.
When analyzing a new episode, we apply the same heuristic
mechanisms to gist salient subevents, contexts, and root cause. If
multiple elements of the subevents, contexts, and root cause align
with an episode in memory, then that episode is triggered, causing
the triggered episode’s affect to be projected onto the affect of the
current episode. The saliency of the episode is a multiplier coefficient to this affect valence. It is sometimes the case that the root
cause of the current episode aligns with the root cause of the triggered episode, even though there were no matching subevents.
This can be thought of as a case of analogy because two different
sequences of events taught the person the same lesson.
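A minimal sketch of this triggering rule follows. The function name and the flat cue sets are assumptions. The default threshold of three matching cues follows the criterion reported in the evaluation section. The root-cause-only analogy case is not modeled here.

```python
def trigger_episode(new_cues, stored_cues, stored_valence, salience,
                    min_matches=3):
    """Project a stored episode's affect onto a new episode if it triggers.

    new_cues / stored_cues: iterables of subevent, context, and root-cause
    cues (hypothetical flat representation).
    stored_valence: (pleasure, arousal, dominance) of the stored episode.
    Returns the salience-weighted valence, or None if too few cues align.
    """
    matches = len(set(new_cues) & set(stored_cues))
    if matches < min_matches:
        return None
    # Saliency acts as a multiplier coefficient on the projected affect.
    return tuple(salience * v for v in stored_valence)
```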
3.2 Reflex Memories in Mindreading
Unlike episodes, reflex memories do not require multiple conceptual cues to be triggered. Each conceptual cue, or conceptual cue
Bayesian-conditioned on a topic, will be directly triggered by the
same conceptual cue found in text. In applying reflex memories,
we separate the cases of interpolation versus extrapolation. Interpolation occurs when a conceptual cue in the current episode is
found in reflexive memory. In this case, the affect valence of the
memory is projected onto the current episode. However, if the
conceptual cue is not found in memory, then we can try to extrapolatively predict its affect by trying to map it to known concepts
via conceptual analogy, and then projecting the affect valence of
the analogous concept in memory onto the current episode.
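The two cases can be sketched as a simple lookup with a fallback. This is only an illustration. `reflex_memory` is assumed to map a conceptual cue to a stored valence, and `analogs` is a hypothetical precomputed mapping that stands in for the conceptual analogy step.

```python
def reflex_valence(cue, reflex_memory, analogs):
    """Return (valence, mode) for a cue, or (None, None) if nothing applies.

    reflex_memory: dict mapping known cues to a valence value.
    analogs: dict mapping a cue to candidate analogous concepts, best first
             (a stand-in for conceptual analogy; an assumption of this sketch).
    """
    if cue in reflex_memory:
        # Interpolation: the cue itself is already in reflexive memory.
        return reflex_memory[cue], "interpolation"
    for candidate in analogs.get(cue, []):
        if candidate in reflex_memory:
            # Extrapolation: borrow the valence of an analogous known concept.
            return reflex_memory[candidate], "extrapolation"
    return None, None
```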
Conceptual analogy analyzes two concepts for structural similarities, and if they are similar enough, they are deemed analogous.
To perform conceptual analogy, we use OMCSNet and a structural mapping algorithm (Gentner, 1983). For example, the following
analogous concept can be computed:
bicycle is like car (90.0%) because both:
==[isA]==> vehicle
==[isA]==> machine
==[isA3]==> means of transportation
==[isA3]==> faster than walking
==[isA3]==> used for transport
==[isA3]==> mode of transportation
==[hasLocation3]==> street
==[hasLocation3]==> garage
==[hasCollocate2]==> wheel
==[hasCollocate3]==> wheel
==[hasUse2]==> transportation
==[hasAbility2]==> travel on road
==[hasLocation5]==> garage
==[hasPart]==> wheel
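To make the idea of shared structure concrete, the sketch below scores two concepts by the overlap of their (relation, target) pairs. This is a deliberate simplification: it is neither Gentner's structure-mapping algorithm nor OMCSNet's scoring, and the Jaccard measure and the toy feature sets are assumptions chosen for illustration.

```python
def shared_structure(a, b):
    """Return the (relation, target) pairs that both concepts have."""
    return a & b


def analogy_score(a, b):
    """Jaccard overlap of two concepts' (relation, target) pairs, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy feature sets (hypothetical, not taken from OMCSNet).
bicycle = {("isA", "vehicle"), ("hasPart", "wheel"), ("hasPart", "pedal")}
car = {("isA", "vehicle"), ("hasPart", "wheel"), ("hasPart", "engine")}
```

With these toy sets, `shared_structure(bicycle, car)` returns the two pairs the concepts have in common, playing the role of the "because both" list in the example above.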
The intuition behind extrapolation using conceptual analogy is
that affective attitudes often transfer over to analogous concepts.
In indicative trials, however, we discovered that there were certain
classes of concepts in which this was not the case. For example,
“dog” and “cat” are, all things considered, fairly close analogs;
however, people who love dogs are not likely to like cats. In the
short term, we stop-listed conceptual analogies among certain
“hot topics” like pets. However, we thought that this was an interesting finding and explored it a bit further.
We suggest several explanations for this. As kids, we were often
asked for our favorite pet, and perhaps there is a common perception that not having a definitive preference is an indication of
weak personality. Such a self-reflective critic may reinforce an
XOR preference relationship among possible pets. Another explanation, explored further in the next subsection, is that liking a
dog has the implication that the person is in fact a dog-person. If
the person identifies with the group, dog-people, then he/she can
be said to inherit some of the common attitudes of that group.
One of these common attitudes may be a distaste for cats, thus
blocking the affective transfer of dog over to cat. In the next subsection, we see that dog-person is a public imprimer (Minsky,
forthcoming), and imprimers can be viewed computationally as
the inheritance of attitudes of the imprimer.
3.3 Inherited Attitudes from Internal Imprimers
So far the mindreading strategy we have discussed employs only a
person’s own memory-recorded attitudes in making predictions of
that person’s affective response to a new episode. Now, we
would like to augment this strategy and discuss how the
knowledge of a person’s imprimers can aid in prediction.
Marvin Minsky describes an imprimer as someone to whom one
becomes attached. He introduces the concept in the context of
attachment-learning of goals, and suggests that imprimers help to
shape a child’s values. Imprimers can be a parent, mentor, cartoon character, a cult, or a person-type. The two most important
criteria for an imprimer are that 1) the imprimer embodies some
image, filled with goals, ideas, or intentions, and that 2) one feels
attachment to the imprimer. Minsky theorizes that the images of
imprimers can be internalized and their effects still realized.
We extend this idea into the affect realm and make the further claim
that internal imprimers can do more than critique our goals; our
attachment to them leads us to the willful emulation of a portion
of their values and attitudes. Kept as a collection, these internal imprimers help to support our identity. From the supposition that we conform to many of the attitudes of our internal
imprimers, we hypothesize that affective memory models of these
imprimers, if known, can complement the person’s own affective
memory model in affective mindreading. (Of course, a person’s
personality will affect the degree to which their attitudes are influenced by others.) This hypothesis is supported by much of the
work in psychoanalysis. Sigmund Freud (1991) wrote of a process he called introjection, in which children unconsciously emulate aspects of their parents, such as the assumption of their parents’ personalities and values. Other psychologists have referred
to introjection by terms like identification, internalization, and
incorporation.
We propose the following model of internal imprimers to support
the affective mindreading mechanism. First, it is necessary to
identify people, groups, and images that may possibly be a person’s imprimer. We can do so by analyzing the affective
memory. From a list of all conceptual cues from both the episodic
and reflexive memories, we use semantic recognizers to identify
all people, groups (e.g. “my company”) and images (e.g. “dog” =>
“dog-person”) that, on average, elicit high Arousal and high Submissiveness, show high frequency of exposure in the reflexive
memory, and collocate in past episodes with self-conscious emotion keywords like “proud”, “embarrassed”, “ashamed”.
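The three selection criteria can be sketched as a filter over per-cue statistics. The `CueStats` record, the threshold values, and the treatment of high Submissiveness as low PAD Dominance are assumptions of this sketch. The semantic recognition of people, groups, and images is taken as already done.

```python
from dataclasses import dataclass

# Self-conscious emotion keywords named in the text.
SELF_CONSCIOUS = {"proud", "embarrassed", "ashamed"}


@dataclass
class CueStats:
    """Hypothetical summary of one person/group/image cue in affective memory."""
    name: str
    mean_arousal: float      # average PAD arousal elicited, in [-1, 1]
    mean_dominance: float    # average PAD dominance; low means submissive
    exposures: int           # frequency of exposure in reflexive memory
    collocated_words: set    # words co-occurring with the cue in past episodes


def imprimer_candidates(cues, min_arousal=0.3, max_dominance=-0.3,
                        min_exposures=5):
    """Return names of cues meeting all three criteria (thresholds illustrative)."""
    return [
        c.name for c in cues
        if c.mean_arousal >= min_arousal
        and c.mean_dominance <= max_dominance
        and c.exposures >= min_exposures
        and c.collocated_words & SELF_CONSCIOUS
    ]
```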
[Figure: Fig 2. Affective models of internal imprimers, organized into personas, complement one’s own affective model. Labels shown: business persona, domestic persona, social persona; OLDER BROTHER, DOGPERSON, WARREN BUFFETT, MOM, MARTHA STEWART; SELF.]
Once imprimers are identified, we also wish to identify the context under which an imprimer’s attitudes show influence. Shown
in Fig. 2, we propose organizing the internal imprimer space into
personas representing different contextual realms. There is good
reason to believe that humans organize imprimers by persona
because we are different people for different reasons. One might
like Warren Buffett’s ideas about business but probably not about
cooking. Personas can also prevent internal conflicts by allowing
a person to maintain separate systems of attitudes in different
contexts. To identify an imprimer’s context, we must first agree
on an ontology of personas, which can be person-general (as the
personas in Fig. 2 are) or person-specific. Given this ontology,
we use features of each context, such as keywords taken from the
GetContext() function of OMCSNet, to classify episodes.
Once imprimers are associated with personas, we gather as much
“personal” text from each imprimer as desired and acquire only
the reflexive memory model, thus relaxing the constraint that texts
have episodic organization. In the augmented mindreading strategy (depicted in Fig. 3), when conceptual cues are unfamiliar to the
self, we identify internal imprimers whose persona matches the
genre of the new episode, and give them an opportunity to react to
the cue. These affective reactions are multiplied by a coefficient
representing the ability of this self to be influenced, and the valence score is added on to the episode. Rather than maintaining
all attitudes in the self, internal imprimers enable judgments about
certain things to be mentally outsourced to the persona-appropriate imprimers.
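The referral step can be sketched as follows, using a single scalar valence per cue for brevity. The function name, the list-of-pairs representation of imprimers, and the default `influence` coefficient are assumptions, not values taken from PERSONA.

```python
def imprimer_augmented_valence(cue, genre, self_memory, imprimers,
                               influence=0.5):
    """Valence contributed by one cue under the imprimer-augmented strategy.

    self_memory: dict mapping cues known to the self to a valence value.
    imprimers: list of (persona, reflexive_memory) pairs, one per imprimer.
    influence: coefficient for how strongly this self is influenced by
               others (illustrative default).
    """
    if cue in self_memory:
        return self_memory[cue]
    # Cue is unfamiliar to the self: refer it to persona-matched imprimers.
    total = 0.0
    for persona, memory in imprimers:
        if persona == genre and cue in memory:
            total += influence * memory[cue]
    return total
```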
[Figure: Fig 3. The imprimer-augmented affective mindreading strategy. Edges represent memory triggers. Labels shown: New Episode; root cause; episode frame; conceptual cue; Affective LTEM; Affective Reflexive Memory; Conceptual Analogy; SELF; “cues unfamiliar to self are referred to imprimers”.]
4. EVALUATION
PERSONA was evaluated in affective mindreading experiments
performed with four subjects. Subjects were between the ages of
18 and 28, and have kept diary-style weblogs for at least 2 years,
with an average entry interval of three-to-four days. Subjects
submitted their weblog urls, for the generation of affective
memory models. An imprimer identification routine was run, and
the examiner hand-picked the top one imprimer for each of the
three personas implemented. A personal text corpus was built,
and imprimer reflexive memory models were generated. The
subjects were asked to engage in an interview-style experiment
with the examiner.
In the interview, subjects and their corresponding PERSONA
models were asked to evaluate 12 short paragraph texts representative of three genres: social, business, and domestic (corresponding to the ontology of personas in the tested implementation). The same set of texts was presented to each participant, and
the examiner chose texts that were generally evocative. Subjects
were asked to summarize their reaction by rating three factors on
Likert-5 scales:
• Feel negative about it (1) … Feel positive about it (5)
• Feel indifferent about it (1) … Feel intensely about it (5)
• Don’t feel control over it (1) … Feel control over it (5)
These factors are mapped onto the PAD valence format, assuming
the following correspondence: 1 → -1.0, 2 → -0.5, 3 → 0.0, 4 →
+0.5, and 5 → +1.0. Subjects’ responses were not normalized. To
assess the performance of PERSONA, we record the spread between the human assessed valence and the computer assessed
valence,
$$V_{spread} = \left| V_{human} - V_{PERSONA} \right| \qquad (2)$$
We computed the mean spread and standard deviation across all
episodes along each PAD dimension. On the –1.0 to +1.0 valence
scale, the maximum spread is 2.0. Table 1 summarizes the results.
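The scoring procedure can be sketched in a few lines. Two assumptions are made here: the spread is taken as an absolute difference, and the standard deviation is the population form (`statistics.pstdev`), since the text does not say which form was used.

```python
from statistics import mean, pstdev

# Likert-5 rating to PAD valence, per the correspondence given in the text.
LIKERT_TO_VALENCE = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}


def spread_stats(human_ratings, persona_valences):
    """Return (mean spread, std. dev.) along one PAD dimension.

    human_ratings: Likert-5 ratings (1..5), one per episode.
    persona_valences: predicted valences in [-1.0, 1.0], one per episode.
    """
    spreads = [
        abs(LIKERT_TO_VALENCE[r] - v)
        for r, v in zip(human_ratings, persona_valences)
    ]
    return mean(spreads), pstdev(spreads)
```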
Table 1. Performance of PERSONA affective mindreader, measured as the spread between human and computer judged values.

             Pleasure            Arousal             Dominance
             mean     std.       mean     std.       mean     std.
             spread   dev.       spread   dev.       spread   dev.
SUBJECT 1    0.39     0.38       0.27     0.24       0.44     0.35
SUBJECT 2    0.42     0.47       0.21     0.23       0.48     0.31
SUBJECT 3    0.22     0.21       0.16     0.14       0.38     0.38
SUBJECT 4    0.38     0.33       0.22     0.20       0.41     0.32
BASELINE 1   0.50
BASELINE 2   0.67
Assuming that human reactions obeyed a uniform distribution
over the Likert-5 scale, we give two baselines, which were simulated over 100,000 trials. In BASELINE 1, VPERSONA is fixed at
0.0. In BASELINE 2, VPERSONA is given a random value over
the interval [-1.0,1.0] with a uniform distribution. It should be
pointed out however, that in the context of an interactive sociable
computer, BASELINE 1 is not a fair comparison, because it
would never produce any behavior.
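The two baselines can be simulated with a short Monte Carlo sketch. It assumes the human valence is drawn uniformly from the continuous interval [-1.0, 1.0]; under that assumption the expected spreads are 1/2 and 2/3, in line with the values in Table 1. If human reactions were instead restricted to the five discrete Likert points, the expected spreads would be 0.60 and 0.75. The function name and seed are illustrative.

```python
import random


def simulate_baselines(trials=100_000, seed=0):
    """Estimate the mean spread for BASELINE 1 and BASELINE 2.

    Assumption: the human valence is uniform on the continuous interval
    [-1.0, 1.0]. BASELINE 1 always predicts 0.0; BASELINE 2 predicts a
    uniform random value on [-1.0, 1.0].
    """
    rng = random.Random(seed)
    total_fixed = 0.0
    total_random = 0.0
    for _ in range(trials):
        human = rng.uniform(-1.0, 1.0)
        total_fixed += abs(human - 0.0)
        total_random += abs(human - rng.uniform(-1.0, 1.0))
    return total_fixed / trials, total_random / trials
```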
On average, PERSONA performed noticeably better than both
baselines, excelling particularly in predicting arousal, and having
the most difficulty predicting dominance. The standard deviations
were very high, reflecting the observation that PERSONA’s predictions were often either very close to the actual valence, or very
far. This can be attributed to one of several causes. First, multiple
episodes described in the same journal entries may have led to
improper associative learning. Second, the reflexive memory
model does not account for conflicting word senses. Third, personal texts inputted for the imprimers often generated models
skewed to positive or negative because text did not always have
an episodic organization. While results along the pleasure and
dominance dimensions are weaker, the arousal dimension recorded a mean spread of 0.22, suggesting the possibility that it alone
may have immediate applicability.
In the experiment, we also analyzed how often the episodic
memory, reflexive memory, and imprimers were triggered. Episodes were, on average, four sentences long. For each episode, reflexive memory was triggered an average of 21.5 times, episodic
memory 0.8 times, and imprimer reflexive memory 4.2 times. To
measure the effect of imprimers and episodic memories, we re-ran
the experiment turning off imprimers only, episodic memory only,
and both. Table 2 summarizes the results.
Table 2. Performance of PERSONA that can be attributed to imprimers and episodic memory

                    Pleasure       Arousal        Dominance
                    mean spread    mean spread    mean spread
Imp ON, Epi ON      0.35           0.22           0.43          (Table 1 results, averaged across subjects)
Imp ON, Epi OFF     0.34           0.21           0.43
Imp OFF, Epi ON     0.40           0.28           0.44
Imp OFF, Epi OFF    0.41           0.29           0.45
These results suggest that the positive effect of episodic memory
was negligible on the results. This certainly has to do with its low
rate of triggering, and the fact that episodic memories were
weighted only slightly more than reflexive memories. The low
trigger rate of episodic memory can also be attributed to the strict
criterion that three conceptual cues in an episode frame must trigger
in order for the whole episode to trigger. These results also suggest that imprimers played a measurable role in improving performance, which is a very promising result.
Overall, the evaluation demonstrates that the proposed approach
is promising, but needs further refinement. The randomized
BASELINE 2 is a good comparison when considering possible
entertainment applications, whose interaction is more fail-soft.
PERSONA does quite well against the active BASELINE 2, and
is within the performance range of these applications. However,
the results also suggest that PERSONA may not be ready for deployment to a sociable computer just yet, because fallout (bad
predictions) can be very costly in the realm of affective communication (Nass et al., 1994). Affective communication obeys certain
social contracts, making its applications fail-hard.
5. CONCLUSION
Mindreading is a problem of interest to the cognitive science
community, as well as the artificial intelligence community.
While cognitive science is generally interested in attaining a
deeper understanding of the problem, we argue that their primarily
behavioral and neurological means of study will be insufficient to
uncover the larger mechanistic picture of theory of mind and mindreading in humans. On the other hand, the artificial intelligence
community is interested in applying mental modeling and mindreading to build sociable computers. However, their models of
profiling and collaborative filtering are too weak. In this work,
we have tried to address the need of both communities and walk a
middle ground. In our study of affective mindreading, we proposed and built a system that attempts to model a person much
more deeply than present approaches in AI.
The proposed episode-reflex model of human affective memory
has interesting implications for psychology and cognitive sciences, providing a real way to be able to test classical theories such as
associative learning, memory organizations, and formation of
identity of the self, among others. The fact that attitudes can be
considered independently of beliefs and goals suggests that in
computationalizing mindreading, it may be possible to decompose
mindreading into several processes, of which affective mindreading is one. And indeed, it is possible that humans may have a
toolkit of processes which combine to be called mindreading. It is
not out of the realm of possibilities that a special process exists
for modeling and assessing just the attitudes of another person.
Such a process may serve the role, we speculate, of contextually
disambiguating processes for belief determination.
6. ACKNOWLEDGMENTS
I would like to thank Barbara Barry, Push Singh, and Andrea
Lockerd for their ideas, suggestions, and inspiration in the course
of this work.
7. REFERENCES
[1] Bloom, P. (2002). Mindreading, communication, and the learning of the names for things. Mind and Language, 17, 37-54.
[2] Call, J. & Tomasello, M. (1996). "The effect of humans on the cognitive development of apes". In Reaching into Thought (eds. A.E. Russon, K.A. Bard and S.T. Parker). Cambridge University Press, pp 371--403.
[3] Dennett, D. (1987). The Intentional Stance. Cambridge, MA: Bradford Books/MIT Press.
[4] Ekman, P. (1993). Facial expression of emotion. American Psychologist, 48, 384-392.
[5] Freud, S. (1991). The essentials of psycho-analysis: the definitive collection of Sigmund Freud's writing selected, with an introduction and commentaries, by Anna Freud. London: Penguin.
[6] Gallese, V. et al. (1996). Premotor cortex and the recognition of motor actions. Cognit. Brain Res. 3, 131-141.
[7] Gallese, V. and Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12).
[8] Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, pp 155-170.
[9] Liu, H., Lieberman, H., and Selker, T. (2003). A model of textual affect sensing using real-world knowledge. In Proceedings of the Seventh International Conference on Intelligent User Interfaces, pages 125--132.
[10] Liu, H. and Singh, P. (2003). OMCSNet: A commonsense inference toolkit. In Submission. Available at: http://web.media.mit.edu/~hugo/publications
[11] Locke, J. (1689). Essay Concerning Human Understanding. Hypertext by ITL at Columbia University, 1995. Print version ed. P.H. Nidditch. Oxford, 1975.
[12] Mehrabian, A. (1995). for a comprehensive system of measures of emotional states: The PAD Model. (Available from Albert Mehrabian, 1130 Alta Mesa Road, Monterey, CA, USA 93940).
[13] Meltzoff, A. and Gopnik, A. (1993). "The role of imitation in understanding persons and developing a theory of mind". In Understanding other minds: perspectives from autism (eds. S. Baron-Cohen, H. Tager-Flusberg, D. Cohen). Oxford University Press, Chapter 16, pp 335--366.
[14] Minsky, M. (forthcoming). The Emotion Machine. Pantheon, New York. Several chapters are available at: http://web.media.mit.edu/~minsky
[15] Nass, C.I., Steuer, J.S., and Tauber, E. (1994). Computers are social actors. In Proceedings of CHI ’94, (Boston, MA), pp. 72-78, April 1994.
[16] Povinelli, D.J. and Preuss, T.M. (1995). Theory of mind: evolutionary history of a cognitive specialization. Trends in Neurosciences, 18:418-424.
[17] Resnick, P. and Varian, H.R. (1997). “Recommender Systems”, guest editors, “special section: recommendation systems” in CACM Vol. 40, No. 3, pp 56-58.
[18] Singh, P. et al. (2002). Open Mind Common Sense: Knowledge acquisition from the general public. In Proceedings of the First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems. Lecture Notes in Computer Science. Heidelberg: Springer-Verlag.
[19] Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. University of Massachusetts, Department of Computer and Information Science. Technical Report 84-2. Amherst, MA.
[20] Tulving, E. (1983). Elements of episodic memory. Oxford: New York.