A Computational Model of Human Affective Memory and Its Application to Mindreading

Hugo Liu
MIT Media Laboratory
20 Ames Street #320D
Cambridge, MA 02139, USA
+1 (617) 253-5334
hugo@media.mit.edu

ABSTRACT
The cognitive science and artificial intelligence communities are both interested in the problem of how humans infer the mental states of others, known as mindreading. Whereas cognitive science is interested in a deeper understanding of how humans mindread, artificial intelligence is interested in imparting mindreading capabilities to social computers. Current AI approaches to mindreading are weak, however. Techniques such as user profiling and collaborative filtering try to predict user preferences and actions, but model the person only shallowly. In this paper, we propose a deeper model of a person in terms of their system of attitudes, and implement it in a system called PERSONA. Grounded in the episodic and reflexive memories of a person, PERSONA uses saliency-mediated associative learning to automatically acquire a human affective memory model from a corpus of personal text, such as a weblog. Applying this model, PERSONA performs affective mindreading to predict a person’s likely affective response given a new situation or event. In addition to memory-based prediction alone, the system also analyzes the attitudes of a person’s Minskian imprimers and performs conceptual analogy to make predictions more robust. An evaluation of PERSONA indicates that it is a promising approach, comfortably outperforming baselines; however, because affective communication is fairly fail-hard, more refinement would be needed before this system can be applied to socialize a computer.

1. WHY IS MINDREADING INTERESTING?
Recently there has been much ado in the cognitive science community about the human faculty for Theory of Mind (ToM), otherwise known as mindreading. And no, it does not refer to psychic powers as one might guess.
ToM and mindreading refer to an animal’s capability for reflecting on its own mental states (attitudes, beliefs, and desires) and modeling the mental states of others. It is believed that humans evolved specialized mindreading abilities absent in other primates (Povinelli and Preuss, 1995), and that the human mindreading faculty makes human social learning uniquely powerful, enabling, inter alia, the rapid learning of words (Bloom, 2002) and the learning of goals and values (Minsky, forthcoming). Cognitive scientists have gone about the study of mindreading in many ways, including: by evolutionary comparison, e.g. (Call and Tomasello, 1996); by examining linked phenomena like imitation (Meltzoff and Gopnik, 1993); by studying deficits of ToM in autistic children; by speculating on potential neural substrates for ToM such as mirror neurons (Gallese and Goldman, 1998); and by debating how it works, i.e. Simulation Theory of ToM versus Theory Theory of ToM.

Across the divide, artificial intelligence researchers are also thinking about mindreading. However, being on the whole more pragmatic, this community is more interested in imparting mindreading capabilities to computers and robots to create more sociable human-computer interaction (Nass et al., 1994). While some results from the cognitive science literature are of interest to the AI community, such as the recent discovery of special action recognition neurons called mirror neurons in macaque monkeys (Gallese et al., 1996), we think it is fair to say that behavioral and bottom-up approaches to mindreading are still far from producing a compelling and predictive cognitive model that could empower social computers. Despite lacking a complete cognitive model of mindreading, the AI community has been working on weaker forms of mindreading for many years.
User modeling, for example, attempts to model a human user’s preferences and mental context in hopes of creating more natural and personal interactions between human and computer. One common approach in user modeling is user profiling, whereby users are modeled by their demographic information, usually obtained via explicit questionnaires. By applying a small set of rules, these user demographics can be mapped to predicted user preferences. Another common approach in user modeling is collaborative filtering, in which patterns of user actions are modeled against those of a whole user community. While these forms of user modeling have enjoyed some success, particularly in product recommendation (Resnick and Varian, 1997), these approaches are too weak to be useful for socializing computers. User profiling oversimplifies people as obeying demographic lines, while collaborative filtering is a purely statistical approach offering little insight into a user’s beliefs or preferences; thus, user profiling and collaborative filtering are weak mindreaders.

In this work, we explore how mindreading can be deepened in a novel way: by considering knowledge of a person’s life experiences over a long period of time, and applying this knowledge to predict how a person might respond in new situations. In order to create a more complete and more intimate model of a person, we would necessarily need a corpus of knowledge about that person’s beliefs, desires, goals, and experiences. While it may be possible to acquire this directly through interactions with the user, building a sufficiently rich model of a person might require cumbersome interactions; thus the approach taken by this work is to try to infer such a model automatically from personal texts such as a journal, or a transcript of a person’s beliefs and ideas, as might be manifested in an interview.
Through our initial experience, we realized that specific beliefs and goals would be too hard to accurately infer from unconstrained natural language text. We did not want to sacrifice breadth of knowledge for specificity, so we decided instead to try to infer just the emotions, attitudes, and dispositions associated with these beliefs and goals. Using this body of knowledge, we construct a mechanism to predict the affective context of a person in reaction to a topic, situation, or event. We dub this task affective mindreading, with affect referring to emotions, dispositions, and attitudes. If successful, we believe that this type of mechanism can have great implications for sociable computers.

Our approach can be summarized as follows. From text, we wish to infer a person’s emotions, attitudes, and dispositions toward particular people, topics, events, and situations, at different times in their lives, and to record these into a model of a person’s affective memory. By interpolating and extrapolating from this affective memory, a computer can perform affective mindreading: given a new topic, event, or situation, the system will try to predict a person’s affective response. To implement this approach, we built PERSONA, a system that creates a model of a person’s affective memory from personal texts, and exploits this model for affective mindreading.

The rest of this paper is organized as follows. First, we present a computational model of human affective memory and a model of saliency-mediated associative learning from personal texts, and discuss the implementation of the PERSONA model learner. Second, we explore how the affective memory model is used in conjunction with conceptual analogy to perform affective mindreading. Third, we describe an experiment to evaluate PERSONA in an affective mindreading task. Fourth, we reconnect with the literature and address how affective mindreading aids social learning tasks in humans and computers.

2. A MODEL OF AFFECTIVE MEMORY
In the previous section, we motivated the development of a computational model of human affective memory by suggesting that such a model would allow for more advanced mindreading by computers than can be achieved through typical knowledge-impoverished user modeling techniques such as profiling. We begin this section with the caveat that the computational model described here is not claimed or intended to be cognitively motivated. We attempt to model human affective memory only insofar as it is feasible to infer from personal texts, and only insofar as it is useful to the task of affective mindreading, that is, predicting a person’s attitudes and dispositions toward a particular subject.

In this section, we first propose the two-part episode-reflex model of human affective memory and connect it to the literature. Second, we introduce saliency-mediated associative learning as a strategy for automatic model acquisition from personal texts. Third, we discuss how such a model has been implemented in PERSONA.

2.1 The Episode-Reflex Model Of Human Affective Memory
Of the different types of human memory that have been studied, two are of great interest to us as tools for modeling affective memory: long-term episodic memory, and reflexive memory. In PERSONA, we combine the strengths of the two memories to form the episode-reflex model.

2.1.1 Affective long-term episodic memory
Long-term episodic memory (LTEM) is a relatively stable memory based on experiences and events in context. An episode can be thought of as a coherent packet of events with a time sequence. Episodes are generally content-addressable, meaning that they can be retrieved through a variety of cues based on the sensory, affective, or semantic content of the episode, such as a sight, sound, emotion, or location. LTEM can be very powerful because even events which happen only once can become salient memories and serve to recurrently influence a person’s future thinking.
If we hope to accurately predict a person’s affective response to a future situation, we must account for the influence of these one-time salient episodes.

Even though our aim is to model only the affective aspect of human memory, we cannot, in the case of LTEM, completely disregard the non-affective aspects because they may serve as cues for retrieval. Consequently, our affective LTEM model represents episodes with some semantic structure and several types of context. In PERSONA, an affective LTEM episode has the following components:
• A collection of the subevents of an episode that are salient to the evocation of the overall affect of the episode, sequentially ordered. If possible, the perceived root cause of the affective response in that episode is extracted.
• Possibly salient contexts: the date, the location, the topic.
• An affect valence score associated with the episode.
• A salience score for the episode, measuring the perceived importance of the memory.

Extracting only the salient subevents and the perceived root cause of the episode makes learning more precise, as discussed further in the next subsection. In addition to describing the thematic structure of the episode, we also encode several other types of contextual cues with the episode. As suggested by Tulving’s encoding specificity hypothesis (1983), retrieval of an episode is more likely when current conditions match the encoding conditions; thus it is important to remember the salient contexts surrounding an episode as completely as possible. Finally, because our main focus is on being able to recall the attitudes experienced during an episode, we associate an affect valence score with each episode, to be described in a later subsection.

2.1.2 Affective reflexive memory
While long-term episodic memory deals in salient, one-time events and must generally be consciously recalled, reflexive memory is full of automatic, instant, almost instinctive associations.
Whereas LTEM is content-addressable and requires pattern-matching the current situation with that of the episode, reflexive memory is like a simple hash table that directly associates a cue with a reaction, thereby abstracting away the content. Tulving equates LTEM with “remembering” and reflexive memory with “knowing” (Tulving, 1983). In humans, reflexive memories are generally formed through repeated exposures rather than one-time events, though subsequent exposures may simply be recalls of a particularly strong primary exposure (Locke, 1689). In addition to frequency of exposures, the strength of an experience is also considered. Complementing the event-specific affective LTEM with an event-independent affective reflexive memory makes sense because there may not always be an appropriate distinct episode which shapes our appraisal of a situation; often, we react reflexively, our present attitudes deriving from an amalgamation of our past experiences now collapsed into something instinctive.

Because humans undergo forgetting, belief revision, and theory change, update policies for human reflexive memory may actually be quite complex. In PERSONA, we adopt a simpler representation and update policy that is not cognitively motivated, but instead exploits the ability of a computer system to compute an affect valence at runtime. An entry in the memory is as follows:
• The key to the entry is one of two types:
  o 1) A simple conceptual cue whose semantic type belongs to the following ontology: a person, an action, an object, an activity, or a named event; or
  o 2) A simple conceptual cue Bayesian-conditioned on the presence of a discourse topic.
• The value of the entry is a list of exposures. An exposure X is the following triple:
  o date of exposure;
  o affect valence score of exposure, V;
  o saliency of exposure, S.

To read off the current valence associated with a conceptual cue, the formula given in Eq. (1) is applied.
$$\text{valence} = \frac{\log_b \max(n, b)}{n} \sum_{t=\text{startdate}}^{\text{enddate}} X(E_t)\, V(E_t) \qquad (1)$$

where n = the number of exposures of the concept, and the sum runs over the exposures E_t dated within the period.

This formula returns the valence of a conceptual cue averaged over a particular time period. The term $\log_b \max(n, b)$ rewards frequency of exposures, while the term $X(E_t)$ rewards the saliency of an exposure. In this simple model of an affective reflexive memory, we do not consider phenomena such as belief revision, reflexes conditioned over contexts, or forgetting.

In summary, we have motivated and characterized two components of our computational model of human affective memory: an episodic component emphasizing the affect of one-time salient memories, and a reflexive component emphasizing instinctive reactions to conceptual cues that are conditioned over time. In the following subsection, we propose how this two-part model of human affective memory can be acquired from personal texts via saliency-mediated associative learning.

2.2 Saliency-Mediated Associative Learning
With origins in Aristotle, classical associative learning was popularized as an explanation of many brain processes beginning in the 17th century by several British philosophers, including John Locke and James Mill. However, after the rise and fall of popularity of Pavlovian classical conditioning, many in the cognitive science community now dismiss associative learning as inadequate. In the study of word learning in children, Paul Bloom reported that contrary to Locke’s assertion that repetition is necessary to associate words with sights and sounds, children actually learn word meanings error-free, and without repetition, in a process dubbed fast mapping (Bloom, 2002).

While it may seem that associative learning is being debunked as a plausible theory of cognitive learning, we suggest that associative learning can, in many cases, be salvaged, provided that it is appropriately structured. In Bloom’s research on word learning in children, for example, error-free fast mapping is possible because the child uses the teacher’s mental and intentional context to disambiguate reference, and once reference is disambiguated with sufficient confidence, the association between word and meaning can be made with greater confidence. With a similar sentiment against weak associationism, Marvin Minsky argues that simply remembering everything is not equivalent to learning. The defining criterion for learning is knowing precisely what is learned (Minsky, forthcoming). Or, formulated another way, learning involves credit assignment (Sutton, 1984).

The lesson to be learned from this (pun intended) is that associative learning is not useful unless it is precise. In other words, our mechanism of learning should not rely solely on semantically weak statistical methods, but should instead incorporate some external knowledge and heuristics to gain additional precision. In particular, we see the identification of saliency and salient events as a mechanism to focus associations. We dub this saliency-mediated associative learning (SMAL). SMAL is similar to credit assignment, except that salience is a heuristically generated score rather than an assertion, thus making it amenable to statistical learning methods. The learning mechanism of each of the two parts of the proposed affective memory model incorporates saliency to focus learning.

In the affective long-term episodic memory model, affect is associated with particularly salient subevents rather than the whole of the episode. In addition, the perceived root cause of the affective response in the episode is extracted or inferred when possible. Finally, a saliency score is given to the whole of the episode, to rate its importance and impact to the person being modeled.
These three features together focus the associative learning mechanism, and help to answer the question, “what should be learned?” Of course, identifying saliency, being a flavor of the credit assignment problem, is not an easy task, especially over domain-unconstrained texts. In the next subsection, we explain the role that a large common sense knowledge base plays in this important subtask.

In the affective reflexive memory model, associations are not made at the word level, which would tend to conflate the affect of too many different senses of a word into the same entry. Rather, conceptual cues are those first-order or second-order phrases which follow the ontology: a person, an action, an object, an activity, or a named event. The choice of ontology reflects the types of salient concepts that we believe people typically form stable attitudes about. In addition, to embrace the possibility that concepts may have different affect valences under different contexts, an entry in the affective reflexive model may be keyed on a concept Bayesian-conditioned on a particular discourse context. The difficulty in identifying the contexts which dictate a conceptual cue’s interpretation is discussed further in the evaluation of PERSONA. Finally, each exposure is associated with a saliency score, and conceptual cues with more entries are assigned more salient valence scores. By putting constraints on the types of concepts that can learn affective associations, by considering contexts in learning affect associations, and by valuating the saliency of the strength and frequency of exposures, the reflexive memory model seeks to incorporate as much precision as possible in its associative learning.

Having proposed the episode-reflex model of human affective memory and saliency-mediated associative learning as a mechanism for model acquisition, the next section discusses how the model and learning mechanisms were implemented in PERSONA.
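
As a minimal sketch of how the reflexive memory entry of Section 2.1.2 and the read-off of Eq. (1) might be realized, the following Python fragment stores exposures as (date, valence, saliency) triples and computes a windowed valence. The names Exposure and read_valence, the default log base b = 2, the choice to count n within the requested time window, and the per-dimension application of the formula to a three-valued affect score are all illustrative assumptions, not details of the PERSONA implementation.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Three-valued affect score (assumed here to be a PAD-style triple).
Valence = Tuple[float, float, float]


@dataclass
class Exposure:
    """One exposure of a conceptual cue: the triple of Section 2.1.2."""
    when: date          # date of exposure
    valence: Valence    # affect valence score of the exposure, V
    saliency: float     # saliency of the exposure (the weight in Eq. (1))


def read_valence(exposures: List[Exposure], start: date, end: date,
                 b: float = 2.0) -> Optional[Valence]:
    """Windowed valence of a cue, following the shape of Eq. (1).

    Assumption: n counts only the exposures dated inside [start, end].
    Returns None when the cue has no exposure in the window.
    """
    window = [e for e in exposures if start <= e.when <= end]
    n = len(window)
    if n == 0:
        return None
    # Frequency reward: equals 1 until n exceeds b, then grows as log_b(n).
    boost = math.log(max(n, b), b)
    # Saliency-weighted average, computed separately for each dimension.
    return tuple(
        boost * sum(e.saliency * e.valence[i] for e in window) / n
        for i in range(3)
    )
```

Under these assumptions a cue seen only once keeps its saliency-weighted valence unchanged, while a cue seen more than b times has its averaged valence amplified logarithmically.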
2.3 Model Implementation in PERSONA
In proposing the model and learning mechanism, several subtasks were implied but not addressed explicitly, such as 1) having a source of personal texts meeting certain suitability criteria, 2) a model for measuring affect valence, 3) a mechanism for judging the affect of episodes and text in general, and 4) methods for determining saliency. These implementation issues are discussed in the ensuing subsections, followed by a start-to-finish architectural walkthrough of PERSONA’s model learner.

2.3.1 Suitability Criteria for Personal Texts
The suitability of texts for model generation is subject to the following criteria. First, texts should be first-person, subjective, autobiographical narratives. Personal emotions and attitudes are either not easily accessible, or not sufficiently detailed, in third-person texts or objective writing. Second, texts should explore a breadth of topics, because an insufficiently broad model gives a poor and disproportional sampling of a person and would be difficult to use to perform affective mindreading. Third, texts should cover everyday events, situations, and topics, because that is the discourse domain best recognized by the mechanism with which we will judge the affect of text. Fourth, texts should be organized into episodes, occurring over a substantial period of time relative to the length of a person’s life.

With these criteria in mind, an ideal source of personal text is a personal journal. While private journals would be preferred for their candor, publicly viewable journals in the form of recently popular weblogs are still a good source of personal texts, though the generated model may exhibit a bias toward the “public” personality of the person being modeled. A less good but workable source of personal texts is interview transcripts.
Interviews satisfy most of the criteria for personal text selection, with the exception that interviews are not reliably organized around episodes, and may represent a disproportionally narrow set of topics. Still, they are suitable substrates for model generation, provided that the limitations of the resulting model are realized.

2.3.2 Representing Affect using the PAD Model
The affect valence pervading the proposed models can take one of two potential representations. It can take the form of an ontology of basic canonical emotions, represented prominently by Paul Ekman’s six basic emotions (happy, sad, angry, scared, disgusted, and surprised) (1993). Or, it can take the form of a dimensional model, represented prominently by Albert Mehrabian’s Pleasure-Arousal-Dominance (PAD) model (1995). In this model, the three nearly independent dimensions are Pleasure-Displeasure, Arousal-Nonarousal, and Dominance-Submissiveness. Each dimension can assume values from –100% to +100%, and a PAD valence score is a 3-tuple of these values (e.g. [-.51, .59, .25] might represent anger).

We chose the dimensional PAD model over the discrete canonical emotion model because PAD represents a sub-symbolic, continuous account of affect, where different symbolic affects can be unified along one of the three dimensions. This model has robustness implications for the affective classification of text. For example, in the affective reflexive memory, a conceptual cue may be variously associated with anger, fear, and surprise, which can be unified along the Arousal dimension of the PAD model, thus enabling the affect association to be coherent and focused.

2.3.3 Affective Appraisal of Personal Text
Judging the affect of a personal text has three chief considerations. First, the mechanism for judging the affect should be robust and comprehensive enough to correctly appraise the affect of a breadth of concepts.
Second, to aid in the determination of saliency, the mechanism must be able to appraise the affect of very little text, such as a single sentence. Third, the mechanism should recognize specific emotions rather than convolving affect onto any single dimension.

Several common approaches fail to meet these criteria. The naïve keyword spotting approach looks for surface language features like keywords. However, this approach is not acceptably robust on its own because affect is often conveyed without mood keywords. Statistical affect classification using statistical learning models such as latent semantic analysis (LSA) generally requires large inputs for acceptable accuracy because it is a semantically weak method. Hand-crafted models and rules are not broad enough to analyze the desired breadth of phenomena.

To analyze personal text with the desired robustness, granularity, and specificity, we employ a model of textual affect sensing using real-world knowledge, proposed by Liu et al. (2003). In this model, defeasible knowledge of everyday people, things, places, events, and situations is leveraged to sense the affect of a text by evaluating the affective implications of each event or situation. For example, to evaluate the affect of “I got fired today,” this model evaluates the consequences of this situation and characterizes it using negative emotions such as fear, sadness, and anger. This model, coupled with a naïve keyword spotting approach, provides rather comprehensive and robust affective classification. Since the model uses knowledge rather than word statistics, it is semantically strong enough to evaluate text on the sentence level, classifying each sentence into a six-tuple of valences (each ranging from 0.0 to 1.0), one for each of the six basic Ekman emotions. These emotions are then mapped to the PAD model.

One point of potential paradox should be addressed.
The real-world knowledge-based model of affect sensing is based on defeasible commonsense knowledge from the Open Mind Commonsense corpus (Singh et al., 2002), which is in turn gathered from a web community of some 11,000 teachers. Therefore, the affective assessment of text made by such a model represents the judgment of a typical person. However, sometimes a personal judgment of affect contradicts the typical judgment. Thus, it would seem paradoxical to attempt to learn that a situation has a personally negative affect when the typical person judges the situation as positive. To overcome this difficulty, we implement, in parallel, a mood keyword-spotting affect sensing mechanism to confirm or contradict the assessment of the primary model. In addition, we make the assumption that although a personal affect judgment may sometimes deviate from that of a typical person, it will agree with it most of the time. The implication of this is that on a slightly larger granularity than a sentence, the affective appraisal is likely to be accurate. To assess the affect of a sentence, we factor in the affective assessment of not only the sentence itself, but also of the paragraph, section, and whole journal entry or episode. Another way to view this is that the learning of personal attitudes and dispositions can be bootstrapped by commonsense attitudes and dispositions.

2.3.4 Determining Saliency
The key to saliency-mediated associative learning is, of course, being able to judge the saliency or importance of an episode, cause of episode, subevent, context, or exposure.

Saliency of episode. Episodic saliency is judged by the frequency with which salient subevents of that episode are recalled in later episodes, and by the detection of affect valence of episodes whose fulcrum is the episode in question.

Salient subevents. Within the analysis of an episode, saliency of subevents is determined by two components: relative contribution of valence to the overall valence, and alignment with key events of everyday story scripts. First, the main verbs and arguments are extracted from the sentences, constituting a candidate list of subevents. The affective valence of each of these subevents is compared against the overall valence of the episode, and those that contribute most to, and align best with, the overall valence are given higher saliency. Second, using a small collection of pithy everyday stories from the Open Mind Commonsense (OMCS) story corpus (Singh et al., 2002), an alignment procedure tries to map the current episode to the corpus of stories. If a match exists, then the episode’s key events can be identified and their saliencies boosted.

Salient contexts. In identifying salient within-episode contexts, the semantic recognition of possible types of contexts such as date, time, location, and social circles first takes place. Contexts which occur with the greatest number of repetitions and anaphoric references are judged salient.

Salient cause of episode. This is perhaps the most important step in learning. Episodes can unfold in multiple steps, but a person will ultimately attribute an affective response to a single root cause. There are three heuristic processes for attempting to identify the perceived cause of a salient episode. First, a heuristic information extractor tries to use regular expression patterns and syntactic cues to identify the explicitly stated perceived cause of the affective response in the episode. Second, if no cause is found in the text, the alignment procedure between stories in the OMCS story corpus and the episode may also produce a cause, because there may be a cause or moral explicitly associated with the story. Third, we use OMCSNet (Liu and Singh, 2003), a semantic network representation of 80,000 nodes and 200,000 edges generated from OMCS, to reason abductively about the cause. In this representation, nodes are first- and second-order concepts like people, places, things, events, activities, and actions, while edges are labeled with one of 25 relational predicates. Salient events of the episode are mapped onto nodes in OMCSNet; then, edges with the causal predicates “effectOf” and “consequenceOf” are followed in reverse. If the search for causes converges on a common node, then that node is chosen as a cause.

Saliency of exposure. Exposure is judged as the membership of a conceptual cue within an episode whose affect valence is strong. Naturally, this is a weak association, and can be strengthened if particular conceptual cues are given salient exposure. This includes frequency counting and anaphoric reference counting within the episode. If a conceptual cue occurs with high frequency or reference, then it is likely a topic of the episode.

2.3.5 The PERSONA Model Learner Architecture
The architecture of the PERSONA Model Learner is given in Fig. 1. The text inputter scrapes a weblog URL or other personal text corpus for date-annotated episodes. In the linguistic processing suite, raw episodes are syntactically and semantically processed, meeting the needs of the two associative learners. The affective text analyzer combines a real-world knowledge-based analyzer (Emotus Ponens (Liu et al., 2003)) with a back-off mood keyword spotter. The sentences within each episode are annotated with a PAD valence triplet.

[Fig. 1. PERSONA’s Model Learner Architecture. Components shown: Text Inputter (weblog extraction, episode builder); Linguistic Processing Suite (semantic type recognition, temporal phrase recognition, tagging and chunking, SVO identification, anaphora resolution); Affective Text Analyzer (mood keyword spotting, real-world knowledge-based sensing with Emotus Ponens, Ekman to PAD model mapping); Affective Reflexive Memory Learner (conceptual cue gatherer, salient exposure assessor); Affective Long-term Episodic Memory Learner (salient subevent assessor, salient contexts assessor, salient cause assessor, episode salience calculation, with OMCSNet and the OMCS story corpus as resources).]

In the affective reflexive memory learner, conceptual cues previously identified in the linguistic processing suite are judged for salient exposure by assessing their topicalization in the episode. Then the affective reflexive memory is updated.

In the affective long-term episodic memory learner, salient subevents, contexts, and causes are gathered together into an EpisodeFrame. The salience of the episode is assessed by analyzing all later episodes for reference. Note that the salience of an episode may be updated in the memory, as new episodes may refer to past episodes, thereby increasing their saliency. In personal theory change, episodes may be more than recalled; they may actually be relived, with different affective assessments. In such a case, the original experience should be forgotten. In our current model of PERSONA, we do not account for this. The EpisodeFrame is associated with an episode affect valence, and stored in affective long-term episodic memory.

To summarize, in this section, we proposed the episode-reflex computational model of human affective memory, with one component based around episodes and one based on reflex.
We presented saliency-mediated associative learning as a focused heuristic/statistical learning mechanism to acquire the proposed model from personal texts such as weblogs. In the next section, we focus on how these models can be used for affective mindreading: predicting a person’s attitude toward a familiar or new subject by reasoning with their affective memory model.

3. AFFECTIVE MINDREADING

In this section, we discuss how we have applied the PERSONA affective memory model to predict a person’s affective context in reaction to another person, thing, topic, situation, or event. We have dubbed this task affective mindreading because it is similar to the harder parent problem of mindreading: trying to predict a person’s actions given knowledge of their beliefs and desires. Dan Dennett’s intentional stance (1987) describes one mindreading strategy commonly used by people to understand other people. Given knowledge of a person’s beliefs and desires, you can expect that person to act rationally, in such a way as to further their goals.

In our problem domain, the PERSONA mindreader is given knowledge of a person’s affective memories and present attitudes toward people, things, topics, situations, and events. It is then given some new text embodying people, things, topics, situations, and events. Rather than predicting the person’s actions, the system’s task in affective mindreading is to predict the person’s affective response to this text. To do this, we leverage both the episodic and reflexive memories to perform interpolative prediction and extrapolative prediction. In interpolative prediction, the current episode activates known elements of the affective memory. In extrapolative prediction, the current episode does not contain known elements of the affective memory; to make a prediction, we must reason by analogy to connect it to known elements.
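The distinction between interpolative and extrapolative prediction can be illustrated with a small lookup routine. This is a sketch under stated assumptions: `find_analog` is a hypothetical stand-in for whatever analogy mechanism is available, and scaling the borrowed valence by the similarity score is our own choice rather than something the paper specifies.

```python
def predict_cue_affect(cue, reflexive_memory, find_analog):
    """Return (valence, mode) for a cue, or (None, "unknown").

    reflexive_memory: dict mapping cue -> (pleasure, arousal, dominance).
    find_analog: hypothetical callable returning (analogous_cue, similarity)
                 with similarity in [0, 1], or None when no analog is found.
    """
    if cue in reflexive_memory:
        # Interpolation: the cue itself is a known element of memory.
        return reflexive_memory[cue], "interpolative"
    match = find_analog(cue)
    if match is not None:
        analog, similarity = match
        if analog in reflexive_memory:
            # Extrapolation: borrow the analog's valence (assumed scaling).
            p, a, d = reflexive_memory[analog]
            return (p * similarity, a * similarity, d * similarity), "extrapolative"
    return None, "unknown"


memory = {"car": (0.6, 0.4, 0.2)}
analogs = {"bicycle": ("car", 0.9)}
known = predict_cue_affect("car", memory, analogs.get)
novel = predict_cue_affect("bicycle", memory, analogs.get)
missing = predict_cue_affect("opera", memory, analogs.get)
```

A cue with neither a memory entry nor an analog yields no prediction, which leaves room for other sources of evidence to contribute.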
So in essence our mindreader’s strategy is to assume that a person’s affective response to a new situation will be consistent with their attitudes to past occurrences of that situation or of an analogous situation. Later in this section, we augment this strategy by predicting that a person’s affective response to a new situation will also be influenced by that person’s internal imprimers (Minsky, forthcoming). The next two subsections describe how the episodic memory and reflexive memory are applied in affective mindreading.

3.1 Exploiting Episodes in Mindreading

Recall that episodes are kept separate from reflexive memories because they are affectively powerful one-time occurrences rather than frequency-conditioned associations. Because they are such powerful singular examples, which must be consciously recalled, their triggering typically involves several activated conceptual and contextual cues. Episodic recollection can be triggered by a subevent or root cause, with contextual cues as supporting triggers. When analyzing a new episode, we apply the same heuristic mechanisms to extract its gist: salient subevents, contexts, and root cause. If multiple elements of the subevents, contexts, and root cause align with an episode in memory, then that episode is triggered, and its affect is projected onto the affect of the current episode. The saliency of the triggered episode acts as a multiplier coefficient on this affect valence. It is sometimes the case that the root cause of the current episode aligns with the root cause of the triggered episode even though there are no matching subevents. This can be thought of as a case of analogy, because two different sequences of events taught the person the same lesson.

3.2 Reflex Memories in Mindreading

Unlike episodes, reflex memories do not require multiple conceptual cues to be triggered. Each conceptual cue, or conceptual cue Bayesian-conditioned on a topic, will be directly triggered by the same conceptual cue found in text.
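The contrast between the two trigger rules can be made concrete: a reflex memory fires on a single matching cue, whereas an episode fires only when several of its elements align with the new episode. The sketch below covers the episodic side. The dictionary layout and the `min_matches` parameter are assumptions for illustration; the text says only that "multiple" elements must align, and the evaluation section mentions a three-cue criterion.

```python
def trigger_episodes(new_elements, episodic_memory, min_matches=2):
    """Project the affect of triggered episodes onto a new episode.

    new_elements: set of subevents, contexts, and root cause extracted from
                  the new episode.
    episodic_memory: list of dicts with keys "elements" (set), "valence"
                     (PAD tuple), and "salience" (multiplier coefficient).
    Returns the summed, salience-weighted PAD contribution.
    """
    total = [0.0, 0.0, 0.0]
    for episode in episodic_memory:
        aligned = new_elements & episode["elements"]
        if len(aligned) >= min_matches:
            for i, value in enumerate(episode["valence"]):
                total[i] += episode["salience"] * value
    return tuple(total)


stored = [{"elements": {"fail exam", "school", "oversleep"},
           "valence": (-0.8, 0.6, -0.5),
           "salience": 2.0}]
hit = trigger_episodes({"school", "oversleep", "rain"}, stored)
miss = trigger_episodes({"rain"}, stored)
```

Raising `min_matches` makes episodic recall stricter, at the cost of triggering less often.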
In applying reflex memories, we separate the cases of interpolation and extrapolation. Interpolation occurs when a conceptual cue in the current episode is found in reflexive memory. In this case, the affect valence of the memory is projected onto the current episode. However, if the conceptual cue is not found in memory, then we can try to predict its affect extrapolatively by mapping it to known concepts via conceptual analogy, and then projecting the affect valence of the analogous concept in memory onto the current episode. Conceptual analogy analyzes two concepts for structural similarities, and if they are similar enough, they are deemed analogous. To perform conceptual analogy, we use OMCSNet and a structural mapping algorithm (Gentner, 1983). For example, the following analogous concept can be computed:

bicycle is like car (90.0%) because both:
    ==[isA]==> vehicle
    ==[isA]==> machine
    ==[isA3]==> means of transportation
    ==[isA3]==> faster than walking
    ==[isA3]==> used for transport
    ==[isA3]==> mode of transportation
    ==[hasLocation3]==> street
    ==[hasLocation3]==> garage
    ==[hasCollocate2]==> wheel
    ==[hasCollocate3]==> wheel
    ==[hasUse2]==> transportation
    ==[hasAbility2]==> travel on road
    ==[hasLocation5]==> garage
    ==[hasPart]==> wheel

The intuition behind extrapolation using conceptual analogy is that affective attitudes often transfer over to analogous concepts. In indicative trials, however, we discovered that there were certain classes of concepts for which this was not the case. For example, “dog” and “cat” are, all things considered, fairly close analogs; however, people who love dogs are not likely to like cats. In the short term, we stop-listed conceptual analogies among certain “hot topics” like pets. However, we thought that this was an interesting finding and explored it a bit further. We suggest several explanations for this.
As kids, we were often asked for our favorite pet, and perhaps there is a common perception that not having a definitive preference is an indication of weak personality. Such a self-reflective critic may reinforce an XOR preference relationship among possible pets. Another explanation, explored further in the next subsection, is that liking a dog has the implication that the person is in fact a dog-person. If the person identifies with the group, dog-people, then he/she can be said to inherit some of the common attitudes of that group. One of these common attitudes may be a distaste for cats, thus blocking the affective transfer from dog over to cat. In the next subsection, we see that dog-person is a public imprimer (Minsky, forthcoming), and imprimers can be viewed computationally as the inheritance of attitudes of the imprimer.

3.3 Inherited Attitudes from Internal Imprimers

So far the mindreading strategy we have discussed employs only a person’s own memory-recorded attitudes in making predictions of that person’s affective response to a new episode. Now, we would like to augment this strategy and discuss how the knowledge of a person’s imprimers can aid in prediction.

Marvin Minsky describes an imprimer as someone to whom one becomes attached. He introduces the concept in the context of attachment-learning of goals, and suggests that imprimers help to shape a child’s values. Imprimers can be a parent, mentor, cartoon character, a cult, or a person-type. The two most important criteria for an imprimer are that 1) the imprimer embodies some image, filled with goals, ideas, or intentions, and that 2) one feels attachment to the imprimer. Minsky theorizes that the images of imprimers can be internalized and their effects still realized. We extend this idea into the affective realm and make the further claim that internal imprimers can do more than critique our goals; our attachment to them leads us to the willful emulation of a portion of their values and attitudes.
The collection of internal imprimers that we keep helps to support our identity. From the supposition that we conform to many of the attitudes of our internal imprimers, we hypothesize that affective memory models of these imprimers, if known, can complement the person’s own affective memory model in affective mindreading. (Of course, a person’s personality will affect the degree to which their attitudes are influenced by others.) This hypothesis is supported by much of the work in psychoanalysis. Sigmund Freud (1991) wrote of a process he called introjection, in which children unconsciously emulate aspects of their parents, such as the assumption of their parents’ personalities and values. Other psychologists have referred to introjection by terms like identification, internalization, and incorporation.

We propose the following model of internal imprimers to support the affective mindreading mechanism. First, it is necessary to identify people, groups, and images that may possibly be a person’s imprimer. We can do so by analyzing the affective memory. From a list of all conceptual cues from both the episodic and reflexive memories, we use semantic recognizers to identify all people, groups (e.g. “my company”), and images (e.g. “dog” => “dog-person”) that, on average, elicit high Arousal and high Submissiveness, show high frequency of exposure in the reflexive memory, and collocate in past episodes with self-conscious emotion keywords like “proud”, “embarrassed”, and “ashamed”.

[Fig 2. Affective models of internal imprimers, organized into personas, complement one’s own affective model. Labels recovered from the figure: SELF; business persona, domestic persona, social persona; OLDER BROTHER, DOG-PERSON, WARREN BUFFETT, MOM, MARTHA STEWART.]

Once imprimers are identified, we also wish to identify the context under which an imprimer’s attitudes show influence. As shown in Fig. 2, we propose organizing the internal imprimer space into personas representing different contextual realms.
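The imprimer identification criteria listed above lend themselves to a simple filter. The sketch below is illustrative only: the numeric thresholds, field names, and function name are hypothetical, and high Submissiveness is represented as a low value on the Dominance axis of the PAD triplet.

```python
SELF_CONSCIOUS = {"proud", "embarrassed", "ashamed"}


def candidate_imprimers(cues, min_arousal=0.5, max_dominance=-0.3,
                        min_exposures=10):
    """Filter conceptual cues down to likely imprimers.

    cues: list of dicts with keys "name", "arousal" and "dominance" (mean PAD
          values in [-1, 1]), "exposures" (count in reflexive memory), and
          "collocated_words" (set of words seen with the cue in past episodes).
    All thresholds are assumptions chosen for this sketch.
    """
    picked = []
    for cue in cues:
        if (cue["arousal"] >= min_arousal
                and cue["dominance"] <= max_dominance   # high submissiveness
                and cue["exposures"] >= min_exposures
                and cue["collocated_words"] & SELF_CONSCIOUS):
            picked.append(cue["name"])
    return picked


cues = [
    {"name": "mom", "arousal": 0.7, "dominance": -0.5, "exposures": 40,
     "collocated_words": {"proud", "dinner"}},
    {"name": "bus", "arousal": 0.2, "dominance": 0.1, "exposures": 60,
     "collocated_words": {"late"}},
    {"name": "boss", "arousal": 0.8, "dominance": -0.6, "exposures": 3,
     "collocated_words": {"ashamed"}},
]
imprimers = candidate_imprimers(cues)
```

In this toy data only "mom" satisfies all four criteria; "boss" fails the exposure threshold and "bus" fails the affective ones.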
There is good reason to believe that humans organize imprimers by persona because we are different people for different reasons. One might like Warren Buffett’s ideas about business but probably not about cooking. Personas can also prevent internal conflicts by allowing a person to maintain separate systems of attitudes in different contexts.

To identify an imprimer’s context, we must first agree on an ontology of personas, which can be person-general (as the personas in Fig. 2 are) or person-specific. Given this ontology, we use features of each context, such as keywords taken from the GetContext() function of OMCSNet, to classify episodes. Once imprimers are associated with personas, we gather as much “personal” text from each imprimer as desired and acquire only the reflexive memory model, thus relaxing the constraint that texts have episodic organization.

In the augmented mindreading strategy (depicted in Fig. 3), when conceptual cues are unfamiliar to the self, we identify internal imprimers whose persona matches the genre of the new episode, and give them an opportunity to react to the cue. These affective reactions are multiplied by a coefficient representing the ability of this self to be influenced, and the valence score is added on to the episode. Rather than maintaining all attitudes in the self, internal imprimers enable judgments about certain things to be mentally outsourced to the persona-appropriate imprimers.

[Fig 3. The imprimer-augmented affective mindreading strategy. Edges represent memory triggers. Labels recovered from the figure: New Episode, root cause, episode frame, conceptual cue, unfamiliar, Affective LTEM, Affective Reflexive Memory, Conceptual Analogy, SELF, and the note “cues unfamiliar to self are referred to imprimers”.]

4. EVALUATION

PERSONA was evaluated in affective mindreading experiments performed with four subjects. Subjects were between the ages of 18 and 28, and had kept diary-style weblogs for at least 2 years, with an average entry interval of three to four days.
Subjects submitted their weblog URLs for the generation of affective memory models. An imprimer identification routine was run, and the examiner hand-picked the top imprimer for each of the three personas implemented. A personal text corpus was built, and imprimer reflexive memory models were generated.

The subjects were asked to engage in an interview-style experiment with the examiner. In the interview, subjects and their corresponding PERSONA models were asked to evaluate 12 short paragraph texts representative of three genres: social, business, and domestic (corresponding to the ontology of personas in the tested implementation). The same set of texts was presented to each participant, and the examiner chose texts that were generally evocative. They were asked to summarize their reaction by rating three factors on Likert-5 scales:

Feel negative about it (1) ... Feel positive about it (5)
Feel indifferent about it (1) ... Feel intensely about it (5)
Don’t feel control over it (1) ... Feel control over it (5)

These factors are mapped onto the PAD valence format, assuming the following correspondence: 1 → -1.0, 2 → -0.5, 3 → 0.0, 4 → +0.5, and 5 → +1.0. Subjects’ responses were not normalized. To assess the performance of PERSONA, we record the spread between the human-assessed valence and the computer-assessed valence,

$V_{spread} = |V_{human} - V_{PERSONA}|$    (2)

We computed the mean spread and standard deviation across all episodes along each PAD dimension. On the -1.0 to +1.0 valence scale, the maximum spread is 2.0. Table 1 summarizes the results.

Table 1. Performance of PERSONA affective mindreader, measured as the spread between human and computer judged values.

             Pleasure                 Arousal                  Dominance
             mean spread  std. dev.   mean spread  std. dev.   mean spread  std. dev.
SUBJECT 1    0.39         0.38        0.27         0.24        0.44         0.35
SUBJECT 2    0.42         0.47        0.21         0.23        0.48         0.31
SUBJECT 3    0.22         0.21        0.16         0.14        0.38         0.38
SUBJECT 4    0.38         0.33        0.22         0.20        0.41         0.32

BASELINE 1   mean spread 0.50
BASELINE 2   mean spread 0.67

Assuming that human reactions obeyed a uniform distribution over the Likert-5 scale, we give two baselines, which were simulated over 100,000 trials. In BASELINE 1, VPERSONA is fixed at 0.0. In BASELINE 2, VPERSONA is given a random value over the interval [-1.0, 1.0] with a uniform distribution. It should be pointed out, however, that in the context of an interactive sociable computer, BASELINE 1 is not a fair comparison, because it would never produce any behavior.

On average, PERSONA performed noticeably better than both baselines, excelling particularly in predicting arousal, and having the most difficulty predicting dominance. The standard deviations were very high, reflecting the observation that PERSONA’s predictions were often either very close to the actual valence or very far from it. This can be attributed to one of several causes. First, multiple episodes described in the same journal entries may have led to improper associative learning. Second, the reflexive memory model does not account for conflicting word senses. Third, personal texts inputted for the imprimers often generated models skewed to positive or negative because the text did not always have an episodic organization. While results along the pleasure and dominance dimensions are weaker, the arousal dimension recorded a mean spread of 0.22, suggesting the possibility that it alone may have immediate applicability.

In the experiment, we also analyzed how often the episodic memory, reflexive memory, and imprimers were triggered. Episodes were, on average, 4 sentences long. For each episode, reflexive memory was triggered an average of 21.5 times, episodic memory 0.8 times, and imprimer reflexive memory 4.2 times.
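The spread measure and the two baselines can be reproduced with a short simulation. This sketch assumes, as our own simplification, that the human valence is drawn continuously and uniformly from [-1.0, 1.0]; under that assumption the expected spreads are 1/2 and 2/3, in line with the reported 0.50 and 0.67. The function names and the fixed random seed are hypothetical.

```python
import random

# Likert-5 rating to PAD valence, following the correspondence in the text.
LIKERT_TO_PAD = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}


def spread(v_human, v_persona):
    """Absolute difference between human and predicted valence (max 2.0)."""
    return abs(v_human - v_persona)


def simulate_baselines(trials=100_000, seed=0):
    """Estimate the mean spread of BASELINE 1 and BASELINE 2.

    Assumption: human valence ~ Uniform(-1, 1), sampled continuously.
    """
    rng = random.Random(seed)
    total_fixed = 0.0
    total_random = 0.0
    for _ in range(trials):
        human = rng.uniform(-1.0, 1.0)
        total_fixed += spread(human, 0.0)                        # BASELINE 1
        total_random += spread(human, rng.uniform(-1.0, 1.0))    # BASELINE 2
    return total_fixed / trials, total_random / trials


baseline1, baseline2 = simulate_baselines()
example = spread(LIKERT_TO_PAD[4], -0.5)  # rating of 4 vs. a prediction of -0.5
```

A per-subject mean spread can be computed the same way by averaging `spread` over that subject's 12 rated texts along each PAD dimension.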
To measure the effect of imprimers and episodic memories, we re-ran the experiment turning off imprimers only, episodic memory only, and both. Table 2 summarizes the results.

Table 2. Performance of PERSONA that can be attributed to imprimers and episodic memory.

                    Pleasure      Arousal       Dominance
                    mean spread   mean spread   mean spread
Imp ON,  Epi ON     0.35          0.22          0.43          (Table 1 results summarized)
Imp ON,  Epi OFF    0.34          0.21          0.43
Imp OFF, Epi ON     0.40          0.28          0.44
Imp OFF, Epi OFF    0.41          0.29          0.45

These results suggest that the positive effect of episodic memory on the results was negligible. This certainly has to do with its low rate of triggering, and with the fact that episodic memories were weighted only slightly more than reflexive memories. The low trigger rate of episodic memory can also be attributed to the strict criterion that three conceptual cues in an episode frame must trigger in order for the whole episode to trigger. These results also suggest that imprimers played a measurable role in improving performance, which is a very promising result.

Overall, the evaluation demonstrates that the proposed approach is promising, but needs further refinement. The randomized BASELINE 2 is a good comparison when considering possible entertainment applications, whose interaction is more fail-soft. PERSONA does quite well against the active BASELINE 2, and is within the performance range of these applications. However, the results also suggest that PERSONA may not be ready for deployment to a sociable computer just yet, because fallout (bad predictions) can be very costly in the realm of affective communication (Nass et al., 1994). Affective communication obeys certain social contracts, which makes its applications fail-hard.

5. CONCLUSION

Mindreading is a problem of interest to the cognitive science community, as well as the artificial intelligence community. While cognitive science is generally interested in attaining a deeper understanding of the problem, we argue that its primarily behavioral and neurological means of study will be insufficient to uncover the larger mechanistic picture of theory of mind and mindreading in humans. On the other hand, the artificial intelligence community is interested in applying mental modeling and mindreading to build sociable computers. However, its models of profiling and collaborative filtering are too weak.

In this work, we have tried to address the needs of both communities and walk a middle ground. In our study of affective mindreading, we proposed and built a system that attempts to model a person much more deeply than present approaches in AI. The proposed episode-reflex model of human affective memory has interesting implications for psychology and the cognitive sciences, providing a concrete way to test classical theories such as associative learning, memory organization, and the formation of the identity of the self, among others.

The fact that attitudes can be considered independently of beliefs and goals suggests that in computationalizing mindreading, it may be possible to decompose mindreading into several processes, of which affective mindreading is one. Indeed, it is possible that humans have a toolkit of processes which combine to be called mindreading. It is not out of the realm of possibility that a special process exists for modeling and assessing just the attitudes of another person. Such a process may serve the role, we speculate, of contextually disambiguating processes for belief determination.

6. ACKNOWLEDGMENTS

I would like to thank Barbara Barry, Push Singh, and Andrea Lockerd for their ideas, suggestions, and inspiration in the course of this work.

7. REFERENCES

[1] Bloom, P. (2002). Mindreading, communication, and the learning of the names for things. Mind and Language, 17, 37-54.
[2] Call, J. and Tomasello, M. (1996). "The effect of humans on the cognitive development of apes". In Reaching into Thought (eds. A. E. Russon, K. A. Bard and S. T. Parker). Cambridge University Press, pp. 371-403.
[3] Dennett, D. (1987). The Intentional Stance. Cambridge, MA: Bradford Books/MIT Press.
[4] Ekman, P. (1993). Facial expression of emotion. American Psychologist, 48, 384-392.
[5] Freud, S. (1991). The essentials of psycho-analysis: the definitive collection of Sigmund Freud's writing, selected, with an introduction and commentaries, by Anna Freud. London: Penguin.
[6] Gallese, V. et al. (1996). Premotor cortex and the recognition of motor actions. Cognit. Brain Res., 3, 131-141.
[7] Gallese, V. and Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12).
[8] Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, pp. 155-170.
[9] Liu, H., Lieberman, H., and Selker, T. (2003). A model of textual affect sensing using real-world knowledge. In Proceedings of the Seventh International Conference on Intelligent User Interfaces, pp. 125-132.
[10] Liu, H. and Singh, P. (2003). OMCSNet: A commonsense inference toolkit. In Submission. Available at: http://web.media.mit.edu/~hugo/publications
[11] Locke, J. (1689). Essay Concerning Human Understanding. Hypertext by ITL at Columbia University, 1995. Print version ed. P. H. Nidditch. Oxford, 1975.
[12] Mehrabian, A. (1995). Framework for a comprehensive system of measures of emotional states: The PAD Model. (Available from Albert Mehrabian, 1130 Alta Mesa Road, Monterey, CA, USA 93940).
[13] Meltzoff, A. and Gopnik, A. (1993). "The role of imitation in understanding persons and developing a theory of mind". In Understanding other minds: perspectives from autism (eds. S. Baron-Cohen, H. Tager-Flusberg, D. Cohen). Oxford University Press, Chapter 16, pp. 335-366.
[14] Minsky, M. (forthcoming). The Emotion Machine. Pantheon, New York. Several chapters are available at: http://web.media.mit.edu/~minsky
[15] Nass, C. I., Steuer, J. S., and Tauber, E. (1994). Computers are social actors. In Proceedings of CHI ’94 (Boston, MA), pp. 72-78, April 1994.
[16] Povinelli, D. J. and Preuss, T. M. (1995). Theory of mind: evolutionary history of a cognitive specialization. Trends in Cognitive Neurosciences, 18:418-424.
[17] Resnick, P. and Varian, H. R. (1997). “Recommender Systems”, guest editors, “special section: recommendation systems” in CACM, Vol. 40, No. 3, pp. 56-58.
[18] Singh, P. et al. (2002). Open Mind Common Sense: Knowledge acquisition from the general public. In Proceedings of the First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems. Lecture Notes in Computer Science. Heidelberg: Springer-Verlag.
[19] Sutton, R. S. (1984). Temporal credit assignment in reinforcement learning. University of Massachusetts, Department of Computer and Information Science. Technical Report 84-2. Amherst, MA.
[20] Tulving, E. (1983). Elements of episodic memory. Oxford: New York.