Final talk
Automatically Acquiring a Dictionary of Emotion-Provoking Events
Student: Hoa Vu-Trong - VNU
Supervisor: Graham sensei - NAIST

Can Twitter benefit a dialogue system?
[Figure: Twitter users, dialogue system]
Machine: Hello!
User: Hello!
User: The guy next to me today is too noisy!
Machine: That's so annoying!

Motivation
● Emotion is often not carried by a specific word:
  1. I feel happy today
  2. I met my friend today
● Only 4% of words imply emotion [1]
[Figure: Simple architecture of a dialogue system with emotion adaptation, including a text emotion classifier]
[1] Pennebaker, J.W., Mehl, M.R., Niederhoffer, K.: Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology 54, 547–577 (2003)

Motivation
● An arbitrarily large set of emotion-provoking events can be collected from Twitter (400M tweets/day)
[Figure text: "You must be very happy"]

Method
● Emotions and events are related.
● Pattern learning is an effective way to harvest semantic relations
  – Espresso (Pantel and Pennacchiotti 06).
Ex:
  “I'm happy that I have the support of my friends. I love all of them!”
  “I'm sad that tomorrow is Monday and I have to work. It's bad day”
Pattern: I be EMOTION that EVENT
Instances:
  happy – I have the support of my friends
  sad – tomorrow is Monday and I have to work

Espresso Algorithm
● Used to mine semantic relations (e.g. is-a, has-a …); begins with some seed instances.
● Each iteration contains 3 phases:
  – Pattern induction
  – Pattern ranking
  – Instance extraction
● Stopping criterion: enough patterns have been found, the average reliability of the patterns decreases by t%, or a defined number of iterations is exceeded.

Espresso Algorithm
● Pattern induction: infers all the patterns P that connect the seed instances.
Ex:
  I'm happy that I have the support of my friends. I love all of them!
  I'm sad that tomorrow is Monday and I have to work. It's bad day
Generalized sentences:
  I be EMOTION that EVENT . I love all of them
  I be EMOTION that EVENT . It be bad day
Candidate patterns and counts:
  I be EMOTION that EVENT - 2 times
  EMOTION that EVENT .
  - 2 times
  EMOTION that EVENT . I love all - 1 time
  …

Espresso Algorithm
● Pattern ranking: rank all the patterns and keep the top K most reliable ones.
● Reliable pattern: one that is both highly precise and extracts many instances (more on the next slides).

Espresso Algorithm
● Instance extraction: retrieve the top M reliable instances that match the K patterns selected in the previous phase.
● Reliable instance: one that is highly associated with as many reliable patterns as possible (more on the next slides).

Espresso Algorithm
● The strength of association between an instance i = (x, y) and a pattern p is measured by PMI:

$$ \mathrm{pmi}(i, p) = \log \frac{\mathrm{count}(i, p)}{\mathrm{count}(i) \times \mathrm{count}(p)} $$

Espresso Algorithm
● Pattern reliability:

$$ r(p) = \frac{\sum_{i \in I} \frac{\mathrm{pmi}(i, p)}{\mathrm{maxPMI}} * r'(i)}{\mathrm{count}(I)}, \qquad 0 < r(p) \le 1 $$

● Instance reliability:

$$ r'(i) = \frac{\sum_{p \in P} \frac{\mathrm{pmi}(i, p)}{\mathrm{maxPMI}} * r(p)}{\mathrm{count}(P)}, \qquad 0 < r'(i) \le 1 $$

Grouping events
● Relieves sparsity issues to some extent by sharing statistics among the events in a single group
● Allows humans to understand the events better, highlighting the important events shared by many people
● Uses hierarchical agglomerative clustering with the single-linkage criterion, using cosine similarity to measure the distance between events

Experiments
● Data corpus: 30 million tweets from Neubig and Duh 13' [1]
● Tweet normalization by Han et al 12' [2]
● The Stanford parser [3] was employed to make sure that each event is a sentence
[1] Graham Neubig, Kevin Duh. How Much is Said in a Tweet? A Multilingual, Information-theoretic Perspective. In AAAI Sprin
[2] Han et al. Automatically Constructing a Normalisation Dictionary for Microblogs. In EMNLP 2012
[3] http://nlp.stanford.edu/software/lex-parser.shtml

Experiments
● 6 basic emotion classes defined by Ekman [1]:
  – Anger: angry, mad
  – Disgust: disgusted, terrible
  – Fear: afraid, scared
  – Happiness: happy, glad
  – Sadness: sad, upset
  – Surprise: surprised, astonished
[1] Ekman, P.: Universals and cultural differences in facial expressions of emotions.
Nebraska Symposium on Motivation 19, 207–283 (1972)

Experiments
● We start the system with the seed instances collected by the pattern “I be EMOTION that EVENT”.
● The reliability of each seed instance is 1.
● Stopping criterion: a limit on the number of iterations.

Result
● Happiness: 14027 events
● Sadness: 3909 events
● Fear: 8798 events
● Anger: 2133 events
● Surprise: 2466 events
● Disgust: 26 events

Result
● Some new patterns:
  I feel EMOTION when EVENT
  I be EMOTION because EVENT
  I be EMOTION EVENT
  I get so EMOTION when EVENT
  Make me EMOTION when EVENT
  Get really EMOTION that EVENT
  Be really EMOTION to hear that EVENT
  Be EMOTION to know that EVENT
  EMOTION at the fact that EVENT
  be EMOTION to death that EVENT
  …

Evaluation
● Using Mean Reciprocal Rank (MRR):

$$ \mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i} $$

Example:
  Predicted / Human annotation            Rank   Reciprocal rank
  Surprised Happiness Surprise Sadness    2      1/2

Evaluation
● Measuring recall
  – Asking 30 people about 5 events that provoke each of five emotions

  Emotions    Events
  happiness   meeting friends; buying/getting something I want; going on a date
  sadness     a plan gets cancelled; someone dies/gets sick; failing a test
  anger       someone breaks a promise; someone insults me; someone breaks something of mine
  fear        getting a sudden phone call; seeing an insect; walking at night
  surprise    seeing a friend unexpectedly; seeing a car suddenly appear; hearing a loud noise

Evaluation
● Evaluation of emotion-provoking events
● Human evaluation on the top 100 groups.
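The Mean Reciprocal Rank used in this evaluation can be illustrated with a minimal sketch. The function names and the toy queries are hypothetical and not taken from the talk, and the sketch assumes that an emotion missing from the ranked predictions contributes a reciprocal rank of 0.

```python
from fractions import Fraction


def reciprocal_rank(ranked_predictions, gold_label):
    """Return 1/rank of gold_label in the ranked list, or 0 if it is absent."""
    for position, label in enumerate(ranked_predictions, start=1):
        if label == gold_label:
            return Fraction(1, position)
    return Fraction(0)  # assumption: a missing label scores 0


def mean_reciprocal_rank(queries):
    """Average the reciprocal ranks over (ranked_predictions, gold_label) pairs."""
    if not queries:
        raise ValueError("MRR is undefined for an empty query set")
    total = sum(reciprocal_rank(ranked, gold) for ranked, gold in queries)
    return total / len(queries)


# Hypothetical toy data: each entry is (system ranking, human annotation).
queries = [
    (["happiness", "surprise", "sadness"], "surprise"),  # rank 2, contributes 1/2
    (["sadness", "anger"], "sadness"),                   # rank 1, contributes 1
    (["fear", "anger"], "disgust"),                      # absent, contributes 0
]

print(mean_reciprocal_rank(queries))  # prints 1/2
```

Exact fractions are used only to keep the toy arithmetic easy to check by hand; floating-point values would work equally well.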
  Methods                 MRR (%)   Recall (%)
  Seed                    51.8      5.21
  Seed + clustering       66.1      9.40
  Espresso                51.5      8.55
  Espresso + clustering   74.7      16.2

  Emotions    MRR (%)   Recall (%)
  Happiness   100       26.9
  Sadness     82.3      10.0
  Anger       82.4      15.8
  Fear        46.3      27.3
  Surprise    58.3      0.0

Discussion
● Recall is still relatively low.
● Events extracted from Twitter were somewhat biased towards everyday events or events regarding love and dating.
● For surprise, we did not manage to extract any of the events provided by the annotators (recall 0.0).

In Conclusion
● This work focuses on acquiring emotion-provoking events.
● The Espresso algorithm is used to learn patterns and extract events; similar events are then grouped to create a dictionary.
● Paper submitted to EACL 2014

Thank you very much
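As a supplement to the Espresso slides, the pattern reliability score r(p) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the talk. It assumes that counts are normalised by the total number of co-occurrences N before taking the logarithm (the slide formula omits N), that pairs which never co-occur contribute 0, and that maxPMI is the largest PMI over all observed pairs. The function names and toy counts are hypothetical.

```python
import math
from collections import Counter


def pmi(pair_count, instance_count, pattern_count, total):
    """PMI of an (instance, pattern) pair from raw counts.

    Assumption: counts are turned into probabilities by dividing by `total`.
    """
    return math.log(pair_count * total / (instance_count * pattern_count))


def pattern_reliability(pairs, instance_rel):
    """Compute r(p) for every pattern.

    pairs: Counter mapping (instance, pattern) -> co-occurrence count.
    instance_rel: dict mapping each instance in I to its reliability r'(i).
    """
    total = sum(pairs.values())
    inst_counts, pat_counts = Counter(), Counter()
    for (inst, pat), count in pairs.items():
        inst_counts[inst] += count
        pat_counts[pat] += count

    scores = {
        (inst, pat): pmi(count, inst_counts[inst], pat_counts[pat], total)
        for (inst, pat), count in pairs.items()
    }
    max_pmi = max(scores.values())
    if max_pmi <= 0:
        raise ValueError("maxPMI must be positive to normalise the scores")

    reliability = {}
    for pat in pat_counts:
        # Sum over the instances in I that were observed with this pattern.
        weighted = sum(
            scores[(inst, pat)] / max_pmi * rel
            for inst, rel in instance_rel.items()
            if (inst, pat) in scores
        )
        reliability[pat] = weighted / len(instance_rel)  # divide by count(I)
    return reliability


# Hypothetical toy counts; the two seed instances have reliability 1.
A = ("happy", "I have the support of my friends")
B = ("sad", "tomorrow is Monday and I have to work")
C = ("glad", "the weekend is here")
P1 = "I be EMOTION that EVENT"
P2 = "I feel EMOTION when EVENT"

pairs = Counter({(A, P1): 3, (B, P1): 1, (B, P2): 1, (C, P2): 3})
seeds = {A: 1.0, B: 1.0}

for pattern, score in pattern_reliability(pairs, seeds).items():
    print(pattern, round(score, 3))
```

With these toy counts, the seed pattern P1 receives reliability 0.5 and P2 receives 0.0, because the only seed that co-occurs with P2 has a PMI of 0. Instance reliability r'(i) is symmetric: swap the roles of instances and patterns and divide by count(P).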