From Sentiment Analysis to Preference Aggregation
Umberto Grandi, Andrea Loreggia, Francesca Rossi and Vijay Saraswat
University of Padova and IBM TJ Watson Research Center

Umberto Grandi, University of Padova

What is the collective sentiment about ... Miami?

Aggregation of individual polarities
Collective sentiment: 40% vs. 60%

A Problem: Multiple Alternatives
If preferences are as follows:
21 voters, 10 voters, 4 voters
[Diagram: one ranking per voter group; extracted entries: a b b | | | a a b]
Sentiment analysis: blue!
Preference aggregation: red!

Six Challenges
1. Information extraction: score and polarity
2. Representation: incompleteness and incomparability
3. Aggregation: counting paradoxes and Borda* rule
4. Strategic behaviour and influencers
5. Validation: truth-tracking and axiomatic analysis
6. Big data: how to aggregate in parallel

Challenge 1
What preferences/opinions can be extracted from text?
• partial scores
• 5-star ratings
• binary comparisons
Example values shown: 4.5, -3

Score and Polarities: Raw Data Extraction
Two forms of opinions can be extracted with existing NLP techniques:
• Comparative opinions: "I prefer Canon to Nikon" → binary comparisons
• Objective opinions: "Nikon is a good camera" → score of a single entity

Definition
The raw data extracted from individual expressions $T_i$ is a tuple $(\sigma_i, P_i, N_i)$:
• $\sigma_i : D_i \to \mathbb{R}$ represents objective opinions on $D_i \subseteq X$
• $P_i$ and $N_i$ are subsets of $X$, preordered by $\leq^P_i$ and $\leq^N_i$, representing positive and negative comparative opinions

Challenge 2
How to represent (compactly) the information extracted?
• Interpersonal incomparability
• Incompleteness

Pure Sentiment Data
The approach taken by most sentiment analysis tools:

Definition
The pure sentiment data associated with raw data $(\sigma_i, P_i, N_i)$ is a function $S_i : D_i \cup P_i \cup N_i \to \{+, -, 0\}$ defined as:

\[
S_i(c) =
\begin{cases}
\operatorname{sgn}(\sigma_i(c)) & \text{if } c \in D_i \setminus (P_i \cup N_i) \\
0 & \text{if } \sigma_i(c) = 0 \\
+ & \text{if } c \in P_i \\
- & \text{if } c \in N_i
\end{cases}
\]

Pure sentiment data only deals with polarities.

Pure Preference Data
The approach that preference aggregation takes:

Definition
The pure preference data associated with raw data $(\sigma_i, P_i, N_i)$ is a preordered set $(\mathcal{D}_i, \leq^D_i)$ where $\mathcal{D}_i = D_i \cup P_i \cup N_i$ and

\[
x \leq^D_i y \iff
\begin{cases}
x \leq^P_i y \text{ and } x, y \in P_i, \text{ or} \\
x \leq^N_i y \text{ and } x, y \in N_i, \text{ or} \\
x \in N_i \text{ and } y \in P_i, \text{ or} \\
\sigma_i(x) \leq \sigma_i(y) \text{ and } x, y \in D_i
\end{cases}
\]

Pure preference data only deals with pairwise comparisons.

SP-Structures
What we propose: Sentiment Preference Structures

Definition
An SP-structure over $X$ is a tuple $(\mathcal{P}, \mathcal{N}, \mathcal{Z})$ such that:
• $\mathcal{P}$, $\mathcal{N}$ and $\mathcal{Z}$ form a partition of $X$
• $\mathcal{P}$ and $\mathcal{N}$ are ordered respectively by preorders $\leq^P$ and $\leq^N$

An SP-structure $(\mathcal{P}_i, \mathcal{N}_i, \mathcal{Z}_i)$ can be extracted from raw data $(\sigma_i, P_i, N_i)$:
• $\mathcal{P}_i$ is the union of $P_i$ and the set of entities with positive score
• Analogously for $\mathcal{N}_i$. $\mathcal{Z}_i$ is the set of entities with zero or no score
• The preorders are extracted from $\sigma_i$ and copied from $P_i$ and $N_i$

SP-structures combine (interpersonally non-comparable) scores with (incomplete) pairwise comparisons between entities.

What do they look like?
[Table: columns Agent 1, Agent 2, Agent 3; rows $\mathcal{P}$, $\mathcal{Z}$, $\mathcal{N}$. Extracted entries: a | b∼c; b; a | b; c; a | c]
Table: SP-structures extracted from the previous example.
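
The extraction of pure sentiment data and of the SP-structure partition can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `pure_sentiment` and `sp_partition`, the camera entities, and the choice to treat both entities of a comparative opinion as members of $P_i$ are assumptions made for the example. It also assumes that $P_i$ and $N_i$ are disjoint and, mirroring the case order of $S_i$, that membership in $P_i$ or $N_i$ takes precedence over the sign of the score. The preorders themselves are left out to keep the sketch short.

```python
def sign(value: float) -> str:
    """Polarity of a numeric score: '+', '-' or '0'."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"


def pure_sentiment(scores: dict[str, float], pos: set[str], neg: set[str]) -> dict[str, str]:
    """S_i on D_i ∪ P_i ∪ N_i: sign of the score, overridden by comparative membership."""
    polarity = {c: sign(v) for c, v in scores.items()}
    polarity.update({c: "+" for c in pos})  # c in P_i
    polarity.update({c: "-" for c in neg})  # c in N_i
    return polarity


def sp_partition(entities: set[str], scores: dict[str, float],
                 pos: set[str], neg: set[str]) -> tuple[set[str], set[str], set[str]]:
    """Partition X into (P, N, Z) as described for SP-structures.

    Assumption: an entity already in P_i (or N_i) stays there whatever its score.
    """
    P = set(pos) | {c for c, v in scores.items() if v > 0 and c not in neg}
    N = set(neg) | {c for c, v in scores.items() if v < 0 and c not in pos}
    Z = set(entities) - P - N  # zero score or no score at all
    return P, N, Z


# Hypothetical raw data for one individual.
entities = {"canon", "nikon", "sony", "leica", "pentax"}
scores = {"sony": -3.0, "leica": 0.0}   # objective opinions (sigma_i)
pos = {"canon", "nikon"}                # from "I prefer Canon to Nikon"
neg = set()

print(pure_sentiment(scores, pos, neg))
print(sp_partition(entities, scores, pos, neg))
```

In this example `pentax` never appears in the text, so pure sentiment data says nothing about it, while the SP-structure places it in $\mathcal{Z}$ together with the zero-scored `leica`.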
Challenge 3
How to aggregate the individual information into a collective opinion?

Is it really so different?
A basic sentiment paradox:
• 2 candidates
• preferences and polarity
• Majority ≠ Collective Sentiment

90 voters   10 voters
a           b
|           |
b           a

Majority rule winner: a
Collective sentiment predictor: b

Counting Paradoxes
Figure: % of collective sentiment paradoxes (y-axis: percentage of paradoxical profiles, 15% to 35%; x-axis: number of voters).
30% of total cases up to N = 92.

Aggregating SP-structures

Definition
The Borda* score of entity $c \in X$ in SP-structure $(\mathcal{P}, \mathcal{N}, \mathcal{Z})$ is defined as:

\[
s^*(c) =
\begin{cases}
2 \times |\mathrm{down}^P(c)| + |\mathrm{inc}^P(c)| + |\mathcal{Z}| + 1 & \text{if } c \in \mathcal{P} \\
-2 \times |\mathrm{up}^N(c)| - |\mathrm{inc}^N(c)| - |\mathcal{Z}| - 1 & \text{if } c \in \mathcal{N} \\
0 & \text{if } c \notin \mathcal{P} \cup \mathcal{N}
\end{cases}
\]

Write $s^*_i(c)$ for the score of $c$ in the SP-structure of individual $i$. Given a profile $S$ of SP-structures, the most popular candidates are the ones maximising the sum of the individual Borda* scores:

\[
B^*(S) = \operatorname*{argmax}_{c \in X} \sum_{i \in I} s^*_i(c)
\]

What we know about Borda*
A profile is purely preferential if all comparisons are positive (negative) for all individuals.
A profile is purely sentimental if only positive/neutral sentiment is expressed and no pairwise comparison.

Theorem
If a profile $S$ is purely preferential, then $B^*(S) = \mathrm{Borda}(S)$.
If a profile $S$ is purely sentimental, then $B^*(S) = \mathrm{Approval}(S)$.

Axiomatic properties adapted from Social Choice Theory: axioms characterising Borda (classic version) still hold if adapted.

Theorem
The Borda* rule satisfies consistency, faithfulness, neutrality and the cancellation property.

Challenge 4
Is it possible to identify influencers and prevent strategic behaviour?

Challenge 5
How should preference methods be validated?
• Against real events (predictive ability)
• Axiomatically (Social Choice Theory)

Challenge 6
How to deal with big data in sentiment/preference analysis?

Thank you!
• Challenge 1: what can be extracted
• Challenge 2: (compact) representation
• Challenge 3: aggregation
• Challenge 4: strategic behaviour
• Challenge 5: validation
• Challenge 6: big data
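
Worked sketch for Challenge 3: the Borda* score and rule can be tried out on small profiles with the Python below. It is a hedged illustration rather than the authors' code. The class `SPStructure`, the helpers `borda_star_score`, `borda_star` and `linear_order`, and the sample profiles are all hypothetical. The slides do not spell out down, up and inc, so the sketch assumes that down(c) is the set of entities strictly below c in the same part, up(c) the set strictly above it, and inc(c) the remaining entities of that part. It also assumes strict, transitively closed orders without ties.

```python
from dataclasses import dataclass, field


@dataclass
class SPStructure:
    P: set   # positively rated entities
    N: set   # negatively rated entities
    Z: set   # entities with zero or no score
    # Pairs (x, y) meaning "x is strictly below y"; assumed transitively closed
    # and only relating entities inside the same part (P or N).
    below: set = field(default_factory=set)


def borda_star_score(sp: SPStructure, c) -> int:
    """Borda* score of c in one SP-structure (assumed reading of down/up/inc)."""
    if c in sp.P:
        others = sp.P - {c}
        down = {x for x in others if (x, c) in sp.below}
        up = {x for x in others if (c, x) in sp.below}
        inc = others - down - up
        return 2 * len(down) + len(inc) + len(sp.Z) + 1
    if c in sp.N:
        others = sp.N - {c}
        up = {x for x in others if (c, x) in sp.below}
        down = {x for x in others if (x, c) in sp.below}
        inc = others - up - down
        return -2 * len(up) - len(inc) - len(sp.Z) - 1
    return 0


def borda_star(profile, entities):
    """Return (winners, totals): the argmax set and the summed scores."""
    totals = {c: sum(borda_star_score(sp, c) for sp in profile) for c in entities}
    best = max(totals.values())
    return {c for c, t in totals.items() if t == best}, totals


def linear_order(ranking):
    """Purely preferential SP-structure: every entity in P, ranked best to worst."""
    pairs = {(ranking[j], ranking[i])
             for i in range(len(ranking)) for j in range(i + 1, len(ranking))}
    return SPStructure(P=set(ranking), N=set(), Z=set(), below=pairs)


# Mixed structure: a above b in P, c neutral, d negative.
mixed = SPStructure(P={"a", "b"}, N={"d"}, Z={"c"}, below={("b", "a")})
print({c: borda_star_score(mixed, c) for c in "abcd"})

# Purely sentimental profile: approvals only, no comparisons.
approvals = [SPStructure({"a", "b"}, set(), {"c"}),
             SPStructure({"a"}, set(), {"b", "c"}),
             SPStructure({"c"}, set(), {"a", "b"})]
print(borda_star(approvals, {"a", "b", "c"}))

# Purely preferential profile: three linear orders.
rankings = [linear_order(["a", "b", "c"]),
            linear_order(["b", "a", "c"]),
            linear_order(["a", "b", "c"])]
print(borda_star(rankings, {"a", "b", "c"}))
```

Under these assumptions, every approved entity in a purely sentimental structure scores $|X|$ and every other entity scores 0, so the winners coincide with the approval winners. In a linear order the score of an entity is twice its classic Borda score plus one, so the winners coincide with the Borda winners, in line with the theorem stated in Challenge 3.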