From Sentiment Analysis to
Preference Aggregation
Umberto Grandi, Andrea Loreggia,
Francesca Rossi and Vijay Saraswat
University of Padova and IBM TJ Watson Research Center
Umberto Grandi
University of Padova
What is the collective sentiment about ...
...Miami?
Aggregation of individual polarities
Collective
sentiment
40%
60%
A Problem: Multiple Alternatives
If preferences are as follows:
21 voters
10 voters
4 voters
a
b
b
|
|
|
a
a
b
Sentiment analysis:
blue!
Preference aggregation:
red!
Six Challenges
1. Information extraction: score and polarity
2. Representation: incompleteness and incomparability
3. Aggregation: counting paradoxes and Borda* rule
4. Strategic behaviour and influencers
5. Validation: truth-tracking and axiomatic analysis
6. Big data: how to aggregate in parallel
Challenge 1
What preferences/opinions can
be extracted from text?
partial scores
5-star ratings
4.5
binary comparisons
-3
Score and Polarities
Raw Data Extraction
Two forms of opinions can be extracted with existing NLP techniques:
• Objective opinions: "Nikon is a good camera" → score of a single entity
• Comparative opinions: "I prefer Canon to Nikon" → binary comparison
Definition
The raw data extracted from individual expressions $T_i$ is a tuple $(\sigma_i, P_i, N_i)$:
• $\sigma_i : D_i \to \mathbb{R}$ to represent objective opinions on $D_i \subseteq X$
• subsets $P_i$ and $N_i$ of $X$ preordered by $\leq^P_i$ and $\leq^N_i$, representing positive and negative comparative opinions
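As a minimal sketch, the raw data of one individual could be held in a small Python container. The class name `RawData`, its field names, and the numeric score below are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass, field


@dataclass
class RawData:
    """Raw data (score function, P_i, N_i) extracted from one individual's text."""
    score: dict[str, float] = field(default_factory=dict)  # score function, defined on D_i
    pos: set[str] = field(default_factory=set)              # P_i: positively compared entities
    neg: set[str] = field(default_factory=set)              # N_i: negatively compared entities
    # A pair (x, y) means "x is ranked no higher than y" in the respective preorder.
    pos_leq: set[tuple[str, str]] = field(default_factory=set)
    neg_leq: set[tuple[str, str]] = field(default_factory=set)

    @property
    def scored(self) -> set[str]:
        """D_i: the entities that received a score."""
        return set(self.score)


# Hypothetical extraction from "Nikon is a good camera" and "I prefer Canon to Nikon".
# The score 1.0 is an invented placeholder value.
raw = RawData(
    score={"Nikon": 1.0},
    pos={"Canon", "Nikon"},
    pos_leq={("Nikon", "Canon")},
)
```

Keeping the score function and the two preorders separate mirrors the tuple in the definition, so each later representation can be derived from the same object.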
Challenge 2
How to represent (compactly)
the information extracted?
Interpersonal
incomparability
Incompleteness
Pure Sentiment Data
The approach taken by most sentiment analysis tools:
Definition
The pure sentiment data associated with raw data $(\sigma_i, P_i, N_i)$ is a function $S_i : D_i \cup P_i \cup N_i \to \{+, -, 0\}$ defined as:
$$S_i(c) = \begin{cases} \operatorname{sgn}(\sigma_i(c)) & \text{if } c \in D_i \cap (P_i \cup N_i) \\ 0 & \text{if } \sigma_i(c) = 0 \\ + & \text{if } c \in P_i \\ - & \text{if } c \in N_i \end{cases}$$
Example
Pure sentiment data only deals with polarities:
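The polarity mapping can be sketched in a few lines of Python. The sketch assumes that a score, when present, decides the polarity, and that membership in the positive or negative comparison set decides otherwise; that precedence is an assumption of this example. The entity names are hypothetical, and the values 4.5 and -3 echo the rating examples from Challenge 1.

```python
def sign(value: float) -> str:
    """Return the polarity symbol of a numeric score."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"


def pure_sentiment(score: dict[str, float], pos: set[str], neg: set[str]) -> dict[str, str]:
    """Map every mentioned entity to '+', '-' or '0'."""
    polarity = {}
    for c in set(score) | pos | neg:
        if c in score:        # assumption: the score's sign wins when a score exists
            polarity[c] = sign(score[c])
        elif c in pos:        # only a positive comparative opinion is available
            polarity[c] = "+"
        else:                 # c is in neg
            polarity[c] = "-"
    return polarity


result = pure_sentiment(
    score={"Nikon": 4.5, "Sony": -3.0, "Leica": 0.0},
    pos={"Canon"},
    neg={"Pentax"},
)
```

All ordering information inside the comparison sets is discarded here, which is exactly what makes the data "pure sentiment".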
Pure Preference Data
The approach that preference aggregation takes:
Definition
The pure preference data associated with raw data $(\sigma_i, P_i, N_i)$ is a preordered set $(\mathcal{D}_i, \leq^{\mathcal{D}}_i)$ where $\mathcal{D}_i = D_i \cup P_i \cup N_i$ and
$$x \leq^{\mathcal{D}}_i y \iff \begin{cases} x \leq^P_i y \text{ and } x, y \in P_i & \text{or} \\ x \leq^N_i y \text{ and } x, y \in N_i & \text{or} \\ x \in N_i \text{ and } y \in P_i & \text{or} \\ \sigma_i(x) \leq \sigma_i(y) \text{ and } x, y \in D_i \end{cases}$$
Example
Pure preference data only deals with pairwise comparisons:
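A possible Python rendering of the four conditions is given below. It checks the listed cases directly and does not compute a transitive closure, which is a simplification of this sketch. The function name, argument layout and camera data are hypothetical.

```python
def pref_leq(x, y, score, pos, neg, pos_leq, neg_leq) -> bool:
    """Return True if x is ranked no higher than y under one of the four conditions."""
    if x in pos and y in pos and (x, y) in pos_leq:
        return True   # compared within the positive set
    if x in neg and y in neg and (x, y) in neg_leq:
        return True   # compared within the negative set
    if x in neg and y in pos:
        return True   # every negative entity is below every positive one
    if x in score and y in score and score[x] <= score[y]:
        return True   # compared through the individual's own scores
    return False


# Hypothetical raw data for one individual.
score = {"Nikon": 4.5, "Sony": -3.0}
pos, neg = {"Canon", "Nikon"}, {"Pentax"}
pos_leq, neg_leq = {("Nikon", "Canon")}, set()

args = (score, pos, neg, pos_leq, neg_leq)
pentax_below_canon = pref_leq("Pentax", "Canon", *args)
sony_below_nikon = pref_leq("Sony", "Nikon", *args)
sony_vs_canon = pref_leq("Sony", "Canon", *args) or pref_leq("Canon", "Sony", *args)
```

In this data Sony and Canon end up incomparable under the direct check, which illustrates the incompleteness that pure preference data has to tolerate.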
SP-Structures
What we propose: Sentiment Preference Structures
Definition
An SP-structure over $X$ is a tuple $(\mathcal{P}, \mathcal{N}, \mathcal{Z})$ such that:
• $\mathcal{P}$, $\mathcal{N}$ and $\mathcal{Z}$ form a partition of $X$
• $\mathcal{P}$ and $\mathcal{N}$ are ordered respectively by preorders $\leq^P$ and $\leq^N$
An SP-structure $(\mathcal{P}_i, \mathcal{N}_i, \mathcal{Z}_i)$ can be extracted from raw data $(\sigma_i, P_i, N_i)$:
• $\mathcal{P}_i$ is the union of $P_i$ and the set of entities with positive score
• Analogously for $\mathcal{N}_i$. $\mathcal{Z}_i$ is the set of entities with zero or no score
• Preordered relations extracted from $\sigma_i$ and copied from $P_i$ and $N_i$
SP-structures combine (interpersonally non-comparable) scores with (incomplete) pairwise comparisons between entities
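The extraction steps can be sketched in Python as follows. The function name `extract_sp` and the sample data are hypothetical, and the sketch assumes that no entity receives conflicting polarities (for instance a negative score together with a positive comparison); it raises an error rather than resolving such a case.

```python
def extract_sp(universe, score, pos, neg, pos_leq, neg_leq):
    """Build (P, N, Z, leq_P, leq_N) from one individual's raw data."""
    scored_pos = {c for c, v in score.items() if v > 0}
    scored_neg = {c for c, v in score.items() if v < 0}
    P = set(pos) | scored_pos
    N = set(neg) | scored_neg
    if P & N:
        raise ValueError("conflicting polarities; this sketch does not resolve them")
    Z = set(universe) - P - N   # entities with zero score or no opinion at all
    # Orders: copied from the comparisons, plus those induced by the scores.
    leq_P = set(pos_leq) | {
        (x, y) for x in scored_pos for y in scored_pos if score[x] <= score[y]
    }
    leq_N = set(neg_leq) | {
        (x, y) for x in scored_neg for y in scored_neg if score[x] <= score[y]
    }
    return P, N, Z, leq_P, leq_N


# Hypothetical data for one individual.
P, N, Z, leq_P, leq_N = extract_sp(
    universe={"Canon", "Nikon", "Sony", "Leica", "Pentax"},
    score={"Nikon": 4.5, "Sony": -3.0, "Leica": 0.0},
    pos={"Canon"},
    neg=set(),
    pos_leq=set(),
    neg_leq=set(),
)
```

Scores are only compared within one individual's data, so the interpersonal incomparability of scores is preserved.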
What do they look like?
Agent 1
a
|
b∼c
Agent 2
Agent 3
b
a
|
b
P
c
Z
a
|
c
N
Table: SP-structures extracted from the previous example.
Challenge 3
How to aggregate the individual
information into a collective opinion?
Is it really so different?
A basic sentiment paradox:
• 2 candidates
• preferences and polarity
• Majority = Collective Sentiment
|
90 voters
10 voters
a
b
b
|
|
a
Majority rule winner: a
Collective sentiment predictor: b
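The paradox can be reproduced with a short Python sketch. The table layout is ambiguous in this copy of the slides, so the profile below is an assumed reading: 90 voters rank a above b and are positive about both, while 10 voters are positive about b and negative about a. Function names and the data layout are illustrative.

```python
# Assumed reading of the profile (the table layout is ambiguous in this copy):
#   90 voters: a above b, both with positive polarity
#   10 voters: b positive, a negative
groups = [
    {"count": 90, "ranking": ["a", "b"], "positive": {"a", "b"}},
    {"count": 10, "ranking": ["b", "a"], "positive": {"b"}},
]


def majority_winner(groups):
    """Candidate ranked first by the largest number of voters."""
    votes = {}
    for g in groups:
        top = g["ranking"][0]
        votes[top] = votes.get(top, 0) + g["count"]
    return max(votes, key=votes.get)


def sentiment_winner(groups):
    """Candidate with the highest net polarity (positive minus negative voters)."""
    net = {}
    for g in groups:
        for c in g["ranking"]:
            delta = g["count"] if c in g["positive"] else -g["count"]
            net[c] = net.get(c, 0) + delta
    return max(net, key=net.get)


majority = majority_winner(groups)
sentiment = sentiment_winner(groups)
```

Under this reading a is the top choice of 90 voters, yet b collects 100 positive polarities against a net 80 for a, so the two methods disagree.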
Counting Paradoxes
Figure: % of collective sentiment paradoxes (vertical axis: percentage of paradoxical profiles, 15% to 35%; horizontal axis: number of voters)
30% of total cases up to N=92
Aggregating SP-structures
Definition
The Borda* score of entity $c \in X$ in SP-structure $(\mathcal{P}, \mathcal{N}, \mathcal{Z})$ is defined as:
$$s^*(c) = \begin{cases} 2 \times |\mathrm{down}^P(c)| + |\mathrm{inc}^P(c)| + |\mathcal{Z}| + 1 & \text{if } c \in \mathcal{P} \\ -2 \times |\mathrm{up}^N(c)| - |\mathrm{inc}^N(c)| - |\mathcal{Z}| - 1 & \text{if } c \in \mathcal{N} \\ 0 & \text{if } c \notin \mathcal{P} \cup \mathcal{N} \end{cases}$$
Given a profile $S$ of SP-structures, the most popular candidates are the ones maximising the sum of the individual Borda* scores:
$$B^*(S) = \operatorname*{argmax}_{c \in X} \sum_{i \in I} s^*_i(c)$$
What we know about Borda*
A profile is purely preferential if all comparisons are positive (negative) for all individuals. A profile is purely sentimental if only positive/neutral sentiment is expressed and no pairwise comparison.
Theorem
If a profile $S$ is purely preferential, then $B^*(S) = \mathrm{Borda}(S)$.
If a profile $S$ is purely sentimental, then $B^*(S) = \mathrm{Approval}(S)$.
Axiomatic properties adapted from Social Choice Theory:
Axioms characterising Borda (classic version) still hold if adapted.
Theorem
The Borda* rule satisfies consistency, faithfulness, neutrality and the cancellation property.
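A minimal Python sketch of the Borda* rule follows. The slides do not spell out down, up and inc, so the sketch assumes that down(c) holds the entities strictly below c, up(c) those strictly above, and inc(c) every other entity in the same set; equivalent entities would therefore be counted as incomparable. Each order is given as a set of pairs (x, y) meaning "x strictly below y" and is assumed to be transitively closed. All names are illustrative.

```python
def borda_star_score(c, P, N, Z, below_P, below_N):
    """Borda* score of c in one SP-structure; pairs (x, y) mean x strictly below y."""
    if c in P:
        down = {x for x in P if (x, c) in below_P}
        up = {y for y in P if (c, y) in below_P}
        inc = P - down - up - {c}
        return 2 * len(down) + len(inc) + len(Z) + 1
    if c in N:
        up = {y for y in N if (c, y) in below_N}
        down = {x for x in N if (x, c) in below_N}
        inc = N - up - down - {c}
        return -2 * len(up) - len(inc) - len(Z) - 1
    return 0


def borda_star(profile, candidates):
    """Return the set of winners and the summed Borda* scores."""
    totals = {c: 0 for c in candidates}
    for sp in profile:
        for c in candidates:
            totals[c] += borda_star_score(c, *sp)
    best = max(totals.values())
    return {c for c, t in totals.items() if t == best}, totals


# Purely sentimental profile (hypothetical): polarities only, no comparisons.
sentimental = [
    ({"a", "b"}, set(), {"c"}, set(), set()),
    ({"b"}, set(), {"a", "c"}, set(), set()),
    ({"b", "c"}, set(), {"a"}, set(), set()),
]
winners, totals = borda_star(sentimental, ["a", "b", "c"])

# Purely preferential voter (hypothetical): linear order a > b > c, all positive.
linear = ({"a", "b", "c"}, set(), set(), {("b", "a"), ("c", "a"), ("c", "b")}, set())
linear_scores = [borda_star_score(c, *linear) for c in ["a", "b", "c"]]
```

In the sentimental profile every approved entity scores |X| = 3, so the totals are three times the approval counts and b wins, as the approval rule would give. For the linear order the scores are 5, 3, 1, an increasing affine transformation of the classic Borda points 2, 1, 0.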
Challenge 4
Is it possible to identify influencers
and prevent strategic behaviour?
Challenge 5
How should preference methods be validated?
Against real events
(predictive ability)
Axiomatically
(Social Choice Theory)
Challenge 6
How to deal with big data
in sentiment/preference analysis?
Thank you!
• Challenge 1: what can be extracted
• Challenge 2: (compact) representation
• Challenge 3: aggregation
• Challenge 4: strategic behaviour
• Challenge 5: validation
• Challenge 6: big data