Comparison Analysis

Homework Assignment 1
 There are many published papers on the topic of P1 and the authors cite some of them. Explain what the authors do differently from the papers they cite, and how they show/argue that their approach is a better choice.
Comparison with paper [3] Muhammad et al.

In paper [3], the authors do not provide an explanation in natural language; they explain how the recommended item compares with the other alternatives in terms of percentages. In paper [1], the authors provide a natural-language justification built through automatic text summarization over the relevant aspects, and their evaluation showed that people prefer long justifications of a recommendation and find them more trustworthy and engaging.

In paper [3], the authors use both positive and negative reviews to build the explanation, presenting the pros and cons of an item compared with the alternatives. In paper [1], the authors consider only positive sentiments, since recommendations normally concern items that are good, and the justification explains why that recommended item is worth trying.
Comparison with paper [4] Chang et al.

Paper [4] also provides natural-language explanations to justify recommendations, but it relies on a crowd-sourcing platform to write the explanations and to manually annotate sentences. The authors use unsupervised learning to generate topical dimensions and mine relevant quotes for each selected dimension; the justification is then produced in natural language according to the user's interest in a topical dimension. In paper [1], the authors instead provide a fully automated pipeline for generating natural-language justifications, without any manual annotation by a crowd.
 The approach in P1 has several components. Briefly explain these components and draw a
picture that shows what inputs they take, how they interact with each other and what output is
finally produced.
Input
Review set R = {r1, r2, …, rn}
Aspect Extraction
KL divergence method to extract aspects from the reviews (formula image source: [1])
Tuple output = (ri, aij, rel(aij, ri), sent(aij, ri))
Aspect Ranking
A global score is calculated for each aspect for ranking (formula image source: [1])
Top-k main aspects
Sentence Filtering
Each review ri ∈ R is split into sentences si1 … sim
Keep the sentences matching the criteria:
o si contains a main aspect a1, a2, …, ak
o si is longer than 5 tokens
o si expresses a positive sentiment
o si does not contain first-person personal or possessive pronouns
Potential candidate sentences
Text Summarization
o Centroid vector building
o Sentence scoring
o Sentence selection
Output
Summary used as the justification of a recommendation
Figure 1 – Pipeline overview (formula images taken from [1])
In the methodology section, the authors describe a four-phase pipeline used to generate natural-language justifications for recommended items. The phases are explained below:
Aspect Extraction
o The goal of this phase is to find aspects with a positive sentiment that are distinguishing properties of an item and are worth including in the recommendation justification.
o Given a review ri as input, all the nouns are extracted using POS tagging. Then the KL divergence is calculated for each noun between a movie-review corpus and the British National Corpus.
o Nouns whose KL divergence score is greater than a threshold ε are marked as aspects, and their KL divergence score is used as the relevance score rel(aij, ri) of that aspect.
o Finally, a sentiment analysis algorithm computes the sentiment sent(aij, ri) associated with each aspect aij in review ri. This phase outputs a set of 4-tuples (ri, aij, rel(aij, ri), sent(aij, ri)); a small sketch of this step follows below.
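As a concrete illustration of this phase, the sketch below extracts nouns with NLTK POS tagging and scores them with a KL-divergence-style term against a background word-count distribution. The scoring formula, the smoothing, and the default threshold are illustrative assumptions, not the authors' exact implementation, and the sentiment step is omitted.

```python
# Sketch of the aspect-extraction step: extract nouns via POS tagging and
# score them against a background corpus with a KL-divergence-style term.
# Requires NLTK with the 'punkt' and 'averaged_perceptron_tagger' models.
import math
from collections import Counter

import nltk


def noun_frequencies(text):
    """Count the nouns in a text using NLTK POS tagging."""
    tokens = nltk.word_tokenize(text.lower())
    return Counter(w for w, tag in nltk.pos_tag(tokens) if tag.startswith("NN"))


def kl_scores(review_counts, background_counts):
    """Per-word contribution p_r(w) * log(p_r(w) / p_b(w)), with add-one
    smoothing on the background distribution (an illustrative choice)."""
    r_total = sum(review_counts.values())
    b_total = sum(background_counts.values()) + len(background_counts)
    return {
        w: (c / r_total) * math.log((c / r_total) / ((background_counts.get(w, 0) + 1) / b_total))
        for w, c in review_counts.items()
    }


def extract_aspects(review, background_counts, threshold=0.01):
    """Return {aspect: relevance} for nouns whose KL score exceeds the threshold."""
    scores = kl_scores(noun_frequencies(review), background_counts)
    return {w: s for w, s in scores.items() if s > threshold}
```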
Aspect Ranking
o The goal is to identify the most relevant aspects that describe an item, merging the information extracted from every review.
o For each aspect, a global score is calculated using the formula shown in figure 1 (where naj,ri is the number of occurrences of aspect aij in review ri); it gives a higher score to aspects that are mentioned often and with a positive sentiment.
o The aspects are ranked by global score, and the top-k aspects are labelled as main aspects and passed as output to the next phase (a sketch of the scoring is shown below).
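The exact global-score formula appears only as an image in [1]; the sketch below assumes a score of the form "sum over reviews of occurrence count times sentiment", which matches the description that frequently and positively mentioned aspects rank higher. Function and variable names are my own.

```python
# Sketch of the aspect-ranking step under the assumed scoring formula:
# global_score(a) = sum over reviews of n(a, r) * sent(a, r).
from collections import defaultdict


def rank_aspects(tuples, k=10):
    """tuples: iterable of (review_id, aspect, count, sentiment).
    Returns the top-k aspects by global score."""
    global_score = defaultdict(float)
    for _review_id, aspect, count, sentiment in tuples:
        global_score[aspect] += count * sentiment
    ranked = sorted(global_score.items(), key=lambda kv: kv[1], reverse=True)
    return [aspect for aspect, _score in ranked[:k]]


# Example: reviews mentioning "plot" and "acting" with positive sentiment.
main_aspects = rank_aspects([
    ("r1", "plot", 3, 0.9), ("r1", "acting", 1, 0.7),
    ("r2", "plot", 2, 0.8), ("r3", "acting", 4, 0.95),
], k=2)
print(main_aspects)  # ['acting', 'plot']
```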
Sentence Filtering
o The goal is to filter out non-compliant sentences that will not be useful for the final justification of the recommendation.
o Each review ri is split into sentences si1, …, sim, and the compliancy of each sentence si is verified against the criteria listed in figure 1. Only compliant sentences are kept.
o The justification should only include sentences that mention a relevant aspect and express a positive sentiment about the item; short, non-informative sentences and those written in the first person are filtered out (see the sketch below).
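A minimal filter implementing the four compliancy criteria listed in figure 1; the pronoun list, the whitespace tokenization, and the sentiment check (passed in as a callable) are simplified placeholders rather than the authors' implementation.

```python
# Sketch of the sentence-filtering step: keep only sentences that mention a
# main aspect, are longer than 5 tokens, are positive, and avoid first person.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def is_compliant(sentence, main_aspects, sentiment_fn):
    tokens = sentence.lower().split()
    contains_aspect = any(a in tokens for a in main_aspects)
    long_enough = len(tokens) > 5
    positive = sentiment_fn(sentence) > 0
    no_first_person = not (set(tokens) & FIRST_PERSON)
    return contains_aspect and long_enough and positive and no_first_person


def filter_sentences(review_sentences, main_aspects, sentiment_fn):
    """Keep only the compliant candidate sentences from a review."""
    return [s for s in review_sentences if is_compliant(s, main_aspects, sentiment_fn)]
```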
Text Summarization
o The goal is to generate a unique summary, to be used as the justification of the recommendation, that covers the main content of the reviews about a particular item.
o Centroid vector building: the information coming from the reviews is summarized in a centroid vector, built in two steps. First, the most meaningful words are selected using a tf-idf weighting scheme. Second, the centroid embedding is computed as the sum of the embeddings of the top-ranked words in the reviews.
o Sentence scoring: to identify the sentences to include in the final justification, the cosine similarity between the centroid and each candidate sentence is computed.
o Sentence selection: the sentences are sorted in descending order of similarity, and the top-ranked sentences are added to the summary until the limit on the maximum number of terms is reached (a sketch of these three steps is shown below).
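A rough sketch of the centroid-based summarization step, assuming a dict of pre-trained word embeddings (word → numpy vector) and using scikit-learn only to compute tf-idf weights; the top-n cut-off and the 100-word budget are illustrative values.

```python
# Sketch of centroid-based summarization: select top tf-idf words, sum their
# embeddings into a centroid, score candidate sentences by cosine similarity
# to the centroid, and greedily add sentences until the word budget is reached.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def build_centroid(review_texts, embeddings, top_n=20):
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(review_texts)
    mean_weights = np.asarray(tfidf.mean(axis=0)).ravel()
    vocab = np.array(vectorizer.get_feature_names_out())
    top_words = vocab[np.argsort(mean_weights)[::-1][:top_n]]
    return np.sum([embeddings[w] for w in top_words if w in embeddings], axis=0)


def sentence_vector(sentence, embeddings):
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    return np.sum(vecs, axis=0) if vecs else None


def summarize(candidates, review_texts, embeddings, max_words=100):
    centroid = build_centroid(review_texts, embeddings)
    scored = []
    for s in candidates:
        v = sentence_vector(s, embeddings)
        if v is not None:
            sim = float(np.dot(centroid, v) / (np.linalg.norm(centroid) * np.linalg.norm(v)))
            scored.append((sim, s))
    summary, words = [], 0
    for _sim, s in sorted(scored, reverse=True):  # highest similarity first
        if words + len(s.split()) > max_words:
            break
        summary.append(s)
        words += len(s.split())
    return " ".join(summary)
```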
 To evaluate their method, the authors of P1 consider various factors that affect performance and set up alternative configurations. List and explain the evaluation criteria and configurations considered by the authors, and elaborate on the winning configuration.
o Two experiments were conducted to evaluate the method. The goal of the first experiment was to test different configurations of the text-summarization-based justification, while the goal of the second experiment was to compare the methodology of the paper against a baseline that uses users' reviews without text summarization.
o To evaluate the different configurations, the authors combined varying justification lengths (short justifications of 50 words and long justifications of 100 words) with different numbers of aspects (top-10 and top-30 aspects).
o Evaluation criteria for the experiments [2]:
Transparency: explains how a particular justification was generated.
Persuasiveness: convinces the user to try the system.
Engagement: how much the user interacts with the system.
Trust: how much confidence the system inspires in a justification.
Effectiveness: helps users make good decisions.
Experiment 1 (between-subject):
Each user was randomly assigned a different configuration of the pipeline, with varying justification length and number of aspects, and evaluated the resulting justification. Results were collected on a 5-point scale (1 = strongly disagree, 5 = strongly agree) and are shown in figure 2.
Figure 2 – image source: [1]
Results: the values in figure 2 are the average scores for each metric. Users preferred the longer justifications provided by the system, as they gave a clearer understanding of the suggestion and were more satisfying. Regarding the number of aspects, the best results were obtained with the top-10 aspects, meaning that with 10 relevant aspects the system was able to provide quality suggestions to the users.
Experiment 2 (within-subject):
In this experiment, two different styles of justification were compared: the summary generated by the system discussed in the paper and a review-based baseline were presented on the same screen. The results for the best-performing configuration (top-10 aspects, long justification) are shown in figure 3.
Figure 3 – image source: [1]
Results: most users preferred the technique discussed in the paper, which is based on automatic text summarization; they reported higher trust and engagement and were more motivated to use this system compared with the review-based baseline.
References:
[1] Cataldo Musto, Gaetano Rossiello, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2019. Combining Text Summarization and Aspect-based Sentiment Analysis of Users' Reviews to Justify Recommendations. In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys 2019). ACM, 383–387.
[2] Nava Tintarev and Judith Masthoff. 2012. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 399–439.
[3] Khalil Ibrahim Muhammad, Aonghus Lawlor, and Barry Smyth. 2016. A Live-User Study of Opinionated Explanations for Recommender Systems. In Proceedings of the 21st International Conference on Intelligent User Interfaces. ACM, 256–260.
[4] Shuo Chang, F. Maxwell Harper, and Loren Gilbert Terveen. 2016. Crowd-based personalized natural language explanations for recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 175–182.