P2P: Recommender Systems

P2P RECOMMENDER SYSTEMS:
A (SMALL) SURVEY
Giulio Rossetti
Talk Outline
Problem Definition
• What is a Recommender System?
• Why Recommender Systems?
Centralized RS
• Well-known families of approaches
• Collaborative Filtering (idea)
Decentralized RS
• Why P2P?
• A (small) survey
What are Recommender Systems?
RSs are a class of information filtering systems that seek to predict the rating or preference that a user would give to
• an item (such as music, books, or movies) or
• a social element (e.g. people or groups)
that they had not yet considered, using a model built from the characteristics of
• the items (content-based approaches) or
• the user's social environment (collaborative filtering approaches)
Why recommender systems?
Nowadays the amount of information we retrieve has become enormous (Big Data).
What we really need is a technology that can help us find resources of interest among the overwhelming amount of data available.
“[…] a personalized information filtering used to either predict
whether a particular user will like a particular item (prediction
problem) or to identify a set of N items that will be of interest to a
certain user.”
Well-known families of approaches
• Random prediction: randomly chooses items from the set of available ones and recommends them to the user.
• Frequent sequences: if a customer frequently rates items, we can exploit his frequent rating pattern to recommend other items (with similar ratings) to him.
• Collaborative filtering algorithms: require the recommendation seekers to express their preferences by rating items; the more users rate items (or categories), the more accurate the recommendations become.
• Content-based algorithms: attempt to recommend items that are similar to items the user liked in the past (a sketch follows below).
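As a concrete illustration of the last family, here is a minimal content-based sketch (the toy catalogue, the item ids and the use of TF-IDF cosine similarity are assumptions for illustration, not taken from the survey): items the user liked in the past are compared to the rest of the catalogue through their textual descriptions.

# Minimal content-based recommendation sketch (assumed toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue: item id -> textual description
catalogue = {
    "m1": "space opera adventure science fiction",
    "m2": "romantic comedy set in paris",
    "m3": "hard science fiction about artificial intelligence",
    "m4": "slapstick comedy road trip",
}
liked = ["m1"]  # items the user rated positively

ids = list(catalogue)
tfidf = TfidfVectorizer().fit_transform(catalogue[i] for i in ids)
sim = cosine_similarity(tfidf)  # item-item similarity matrix

# Score unseen items by their maximum similarity to any liked item
scores = {
    i: max(sim[ids.index(i), ids.index(l)] for l in liked)
    for i in ids if i not in liked
}
print(sorted(scores, key=scores.get, reverse=True))  # most similar items first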
Centralized Approaches
Two main families of methodologies have been studied in recent years:
• User-based CF
• CF algorithms that work on the assumption that each user belongs to a group of similarly behaving users. The basis for the recommendation is composed of items that are liked by those users: items are recommended based on users' tastes, assuming that similar users (i.e. users with similar attributes) will be interested in the same items (a user-based sketch follows below).
• Item-based CF
• CF algorithms that look at the similarity between items to make a prediction. The idea is that users are most likely to purchase items that are similar to the ones they already bought in the past; by analyzing the purchasing information we can form an idea of what a user may want in the future.
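To make the user-based variant concrete, here is a minimal, self-contained sketch (the toy ratings matrix and function names are assumptions, not the original algorithms): the rating of user u for item i is predicted as a similarity-weighted average of the ratings given to i by the other users.

# User-based CF sketch on an assumed toy ratings matrix.
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine(a, b):
    """Cosine similarity restricted to co-rated items."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_user_based(R, u, i):
    """Similarity-weighted average of the other users' ratings for item i."""
    num = den = 0.0
    for v in range(R.shape[0]):
        if v == u or R[v, i] == 0:
            continue
        s = cosine(R[u], R[v])
        num += s * R[v, i]
        den += abs(s)
    return num / den if den else 0.0

print(predict_user_based(R, u=1, i=1))  # predicted rating of item 1 for user 1

An item-based variant would apply the same weighting scheme over item-item similarities (columns of R instead of rows).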
P2P: Motivations
The need for efficient decentralized recommender systems has been
appreciated for some time, both for the intrinsic advantages of
decentralization and the necessity of integrating recommender
systems into P2P applications.
The two main advantages are:
1. predictions can be distributed among all users, removing the need for a costly central server and enhancing scalability;
2. a decentralized recommender improves the privacy of the users, since there is no central entity storing or owning their private information.
P2P Recommender systems:
a small survey
User-based CF:
BuddiCast, kNN (Random Sampling & T-MAN)
P2PRec:
a social based recommender system
SoCS:
Social Graph Embedding
Random Walks
User-Based Collaborative Filtering
R. Ormándi, I. Hegedűs and M. Jelasity
Node balancing issue: overlay topologies defined by node similarity often have highly unbalanced degree distributions (e.g. power law).
Overlay management: how can the best possible overlay for computing recommendation scores be built and maintained, while keeping the bandwidth usage at the nodes under control?
Desiderata: a minimal, uniform load from overlay management, even when the in-degree distribution of the expected overlay graph is unbalanced.
Approaches: BuddiCast, kNN (Random Sampling & T-MAN)
BuddiCast
• Each node's local view contains a full descriptor of the node's neighbors (i.e. their ratings). Computing recommendations does not load the network (local information approach).
• Load balancing:
• Block list: if a node communicates with another peer, that peer is put on the block list for a few hours.
• Candidate list: contains close peers for potential communication.
• Random list: contains random samples from the network.
• For overlay maintenance, each node connects to the best node from the candidate list with probability α, and to a node from the random list with probability 1−α, and exchanges its buddy list with the selected peer (see the sketch below).
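A hedged sketch of the peer-selection policy described above (class and constant names, and the value of α, are illustrative assumptions, not the original BuddiCast code):

# BuddiCast-style peer selection sketch (assumed names and parameters).
import random, time

ALPHA = 0.8               # probability of picking the best candidate (assumed value)
BLOCK_SECONDS = 4 * 3600  # "a few hours" on the block list

class BuddiCastNode:
    def __init__(self):
        self.candidate_list = {}   # peer_id -> similarity score
        self.random_list = []      # random samples from the network
        self.block_list = {}       # peer_id -> time the block expires

    def _blocked(self, peer):
        return self.block_list.get(peer, 0) > time.time()

    def select_peer(self):
        """With probability ALPHA take the most similar unblocked candidate,
        otherwise take a random sample; then block the chosen peer."""
        candidates = {p: s for p, s in self.candidate_list.items()
                      if not self._blocked(p)}
        randoms = [p for p in self.random_list if not self._blocked(p)]
        if candidates and (random.random() < ALPHA or not randoms):
            peer = max(candidates, key=candidates.get)
        elif randoms:
            peer = random.choice(randoms)
        else:
            return None
        self.block_list[peer] = time.time() + BLOCK_SECONDS
        return peer  # the caller would now exchange buddy lists with `peer`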
kNN: Random Samples
• Every node has a local view of size k that contains node descriptors.
• Each node is initialized with k random samples from the network; the view then iteratively approximates the kNN graph.
• Convergence is based on an iterative random sampling process:
• random nodes are inserted into the view (which is implemented as a bounded priority queue, sketched below);
• the queue's priority is based on the similarity function provided by the recommender module.
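The bounded priority queue mentioned above could look roughly like the following sketch (it assumes descriptors are comparable peer identifiers and that the similarity function scores a descriptor against the local node; this is not the authors' implementation):

# Bounded-priority-queue view for the random-sampling kNN overlay (sketch).
import heapq

class KnnView:
    def __init__(self, k, similarity):
        self.k = k
        self.similarity = similarity   # provided by the recommender module
        self.heap = []                 # min-heap of (similarity, peer_id)

    def offer(self, peer_id):
        """Insert a random sample; keep only the k most similar descriptors."""
        item = (self.similarity(peer_id), peer_id)
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, item)
        else:
            heapq.heappushpop(self.heap, item)  # drops the least similar entry

    def neighbours(self):
        return [p for _, p in sorted(self.heap, reverse=True)]

# Each round, a node offers freshly sampled random peers to its view, which
# iteratively converges towards the true kNN graph.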
kNN: T-MAN Sampling
• Overlay managed with the T-MAN algorithm
• T-MAN periodically updates the node's view (of size k), as sketched after this list, by:
1. selecting a peer node to communicate with
2. exchanging its view with the peer
3. merging the two views and keeping the closest k descriptors
• Peer (communication) selection methods:
• Global: selects the node from the whole network randomly
• View: selects the node from the view uniformly at random
• Proportional: selects a node from view but with different probability distribution
• Best: selects the most similar node without any restriction
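One T-MAN round, as outlined above, might be sketched as follows (helper names such as get_view, and the choice of the Global selection policy, are assumptions for illustration):

# One T-MAN round (sketch, assumed helper functions).
import random

def tman_round(my_id, my_view, k, similarity, all_nodes, get_view):
    """my_view: set of peer ids; similarity(a, b) -> float;
    get_view(peer) stands in for the network call returning the peer's view."""
    # 1. peer selection ("Global": a random node from the whole network)
    peer = random.choice([n for n in all_nodes if n != my_id])
    # 2. view exchange
    peer_view = get_view(peer)
    # 3. merge and keep the k descriptors closest to this node
    merged = (my_view | peer_view | {peer}) - {my_id}
    return set(sorted(merged, key=lambda n: similarity(my_id, n), reverse=True)[:k])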
User-based CF: Observations
1. In cases with an unbalanced distribution it is not optimal to use the kNN (T-MAN Best) view: a more relaxed view can give better recommendation performance.
2. Overlay construction converges reasonably fast, both in the case of random updates and with T-MAN.
3. T-MAN with Global selection is a good choice: it has a fully uniform load distribution combined with an acceptable convergence speed, which is better than that of the random view update.
P2PRec:
a social based P2P recommender system
Draidi and Pacitti
The idea:
recommend high-quality documents, related to the query topics and held by friends (or FOAFs) who are experts in the topics related to the query.
Assumptions:
• each node represents a peer, labelled with the content it stores and its topics of interest;
• expertise is deduced from the content stored by a user;
• the topics each peer is interested in are computed by analyzing the documents it holds;
• information about experts is disseminated through a semantic-based gossip algorithm that provides scalability, robustness and load balancing.
How P2Prec works
1. Latent Dirichlet Allocation (LDA) is used to automatically model the topics in the system (see the sketch after this list):
• Training - global level: identification of the complete set of topics
• Inference - local (node) level: extraction of the topics of interest for the user
2. Dissemination of local information by a gossip algorithm:
• FOAF descriptor: topics of interest, trust level
• At each gossip exchange, each user u checks its local view for relevant similar peers with respect to its topics of interest and friendship network; if one is found, a friendship request is issued.
3. Querying:
• A keyword query q is associated with a TTL and routed recursively in a P2P top-k manner.
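A toy sketch of the two LDA phases (the corpus, the parameters and the library choice are assumptions; this is not the P2Prec implementation): global training discovers the topic set, and local inference maps a peer's documents onto those topics to derive its topics of interest.

# LDA sketch: global training vs. local (per-peer) inference (toy corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

global_corpus = [
    "graph mining community detection social networks",
    "peer to peer gossip protocols overlay networks",
    "collaborative filtering matrix factorization ratings",
]
vec = CountVectorizer()
X = vec.fit_transform(global_corpus)

# Training - global level: identify the complete set of topics
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

# Inference - local (node) level: topics of interest of a single peer,
# deduced from the documents it holds
peer_docs = ["gossip based overlay for peer to peer networks"]
topic_mix = lda.transform(vec.transform(peer_docs)).mean(axis=0)
print("peer topic distribution:", topic_mix.round(2))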
Social Graph Embedding
A. Kermarrec, V. Leroy and G. Trédan
A proximity metric between users makes it possible to predict potentially relevant future relationships (link prediction).
SoCS (Social Coordinate System)
• Fully distributed algorithm that embeds a social graph in a Euclidean space
• Nodes are assigned coordinates w.r.t. their social position
• Community structure is preserved
Force-based embedding (FBE):
Edges represent springs and nodes represent electrically equally charged particles.
Edges (springs) attract the vertices they link, whereas vertices (particles) repel each other.
The embedding is achieved once the system reaches an equilibrium (one iteration is sketched below).
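One FBE iteration might look like the following sketch (the toy graph and the force constants are assumptions): springs pull linked nodes together while all node pairs repel each other, and iterating the step moves the layout towards an equilibrium.

# Force-based embedding step (sketch on an assumed toy graph).
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]           # small example graph
pos = np.random.default_rng(0).normal(size=(4, 2))  # 2-D social coordinates

def fbe_step(pos, edges, k_spring=0.1, k_repulse=0.05, step=0.1):
    force = np.zeros_like(pos)
    # repulsion between every pair of equally charged nodes
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d) + 1e-9
            force[i] += k_repulse * d / dist**3
    # spring attraction along the edges
    for i, j in edges:
        d = pos[j] - pos[i]
        force[i] += k_spring * d
        force[j] -= k_spring * d
    return pos + step * force

for _ in range(200):       # iterate until (approximate) equilibrium
    pos = fbe_step(pos, edges)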
SoCS Algorithm
Social Neighbors: Nodes that have close social positions.
Graph neighbors and social neighbors of a node are not necessarily the same.
Each node regularly updates its position in the social space:
1. it first gathers the positions of its graph and social neighbors;
2. using these positions, it computes the forces applied to it and derives its updated social position;
3. a gossip protocol provides the node with a list of its new social neighbors;
4. this list is then used to compute new positions.
Similarity metrics:
SoCS recommends to a node its closest social neighbors that are not already graph neighbors.
• Common Neighbors, Jaccard, Adamic/Adar, Path Length, Katz… (the first three are sketched below)
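For reference, the first three link-prediction scores in the list can be sketched as follows on a toy adjacency list (this is not the SoCS implementation):

# Classical link-prediction scores on an assumed toy undirected graph.
import math

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def common_neighbours(u, v):
    return len(adj[u] & adj[v])

def jaccard(u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(u, v):
    # common neighbours weighted by the inverse log of their degree
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v]
               if len(adj[z]) > 1)

print(common_neighbours("b", "d"), jaccard("b", "d"), adamic_adar("b", "d"))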
SoCS Algorithm (2)
SoCS relies on gossip to discover the social neighbors.
Each node runs a clustering algorithm (Neighbors Peer Sampling - NPS) in
order to maintain and update its social neighbors list.
Gossip protocols have been shown to be cheap, robust against churn, and to
converge quickly
Decentralized Random Walks
A. Kermarrec, V. Leroy, A. Moin and C. Thraves
The application of random walks to decentralized environments is different
from the centralized version.
• Centralized RS:
Random walks are used as a clustering mechanism (e.g. community discovery)
• Decentralized RS:
Community discovery is infeasible: the knowledge each peer has of the P2P network is limited to its own neighborhood.
Proposed Approach
1. Each peer is provided with a neighborhood composed of a small set of similar peers by means of an epidemic (gossip) protocol;
2. Ratings for unknown items are estimated by a random walk on the neighborhood.
• Once peers have stabilized their neighborhood, they can compute recommendations independently
• Similarity measures: Pearson correlation, Jaccard
Random Walks observed properties
The users in the neighborhood are modeled as Markov Chain graph vertices,
and a random walk is applied on this graph.
• A Markov chain can be represented by a directed graph where vertices are the states of the chain and edges
represent the transition probabilities from one state to another.
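A simplified sketch of this idea (the toy similarities and the use of a random walk with restart are assumptions, not the authors' exact formulation): the neighborhood similarities define a row-stochastic transition matrix, and the visit probabilities of a walk started at the active user weight the neighbors' ratings of an unknown item.

# Random-walk rating estimation on the neighborhood Markov chain (sketch).
import numpy as np

S = np.array([            # pairwise user similarities within the neighborhood
    [0.0, 0.8, 0.3, 0.1],
    [0.8, 0.0, 0.5, 0.2],
    [0.3, 0.5, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
])
ratings_for_item = np.array([0.0, 4.0, 2.0, 5.0])  # 0 = user 0 has not rated it

P = S / S.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

def random_walk(P, start, restart=0.15, iters=100):
    """Power iteration for a random walk with restart from `start`."""
    n = len(P)
    v = np.zeros(n); v[start] = 1.0
    pi = v.copy()
    for _ in range(iters):
        pi = (1 - restart) * pi @ P + restart * v
    return pi

pi = random_walk(P, start=0)
known = ratings_for_item > 0
predicted = pi[known] @ ratings_for_item[known] / pi[known].sum()
print(round(float(predicted), 2))      # estimated rating of the unknown item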
Results:
• Random walks work well when the data is so sparse that classic similarity measures fail to detect meaningful relations between users;
• increasing the neighborhood size increases the accuracy;
• decentralized user-based approaches perform better (lower complexity, higher precision) than their item-based counterparts in P2P recommender applications;
• cosine similarity performed better in decentralized item-based algorithms, while Pearson correlation worked better for decentralized user-based algorithms.
Conclusions
• P2P recommender systems are needed in order to overcome scalability and privacy issues
• Several approaches were analyzed
• Each relies (to some extent) on gossip algorithms in order to maintain and update the overlay network
• Almost all the discussed approaches tackle the problem with a user-based similarity strategy, exploiting classical network theory approaches:
• Unsupervised Link Prediction
• Community Discovery
• Force-directed Embedding
Bibliography
• D. Almazro and G. Shahatah. A survey paper on recommender systems (2010)
• F. Draidi and E. Pacitti. Demo of P2Prec: a Social-based P2P Recommendation System (2011)
• A. Kermarrec, V. Leroy, A. Moin and C. Thraves. Application of random walks to decentralized recommender systems (2010)
• A. Kermarrec, V. Leroy and G. Trédan. Distributed social graph embedding (2011)
• R. Ormándi, I. Hegedűs and M. Jelasity. Overlay management for fully distributed user-based collaborative filtering (2010)
…questions?