Decentralized Recommendation Protocols for File Sharing

1st Gossple Workshop on Social Networking (December 2010)

Large-scale data sharing by exploiting gossiping
Esther Pacitti
Sophia Antipolis - Méditerranée

Context: P2P Data Sharing
• We consider P2P online communities where participants can be
  – Professionals (researchers, engineers, support staff, etc.) who use web-scale collaboration in their workplace
• Large scale in users and data (clouds, grids, Internet)
• Example applications:
  – P2P recommendation systems
    • Useful for processing scientific workflows among participants' peers
  – P2P query reformulation
    • Clinical case sharing among doctors or physicians
  – P2P CDN
• Projects:
  – ANR DataRing (2009-2012, P2P online communities)
  – Datluge (2010-2012, with UFRJ, Brazil, on P2P scientific workflows)

Motivations
• Chemistry, Materials Science and Physics
• Bioinformatics
• Computer Science

P2PRec: document recommender
• Huge graph: G = (D, U, E, T), where
  – D is the set of shared documents
  – U is the set of users in the system
  – E is the set of edges between users, with an edge e(u,v) if users u and v are friends
  – T is the set of users' topics of interest
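The graph model above can be pictured with a small in-memory sketch. It assumes friendship edges are symmetric and that T is stored as a per-user set of topics; the class and method names are hypothetical and do not come from the slides. In a P2P deployment each peer would presumably hold only its own part of this graph.

```python
from dataclasses import dataclass, field


@dataclass
class P2PRecGraph:
    """Toy model of G = (D, U, E, T); illustrative names only."""
    documents: set = field(default_factory=set)  # D: shared documents
    users: set = field(default_factory=set)      # U: users in the system
    edges: set = field(default_factory=set)      # E: friendships, each a frozenset({u, v})
    topics: dict = field(default_factory=dict)   # T: user -> set of topics of interest

    def add_friendship(self, u, v):
        # Assumption: friendship is symmetric, so the edge is unordered.
        self.users.update((u, v))
        self.edges.add(frozenset((u, v)))

    def friends(self, u):
        # All users sharing an edge with u.
        return {w for e in self.edges if u in e for w in e if w != u}


g = P2PRecGraph()
g.topics["u1"] = {"t1", "t2"}
g.topics["u5"] = {"t1", "t2"}
g.add_friendship("u1", "u5")
```

Here `g.friends("u1")` returns the set containing `u5`, mirroring the friend link that u1's FOAF file holds in the slides' running example.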
• Problem: given a query, recommend the most relevant documents
• Our approach
  – Reduce the search space by identifying relevant users
  – Identify relevant users
    • Users that store or download enough high-quality documents and so become a kind of provider on specific topics
    • Users recommended by trusted friends
• P2P overlay: semantic gossiping
  – Disseminates relevant users and their topics of interest

P2PRec*: document recommender
• Topics of interest
  – Derived from the documents a user stores
  – Extracted automatically
• Friendship network
  – Explicit friendship (possibly leveraged with implicit friendship)
  – Expresses users' trust
  – Implemented as FOAF files (Friend of a Friend files, a machine-readable vocabulary serialized in RDF/XML)
• Keyword queries
  – Mapped to topics
  – Mostly related to the user's topics of interest
• Measures to
  – Check the similarity of users with respect to their topics (Dice coefficient)
  – Assess the relevance of a user
*Joint work with F. Draidi, P. Valduriez, B. Kemme, to appear as an Inria report

Semantic Gossiping
[Figure: the local views of u1 and u5 before and after a gossip exchange. Each view entry pairs a user with its topics (for example u5: t1, t2; u6: t2, t3; u4: t1). u1's FOAF file lists its topics t1, t2 and a friend link to u5.]
• Peers exchange gossip information (users and their topics) and FOAF information
• If the distance between users u and v (Dice coefficient over their topics) > τ, ask for friendship
• If the friendship is accepted, add v to the FOAF file

Relevant Users
• Users' topics of interest are automatically extracted using LDA*
  – by inspecting the documents' topic vectors
• A user u is considered relevant on a topic $t \in T_u$ if a percentage of its documents have high quality in topic t
• Each document doc at user u has
  – A rating given to doc: $rate_{doc}$
  – A topic vector (extracted using LDA): $V_{doc} = \{w_{doc}^{t_1}, \ldots, w_{doc}^{t_d}\}$
• doc is considered high quality in a topic t, written $quality_t(doc, u)$, if $w_{doc}^{t} \cdot rate_{doc}$ exceeds a threshold value
• A user can be relevant in more than one topic
*Latent Dirichlet Allocation (topic model)

Query Processing
• Implements recommendation
• Input: keywords
• Output:
  – Links to a set of good-quality documents; may include links to documents on the topics of interest of a friend (query expansion)
  – Popularity and similarity info
• Example: doctors studying the behavior of a gene X may be glad to learn about the diseases it can cause and to check some experimental data sets

Query Processing
[Figure: a requester issues query q with q.t = t1 and q.TTL = 2. The query is forwarded over FOAF links with the TTL decreasing at each hop (2, 1, 0). Each contacted peer computes sim(doc, q) and returns recommended documents with a summary of their similarity and classification info.]
1) Query q is mapped to a topic or topics Tq
2) Select the top-k friends in the FOAF file with respect to the query topics (cosine similarity)
3) Redirect the query
4) Repeat 2) and 3) recursively until the TTL expires

Conclusions P2PRec
• P2PRec (BDA 2010)
  – Finds friends (relevant users on similar topics) while gossiping
  – Query processing exploits relevant users with respect to the query topics, recursively (FOAF friends)
  – Performance evaluation
    • Recall, precision and response times
  – Limitation of LDA: needs some centralization for training, but good enough to validate our general approach
  – However, there are other possibilities:
    • Ontology-based automatic annotation
    • This exists for biomedical documents

P2P Query Reformulation*
• P2P Data Management System (PDMS)
• Each peer has:
  – Its own schema (and data)
  – One or more mapping acquaintances, to or from which at least one mapping rule exists
• Goal: given a query, exploit mapping acquaintances as much as possible to enhance query responses
[Figure: peers A and B, each with its own schema and data, connected by mapping Mb,a; example query ?= Hospital(x, "San Francisco").]
*Joint work with A. Bonifati, G. Summa, P. Valduriez, to appear as an Inria report

Concepts
• Query Q at peer B: ?= Hospital($X, "San Francisco")
• Rewriting obtained ALONG mapping Mb,a, towards peer A: HealthCareInst($X, "San Francisco", $Z)
• Source schema
  – Hospital [0..*]: name, location
  – Grant [0..*]: amount, institution, manager
  – Doctor [0..*]: name, salary
• Target schema
  – HealthCareInst [0..*]: name, city, id
  – Grant [0..*]: amount, scientist
• Mapping rule Mb,a: Hospital(x, y) ⇢ HealthCareInst(x, y, z)
  – Hospital(x, y) is the BODY and HealthCareInst(x, y, z) is the HEAD; both are atoms

Mapping Relevance
• Each time a query is translated by exploiting a mapping, we get a relevant rewriting
• The relevance can be forward (along) or backward (against), depending on which side of the mapping is matched
• Goal:
  – Collect as many rewritings as possible
  – Find the most interesting paths to take (avoid useless paths)
• Example: ?= Hospital(x, "San Francisco")
  – M1: Hospital(x, y) ⇢ HealthCareInst(x, y, z)
  – M2: Institution(x, y, z) ⇢ Hospital(x, y)

Problem
[Figure: query Q posed at peer B, with mapping Mb,a followed ALONG towards peer A and mapping Mc,a followed AGAINST; peers C and D also appear in the network.]
1) How to choose the most relevant paths to take in the reformulation task?
2) Are there other peers in the network that can be contacted?
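The forward/backward distinction from the Mapping Relevance slide can be sketched as follows. This is a simplified illustration, not the authors' implementation: an atom is taken to match the query when the label and the number of arguments agree, which is only a rough stand-in for the real match semantics. The names `Atom`, `Mapping` and `relevance_direction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Atom:
    label: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Mapping:
    name: str
    body: Tuple[Atom, ...]
    head: Tuple[Atom, ...]


def relevance_direction(query: Atom, mapping: Mapping) -> Optional[str]:
    """Return 'forward' if the query matches the body, 'backward' if it
    matches the head, and None if the mapping is not relevant."""
    def matches(atom: Atom) -> bool:
        # Assumption: same label and same arity count as a match.
        return atom.label == query.label and len(atom.args) == len(query.args)

    if any(matches(a) for a in mapping.body):
        return "forward"   # query can be rewritten along the mapping
    if any(matches(a) for a in mapping.head):
        return "backward"  # query can be rewritten against the mapping
    return None


# The two mappings from the slide's example.
q = Atom("Hospital", ("x", "San Francisco"))
m1 = Mapping("M1", (Atom("Hospital", ("x", "y")),),
             (Atom("HealthCareInst", ("x", "y", "z")),))
m2 = Mapping("M2", (Atom("Institution", ("x", "y", "z")),),
             (Atom("Hospital", ("x", "y")),))
directions = {m.name: relevance_direction(q, m) for m in (m1, m2)}
```

With these inputs, `directions` marks M1 as forward and M2 as backward. When a query matches both sides, this sketch arbitrarily reports forward first.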
Acquaintances
• Gossiping acquaintances
  – Potential friends that dynamically appear in the local semantic view (LSV)
• Mapping acquaintance
  – There is at least one direct mapping towards it (friend)
  – Established manually
• Social acquaintance (FOAF friend)
  – No direct mapping towards it is needed
  – There are some common interests
  – Established explicitly

Our Approach
• Gossip to disseminate mapping rule information and to find friends
• Users' topics of interest
  – are expressed according to the schema information or the topics of past queries
• Measures to
  – Compute the relevance of a mapping with respect to a query
  – Compute the similarity between users
• Exploits recursively (to translate a query)
  – Mapping acquaintances
  – Social acquaintances

Gossiping Acquaintances

Social Acquaintances
• Friend
  – Shares common topics of interest
• Interests
  – Formulated by queries
  – Elements of the peer's schema
• Approach: use the semantic view to discover friends
• Example queries expressing a peer's interests:
  – ?= Hospital(x, "San Francisco")
  – ?= State(y, z, "California")
  – ?= Doctor(w, k)
  – ?= Patology("heart", x)

Compute Relevance
• Goal: given a query and a mapping rule, determine whether the mapping is relevant to the query
• Method (standard match semantics)
  – Atom label matching
  – Parameter compatibility
• Example: ?= Hospital(x, "San Francisco")
  – M1: Hospital(x, y) AND State(x, z) ⇢ HealthCareInst(x, y, z)
  – M2: Hospital(x, y, w) AND State(x, z) ⇢ HealthCareInst(x, y, z)
  – M3: Ospedale(x, y) AND State(x, z) ⇢ HealthCareInst(x, y, z)

Compute Relevance
• AF-IMF measure, inspired by TF-IDF*
• AF (Atom Frequency)
  – Local measure, establishing the importance of the query atom in the current mapping
• IMF (Inverse Mapping Frequency)
  – Distributed measure, establishing the overall importance of the query atom
• The relevance of a mapping with respect to q is AF * IMF
*Term frequency-inverse document frequency

Compute Relevance (AF)
• About the applied measure
  – To increase the effectiveness of the measure we distinguish, again, forward and backward relevance
  – The forward measure is computed on the body of the mapping, the backward measure on the head
• Example: ?= Hospital(x, "San Francisco")
  – M1: Hospital(x, y) AND State(x, z) ⇢ HealthCareInst(x, y, z)
  – M2: Institution(x, y, z) ⇢ Hospital(x, y)

Compute Relevance (IMF)
• IMF requires a way to get a value for
  – The total number of mappings
  – The total number of mappings containing that atom
• To do that, we can inspect the semantic view of the peer
  – Also by sending inquiries to peers in the FOAF file

Translate-Query
• Compute the relevance of the local mappings with respect to Q
  – Choose the top-k mappings
  – Apply the translation semantics, along or against the mapping direction
  – Trigger Translate-Query on the mapping acquaintance, recursively (until the TTL expires)
• Select FOAF friends to be contacted
  – By looking at the best mapping summaries with respect to Q
  – Trigger Translate-Query on the social acquaintance, recursively (until the TTL expires)

Performance Evaluation
• Baseline
  – No gossiping, original query propagated
• Baseline+
  – No gossiping, translated query propagated
• Baseline#
  – No gossiping, translated query propagated, local measure to sort mappings (using AF only)
• Full-
  – Gossiping, translated query propagated, AF-IMF measure to sort mappings, no FOAF links (only local mappings)
• Full (P2PRec)
  – Gossiping, translated query propagated, AF-IMF measure to sort mappings, FOAF links exploited
[Figure: recall (0% to 100%) versus top-k mapping threshold for Baseline, Baseline+, Baseline#, Full- and Full. Caption: Effectiveness of AF-IMF, LSV and gossiping.]

Conclusions P2P Query Reformulation
– Gossiping is used to disseminate mapping rule information
– Exploits relevant mappings recursively
  • Mapping acquaintances
  • Social acquaintances
– Initial performance results:
  • Very good recall (over 90%)
  • Linear scale-up
  • Trade-off between recall and response times
– Previous work uses
  • DHTs or a centralized mediation model
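The AF-IMF ranking used by Translate-Query can be sketched as follows. The slides only state that relevance is AF * IMF and that the measure is inspired by TF-IDF; the concrete formulas here are assumptions made by analogy with TF-IDF, and all function names are hypothetical. AF is taken as the share of atoms on the matched side that carry the query label, and IMF as log(N / n), where N is the number of known mappings and n the number containing the atom.

```python
import math


def atom_frequency(query_label, side_labels):
    """AF (assumed form): share of atoms on the matched side of the
    mapping (body for forward, head for backward) with the query label."""
    if not side_labels:
        return 0.0
    return side_labels.count(query_label) / len(side_labels)


def inverse_mapping_frequency(total_mappings, mappings_with_atom):
    """IMF (assumed form, by analogy with IDF): log(N / n). The counts
    would come from the peer's semantic view or from FOAF inquiries."""
    if mappings_with_atom == 0:
        return 0.0
    return math.log(total_mappings / mappings_with_atom)


def af_imf(query_label, side_labels, total_mappings, mappings_with_atom):
    return (atom_frequency(query_label, side_labels)
            * inverse_mapping_frequency(total_mappings, mappings_with_atom))


def top_k(scored, k):
    """Keep the k best (name, score) pairs, highest score first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical counts: 10 known mappings, 2 of which contain Hospital.
bodies = {"M1": ["Hospital", "State"], "M3": ["Ospedale", "State"]}
scores = [(name, af_imf("Hospital", labels, 10, 2))
          for name, labels in bodies.items()]
best = top_k(scores, 1)
```

In this toy run M1 scores 0.5 * log(5) and M3 scores 0, so M1 is the only mapping kept. The counts 10 and 2 are invented for illustration and are not results from the talk.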
About Montpellier
• Best quality of life in France
• Important laboratories (LIRMM) and research institutes (INRA, CIRAD, etc.)
• The University of Montpellier is part of the "Opération Campus"
• Soon there will be a direct TGV line to Barcelona (1 hour)
