Social Ranking: Uncovering Relevant Content Using Tag-based Recommender Systems

Valentina Zanardi
Dept. of Computer Science, University College London
Gower Street, London WC1E 6BT, UK
V.Zanardi@cs.ucl.ac.uk

Licia Capra
Dept. of Computer Science, University College London
Gower Street, London WC1E 6BT, UK
L.Capra@cs.ucl.ac.uk

ABSTRACT
Social (or folksonomic) tagging has become a very popular way to describe, categorise, search, discover and navigate content within Web 2.0 websites. Unlike taxonomies, which overimpose a hierarchical categorisation of content, folksonomies empower end users by enabling them to freely create and choose the categories (in this case, tags) that best describe some content. However, as tags are informally defined, continually changing, and ungoverned, social tagging has often been criticised for lowering, rather than increasing, the efficiency of searching, due to the number of synonyms, homonyms, polysemy, as well as the heterogeneity of users and the noise they introduce. In this paper, we propose Social Ranking, a method that exploits recommender system techniques to increase the efficiency of searches within Web 2.0. We measure users' similarity based on their past tag activity. We infer tags' relationships based on their association to content. We then propose a mechanism to answer a user's query that ranks (recommends) content based on the inferred semantic distance of the query to the tags associated to such content, weighted by the similarity of the querying user to the users who created those tags. A thorough evaluation conducted on the CiteULike dataset demonstrates that Social Ranking neatly improves coverage, while not compromising on accuracy.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering; H.3.3 [Information Search and Retrieval]: Query formulation; H.3.5 [Online Information Services]: Web-based services

General Terms
Algorithms, Performance

Keywords
Tags, Similarity, Web 2.0, Recommender Systems

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RecSys'08, October 23-25, 2008, Lausanne, Switzerland.
Copyright 2008 ACM 978-1-60558-093-7/08/10 ...$5.00.

1. INTRODUCTION
The advent of Web 2.0 has transformed users from passive consumers to active producers of content. This has tremendously increased the amount of information that is available to users (from videos on sites like YouTube and MySpace, to pictures on Flickr, to music on Last.fm, and so on). This content is no longer categorised according to pre-defined taxonomies. Rather, a new trend called social (or folksonomic) tagging has emerged and quickly become the most popular way to describe, categorise, search, discover and navigate content within Web 2.0 websites. Unlike taxonomies, which overimpose a hierarchical categorisation of content, folksonomies empower end users by enabling them to personally and freely create and choose the categories (in this case, tags) that best describe a piece of information (a picture, a blog entry, a video clip, etc.). Tag clouds are then widely used to visualise a set of related tags that best describe either individual items or the content of a website as a whole, with the most frequently used tags being given more importance either in font size or color. Other visualisation techniques have been studied, in order to give more importance to tags' relationships rather than popularity [4, 11]. When users want to find content, they navigate, via hyperlinks, from a tag to a collection of items that are associated with that tag.

However, as tags are informally defined, continually changing, and ungoverned, social tagging has often been criticised for lowering, rather than increasing, the efficiency of searching [2]. This is due to the number of synonyms, homonyms, polysemy, as well as the heterogeneity of users, contexts, and the noise that they introduce. In order to 'connect' users with content that they deem relevant with respect to their interests, efficient searching techniques have to be developed for this novel and unique domain. By efficient, we mean that the searching technique should be both accurate (i.e., the returned content does satisfy users' interests), and complete (i.e., if there is relevant content in the system, this should be found).

In this paper, we propose a technique called Social Ranking that aims to efficiently find, within a potentially huge dataset, content that is relevant to a user's query. In typical Web 2.0 fashion, we assume such content to have been described with an arbitrary number of tags and by an arbitrary number of users. Social Ranking answers a user's query by exploiting traditional recommender system techniques (Section 2): it measures users' similarity based on their past tag activity; it infers tags' relationships based on their association to content; finally, it ranks (recommends) content based on the inferred distance of the query to the tags associated to such content, weighted by the similarity of the querying user to the users who created those tags.
We present the results of an extensive experimental study we have conducted on the CiteULike dataset (http://www.citeulike.org/), demonstrating how Social Ranking neatly improves coverage, without compromising on accuracy (Section 3). We position ourselves with respect to other works in the area in Section 4, before presenting our conclusions and future directions of research (Section 5).

2. MODEL

2.1 Dataset Analysis
In order to understand the key characteristics of the target scenario, and thus develop a query model that is grounded on its peculiarities, we have analysed CiteULike, a typical Web 2.0 website. CiteULike is a social bookmarking website that aims to promote and develop the sharing of scientific references amongst researchers. Similarly to the cataloging of web pages within del.icio.us, and of photographs within Flickr, CiteULike enables scientists to organize their libraries with freely chosen tags which produce a folksonomy of academic interests. CiteULike runs a daily process which produces a snapshot summary of what articles have been posted by whom and with what tags. We downloaded one such archive in December 2007. The archive contained roughly 28,000 users, who had tagged 820,000 papers overall, using 240,000 distinct tags. A pre-analysis of the archive revealed the presence of a vast amount of papers and a vast amount of tags bookmarked/used by one user only. In order to make the dataset more manageable, we pruned it so as to remove those papers and tags that had been bookmarked/used only once over the entire dataset. We were thus left with roughly 100,000 papers, 55,000 distinct tags, and 28,000 users. We then analysed this dataset more carefully in terms of users' activity, papers' popularity, and tags' usage.
Detailed results are reported in a preliminary version of this paper [27]. With respect to the problem of finding and recommending content in Web 2.0 websites, the following insights can be drawn:

Long Tail of Tags: a power law distribution curve emerges for tags' usage, identifying a small portion of frequently used tags, and a long tail (roughly 70%) of tags being used by 20 users (i.e., 0.08% of the whole user set) or fewer. Moreover, papers were described by no more than ten different tags (and usually fewer than five). This suggests that finding content using standard keyword-based searches is likely to fail, due to empty overlaps between the tags used in the query and those associated to papers.

Long Tail of Papers: a rather steep power law distribution curve emerges for papers' popularity too, identifying a small portion of papers being bookmarked (and tagged) by at least five different users, and a huge tail (more than 85%) of papers being bookmarked by fewer than five users (i.e., 0.02% of the whole user set). This suggests that standard recommender systems techniques would likely perform poorly in terms of accuracy and coverage, because of almost-empty overlaps of users' profiles.

A content search/recommender technique for Web 2.0 websites should thus be developed, taking into account these intrinsic characteristics of the target scenario. We found the following two properties to be promising to tackle both accuracy and coverage:

Clustering of Users for Improved Accuracy: although users vary a lot in terms of activity, even the most active users bookmark a rather tiny portion of the whole paper set. This suggests that users have clearly defined interests that map to a small proportion of the whole CiteULike content.
This is confirmed by tags' usage: each user masters a small subset of the whole folksonomy, and users sharing part of the folksonomy form fairly small clusters. We formulate the hypothesis that, by looking at users' tag activity, users' similarity can be quantified and exploited to answer content searches more accurately.

Clustering of Tags for Improved Coverage: despite the emergence of a rather broad folksonomy, each paper was described by just a handful of tags. This would suggest that there is a core of shared knowledge about tags within the communities who use them, and these are recurrently used to describe related papers. We formulate the hypothesis that, by looking at what tags were associated to what papers, tags' similarity (or, rather, 'relationship') can be quantified and exploited to uncover relevant items during content searches.

Based on these observations, we have developed a content search and recommendation technique called Social Ranking.

2.2 Social Ranking
Let us consider a user u who is interested in retrieving some content of interest (in our specific case, papers). User u could explicitly submit a query $q_u$ consisting of query tags $t_1, t_2, \ldots, t_n$; alternatively, in a more typical recommender system fashion, the system could implicitly run a query, using the set of tags $t_1, t_2, \ldots, t_n$ associated by the user to his latest bookmarked paper, or the set of his most frequently used tags overall, etc. In both cases, the system answering the query would normally rank results according to the following two criteria: the higher the number of query tags associated to the resource, the higher its ranking; and, the higher the number of users $u_i$ who tagged the resource using (some of the) query tags, the higher its ranking.
Intuitively speaking, the first criterion caters for accuracy of the result, the second caters for confidence in it. The formula is:

$$R(p) = \sum_{u_i} \#\{\, t_x \text{ used by } u_i \text{ on } p \mid t_x \in q_u \,\} \qquad (1)$$

that is, the ranking of paper p is computed as the number of tags $t_x$ that users $u_i$ who bookmarked p used and that belonged to the query set $q_u$. As we shall demonstrate experimentally in Section 3, while this simple technique works well to find popular content described with popular tags, it fails to address queries that look for the very long tail of medium-to-low popularity content, as a large amount of low-score results are returned. Accuracy is not the only problem: if the user running the query also uses tags that belong to the long tail of tags, chances are that relevant content is not found at all, and coverage then becomes the most pressing issue.

Figure 1: Transformation of the dataset

To address these problems, we propose Social Ranking, a technique inspired by traditional Collaborative Filtering mechanisms [22]: first, we identify the users with similar interests to the querying user u; according to our analysis, such community should be easily identified by studying users' tag activity. Content tagged by these users should be scored higher in a way that is proportional to the quantified similarity. Second, even though tags can be broadly clustered in domains of knowledge, people tend to use slightly different subsets of them within each domain. We thus identify the tags that are similar (or, rather, related) to the query tags, thus expanding the query to this enlarged set. We believe, and our evaluation will confirm, that users' similarity improves accuracy of the results, while tags' similarity (i.e., query expansion) improves coverage.
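
For reference, the baseline of formula 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `basic_rank`, the representation of bookmarks as a dictionary from (user, paper) pairs to tag sets, and the toy data are assumptions made here, not part of the system described in this paper.

```python
from collections import defaultdict


def basic_rank(bookmarks, query_tags):
    """Formula 1: for every user who bookmarked a paper, count the tags
    that user applied to it that also belong to the query, and sum."""
    query = set(query_tags)
    scores = defaultdict(int)
    for (user, paper), tags in bookmarks.items():
        matches = len(query & set(tags))
        if matches:
            scores[paper] += matches
    # Highest score first; ties broken by paper id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: (user, paper) -> tags the user put on the paper.
bookmarks = {
    ("alice", "p1"): {"recsys", "tagging"},
    ("bob", "p1"): {"tagging"},
    ("carol", "p2"): {"recsys", "folksonomy"},
    ("dave", "p3"): {"biology"},
}

ranking = basic_rank(bookmarks, ["recsys", "tagging"])
print(ranking)
```

Paper p3 never appears in the result because none of its tags literally matches the query, which is the coverage limitation discussed above.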
In the remainder of this section, we illustrate how we compute users' similarity (Section 2.2.1), how we compute tags' similarity (Section 2.2.2), and how we combine these two techniques together (Section 2.2.3).

2.2.1 Users' Similarity
Social tagging typically provides a 3-dimensional relationship between users, resources and tags (users bookmark resources with a certain number of tags). Different definitions of users' similarity can be derived; here we consider a simple yet effective one: the more tags two users have used in common, the more similar they are, regardless of what resources they used them on. This definition projects our 3-dimensional space onto a 2-dimensional one, throwing away information about 'resources', and keeping only information about what tags a user has used and how often (Figure 1, top). While one may argue that, in so doing, we discard important information, we believe that, in scenarios where tags are clustered around topics, the information lost is not significant. We thus describe each user $u_i$ with a vector $v_i$ where $v_i[j]$ counts the number of times that user $u_i$ used tag $t_j$. Given two users $u_i$ and $u_j$, we then quantify users' similarity $\mathrm{sim}(u_i, u_j)$ as the cosine of the angle between their vectors:

$$\mathrm{sim}(u_i, u_j) = \cos(v_i, v_j) = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|}$$

Various similarity measures can be used other than the cosine-based similarity [5]. For example, concordance-based similarity [1] could be used, so that the more tags two users share, the more similar they are (regardless of how many times they have used them). However, we believe tag frequency to be an important piece of information to determine a user's interests.
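
As a concrete illustration of the cosine-based measure defined above, the following Python sketch computes it over sparse tag-count vectors. The function name `cosine`, the use of `Counter` objects as vectors, and the toy counts are assumptions made for the example; the same function would apply unchanged to any other sparse count vectors.

```python
import math
from collections import Counter


def cosine(v, w):
    """Cosine of the angle between two sparse count vectors (dict-like)."""
    dot = sum(count * w[key] for key, count in v.items() if key in w)
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm_w = math.sqrt(sum(c * c for c in w.values()))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # a user with no tags is similar to nobody
    return dot / (norm_v * norm_w)


# v_i[j] = number of times user u_i used tag t_j (hypothetical values).
alice = Counter({"recsys": 3, "tagging": 4})
bob = Counter({"tagging": 2})
carol = Counter({"biology": 5})

sim_alice_bob = cosine(alice, bob)      # shared tag, so positive similarity
sim_alice_carol = cosine(alice, carol)  # no shared tag, so zero
print(sim_alice_bob, sim_alice_carol)
```

Because the vectors hold usage counts rather than 0/1 flags, a user who applies a shared tag often weighs more than one who used it once, which is the property the text argues for when preferring cosine over concordance-based similarity.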
Alternatively, Pearson Correlation (and its variations, e.g., weighted Pearson [19, 5]) could be used; as shown in [14], different similarity measures perform differently, both in terms of accuracy and coverage; we chose cosine-based similarity for its consistently good performance, although we plan to study the impact of other similarity measures in the future.

2.2.2 Tags' Similarity
We define tags' similarity as follows: the more resources have been tagged with the same pair of tags, the more similar (related) these tags are, regardless of the users who used them. This definition projects our 3-dimensional space onto a 2-dimensional one, as shown in Figure 1, bottom part. Similarly to what we said before, in scenarios where users' interests are a rather small and consistent subset of the broader range of topics in the whole website, we believe that the information thrown away during the projection is not significant. We thus describe each tag $t_i$ with a vector $w_i$ where $w_i[j]$ counts the number of times that tag $t_i$ was associated to paper $p_j$. Given two tags $t_i$ and $t_j$, we then quantify tags' similarity $\mathrm{sim}(t_i, t_j)$ as the cosine of the angle between their vectors:

$$\mathrm{sim}(t_i, t_j) = \cos(w_i, w_j) = \frac{w_i \cdot w_j}{\|w_i\| \, \|w_j\|}$$

2.2.3 Two-Step Query Model
The query model we propose exploits the two similarity measures discussed above (on users and on tags) in the following way. When user u submits a query $q_u = \{t_1, t_2, \ldots, t_n\}$ to discover content that can be described by query tags $t_1, t_2, \ldots, t_n$, two steps take place:

1. Query Expansion: the set of query tags $q_u$ is expanded so as to include, besides $\{t_i \mid t_i \in q_u\}$ (for which $\mathrm{sim}(t_i, t_i) = 1$), those tags $t_{n+1}, \ldots, t_{n+m}$ that are deemed most similar to the query tags (for which $0 < \mathrm{sim}(t_i, t_j) \le 1$, with $i \in [1, n]$ and $j \in [n+1, n+m]$).
This set, which we call $q^*$, is constructed so as to include, for each $t_i \in q_u$, its top k most similar tags, in a fashion similar to the top k Nearest Neighbour (kNN) strategy in recommender systems. A thorough analysis of the impact of k on both accuracy and coverage will be presented in Section 3.

2. Ranking: all resources that have been tagged with at least one tag from the extended query set are retrieved. Their ranking depends on a combination of: the relevance of the tags associated to the paper with respect to the query tags (papers tagged with $t_i$, $i \in [1, n]$ should count more than those tagged with $t_j$, $j \in [n+1, n+m]$); and, the similarity of the taggers with respect to the querying user u (papers tagged by similar users should be ranked higher, as these users are more likely to share interests with u than others, and thus are in a better position to recommend relevant content).

The ranking of a paper p would then be computed as:

$$R(p) = \sum_{u_i} \Bigg( \sum_{\substack{\{t_x \mid u_i \text{ tagged } p \text{ with } t_x\} \\ t_j \in q^*}} \mathrm{sim}(t_x, t_j) \Bigg) * \big( \mathrm{sim}(u, u_i) + 1 \big) \qquad (2)$$

where, for each user $u_i$ who tagged p, $\sum \mathrm{sim}(t_x, t_j)$ quantifies how relevant the tags $t_x$ associated by $u_i$ to p are with respect to the tags $t_j$ belonging to the expanded query set $q^*$; note that, in the basic case of formula 1, this simply meant counting how many tags from $q_u$ user $u_i$ associated to p. Moreover, the relevance is then magnified (i.e., papers are pushed higher up in the ranking) in a way that is proportional to users' similarity $\mathrm{sim}(u, u_i)$.

Assuming that users' similarity $\mathrm{sim}(u_i, u_j)$ and tags' similarity $\mathrm{sim}(t_i, t_j)$ are computed offline (e.g., daily, weekly, etc.), then the complexity of answering a query containing T tags is $O(k \cdot T \cdot P \cdot N)$, where P is the number of papers in the system and N is the number of users. However, this is a gross overestimation: as our dataset pre-analysis has shown, each tag is used on average on at most 40 papers (with $40 \ll P$), and each paper has been tagged on average by fewer than 5 users (with $5 \ll N$), so that the time to answer a query is simply proportional to the number of tags in the expanded query set (i.e., $k \cdot T$).

We call this approach Social Ranking, as it exploits information coming from the emergent social network of users and social network of tags to rank content in a way that is meaningful to the querying user. In the next section, we present the results obtained when evaluating this approach.

3. EVALUATION
We have thoroughly analysed the performance of Social Ranking on the CiteULike dataset, both in terms of accuracy and coverage (Section 3.3). Before discussing these results, we briefly illustrate the portion of the dataset we have been experimenting with (Section 3.1), and describe how we have conducted the experiments (Section 3.2).

3.1 The Dataset
Based on our pre-analysis of the CiteULike dataset (Section 2.1), we have performed a cut, in order to obtain a small yet meaningful subset to experiment with. In particular, we have considered only those tags that have been used on at least 15 different papers, and by at least 20 users. This has left us with a dataset consisting of roughly 12,000 users, 83,000 papers, and 16,000 tags. Note that the long tail phenomenon still vastly dominates in the pruned dataset:

Long tail of users' similarity: as shown in Figure 2, the vast majority of users' pairs have a very low value of similarity (below 0.1), while there exists a long tail of higher similarity pairs. This would suggest users are highly focused (and clustered) around topics, and thus only a relatively small portion of users are indeed good recommenders to each other.

Long tail of tags' similarity: as shown in Figure 3, each tag is related to only a very small subset of other tags, again suggesting that only a relatively small portion of tags are used (and thus need to be learned) to describe specific categories of content.

We believe that the results we are going to present in this section generally hold for datasets that exhibit similar characteristics.

Figure 2: Distribution of users' similarity (x-axis: cosine similarity, ticks from 0.1 to 1; y-axis: pairs of users). Bar labels, in order of increasing similarity: 2,024,024; 146,978; 54,481; 26,263; 18,509; 12,715; 6,464; 7,295; 4,792; 5,832.

Figure 3: Distribution of tags' similarity (x-axis: cosine similarity, ticks from 0.1 to 1; y-axis: pairs of tags). Bar labels, in order of increasing similarity: 4,756,729; 359,087; 110,865; 48,145; 25,928; 15,048; 7,863; 6,546; 4,779; 2,188.

3.2 Simulation Setup
In order to quantify accuracy and coverage of Social Ranking, we have conducted the following basic experiment: we picked a user u, "hid" one of his bookmarked papers p as well as the tags that u had associated to p; we then performed a query q with such tags. Since p was bookmarked by u (before we hid it), u is obviously interested in it, so a recommender system should be able to return p (coverage). Note that, in our pruned dataset, it was always the case that, even after hiding u's bookmark for p, at least one other bookmark made by a user u' for p existed, as we only kept in the dataset those papers that had been bookmarked by more than one user; it should thus be possible, in principle, to locate and return p.
Moreover, the higher the ranking of p in the list of returned papers (i.e., the closer to the top), the better the accuracy of the ranking algorithm.

Given the high variability of users' behaviour and papers' popularity in the dataset, we have identified 6 different categories of experiments, based on:
- the level of activity of the querying user, distinguishing heavy taggers HT (users who tagged more than 50 papers), medium taggers MT (users who tagged between 10 and 50 papers), and low taggers LT (users who tagged fewer than 10 papers);
- the level of popularity of the hidden bookmark, distinguishing popular papers PP (those that had been bookmarked by at least 5 users), and unpopular ones UP (those that had been bookmarked by fewer than 5 users).

For each user in each group (heavy/medium/low taggers), three bookmarks were chosen at random within each paper category (popular/unpopular), hidden, and their corresponding tags searched. Since the number of users in each group varies, so does the total number of queries performed (from 1,800 for the small group of HT/PP, to 13,100 for the much larger group of LT/UP). Results are reported for each category. In all experiments, we compare the output of our Social Ranking algorithm (formula 2) with the simple benchmark presented in Section 2.2 (formula 1).
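
The hide-and-query experiment, together with the two-step query model of Section 2.2.3, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used for the experiments: all function names, the dictionary layouts for bookmarks and for the precomputed similarity tables, and the toy data are hypothetical, and the double sum of formula 2 is read here as ranging over every pair made of a tag on the bookmark and a tag in the expanded query.

```python
from collections import defaultdict


def tag_similarity(tag_sim, a, b):
    """sim(a, b) from a precomputed table; a tag is fully similar to itself."""
    return 1.0 if a == b else tag_sim.get(a, {}).get(b, 0.0)


def expand_query(query_tags, tag_sim, k):
    """Step 1: add the k most similar tags of every query tag, giving q*."""
    expanded = set(query_tags)
    for tag in query_tags:
        neighbours = sorted(tag_sim.get(tag, {}).items(),
                            key=lambda item: (-item[1], item[0]))
        expanded.update(t for t, s in neighbours[:k] if s > 0)
    return expanded


def social_rank(bookmarks, user, query_tags, tag_sim, user_sim, k):
    """Step 2: score papers with formula 2; return paper ids, best first."""
    expanded = expand_query(query_tags, tag_sim, k)
    scores = defaultdict(float)
    for (tagger, paper), tags in bookmarks.items():
        if not tags & expanded:
            continue  # only papers carrying a tag from q* are retrieved
        relevance = sum(tag_similarity(tag_sim, tx, tj)
                        for tx in tags for tj in expanded)
        scores[paper] += relevance * (user_sim.get((user, tagger), 0.0) + 1.0)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [paper for paper, _ in ordered]


def hidden_paper_position(bookmarks, user, paper, tag_sim, user_sim, k):
    """Hide one bookmark, query with its tags, return the paper's 1-based
    position in the results, or None if the paper is not uncovered."""
    query = bookmarks[(user, paper)]
    visible = {key: tags for key, tags in bookmarks.items()
               if key != (user, paper)}
    ranked = social_rank(visible, user, query, tag_sim, user_sim, k)
    return ranked.index(paper) + 1 if paper in ranked else None


# Hypothetical toy data; similarities are assumed to be computed offline.
bookmarks = {
    ("u", "p1"): {"folksonomy"},   # the bookmark that will be hidden
    ("v", "p1"): {"tagging"},
    ("w", "p2"): {"folksonomy"},
    ("w", "p3"): {"biology"},
}
tag_sim = {"folksonomy": {"tagging": 0.8}, "tagging": {"folksonomy": 0.8}}
user_sim = {("u", "v"): 0.9, ("u", "w"): 0.1}

for k in (0, 1):
    print(k, hidden_paper_position(bookmarks, "u", "p1", tag_sim, user_sim, k))
```

In this toy run the hidden paper is not uncovered with k = 0, because the only remaining bookmark for it uses a different tag, while with k = 1 the expanded query retrieves it and the high similarity between u and v places it first.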
Figure 4: Social Ranking (without query expansion) vs. Basic Model. Panels (a)-(c): heavy, medium and low taggers on popular papers; panels (d)-(f): heavy, medium and low taggers on unpopular papers (x-axis: number of queries; y-axis: difference of positions). Panels (g)-(i): heavy, medium and low taggers on unpopular papers, Social Ranking vs. Basic (x-axis: position; y-axis: percentage).

3.3 Results

3.3.1 Impact of Users' Similarity on Accuracy
The first set of experiments we conducted aimed to analyse the impact of users' similarity alone on the ranking of results. We thus compared the basic query model with the advanced query model where tag expansion had been disabled.
For each query, the list of returned papers is thus the same (i.e., the search happens using the same query tags), but ordered differently (i.e., users' similarity in Social Ranking causes reshuffles). For each query that uncovered the hidden paper, we have computed the position of such paper in the ranked list of results produced by Social Ranking minus its position in the ranked list of results produced by the basic model: the lower the difference, the better the accuracy of Social Ranking, and vice versa. Figure 4 plots the results, sorted by the measured difference, for all six categories (first row for popular papers and second row for unpopular papers); the two x values highlighted in each chart represent the first and last query for which the two approaches perform the same (i.e., the difference in ranking is zero). As shown, the ranking of results is slightly better when using the basic query model in the first scenario: when focusing on mainstream content (i.e., the hidden paper has been tagged many times by different users), simple searches based on exact tag matching work well enough. However, in all other scenarios, the advanced query model outperforms the basic one (i.e., it returns the hidden paper at a higher ranking in the vast majority of cases). The improvement is more dramatic when considering unpopular papers (second row), thus confirming the importance of weighting the recommendations coming from similar users more, when looking for less 'mainstream' content.
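
The position-difference measure used in this comparison can be sketched as follows. The function names, the `gap` threshold parameter and the sample positions are hypothetical and serve only to make the computation concrete; they are not taken from the experiments.

```python
def position_differences(social_positions, basic_positions):
    """Social Ranking position minus basic-model position, per query,
    keeping only queries where both models uncovered the hidden paper.
    Negative values mean Social Ranking placed the paper closer to the top."""
    diffs = [s - b for s, b in zip(social_positions, basic_positions)
             if s is not None and b is not None]
    return sorted(diffs)


def summarise(diffs, gap=10):
    """Fraction of queries ranked at least `gap` positions better,
    and at least `gap` positions worse, by Social Ranking."""
    if not diffs:
        return 0.0, 0.0
    better = sum(1 for d in diffs if d <= -gap)
    worse = sum(1 for d in diffs if d >= gap)
    return better / len(diffs), worse / len(diffs)


# Hypothetical 1-based positions of the hidden paper; None = not uncovered.
social = [3, 40, 7, None, 12]
basic = [15, 28, 7, None, 30]

diffs = position_differences(social, basic)
better, worse = summarise(diffs)
print(diffs, better, worse)
```

Sorting the differences reproduces the ordering used on the x-axis of the difference plots, and the run of zeros in the sorted list corresponds to the queries on which the two models tie.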
If we take a closer look at the 'unpopular papers' set of results, we can notice that, on heavy taggers (Figure 4(d)), 25% of the hidden papers are returned at positions that are between 10 and 205 positions better using Social Ranking than when using the basic model, against only 7% of cases where the basic model result is better ranked (between 10 and 130 positions gap); on medium taggers (Figure 4(e)), 31% of results are better ranked (with a gap between 10 and 242) against 8% (with gap [10, 144]); finally, on low taggers (Figure 4(f)), the ratio is 28% against 8%, with a similar ranking gap.

In order to better appreciate the improvement obtained in terms of accuracy, we have also plotted the cumulative distribution of the ranking of the "hidden" papers, using the advanced model without query expansion and the basic model, for unpopular papers. Figures 4(g), (h) and (i) (third row) illustrate the results: as shown, Social Ranking neatly improves the absolute ranking of the hidden paper, and it does so more evidently for heavy and medium taggers, that is, for users whose similarity can be better assessed thanks to their activity within the system. For example, about 30% of the hidden papers are found in the top 5 positions using Social Ranking on heavy taggers, against 20% using the basic model. This first set of experiments thus demonstrates our hypothesis that users' similarity is effectively exploited by Social Ranking to improve accuracy, and this is particularly important when trying to dig out unpopular content.

Let us now focus on coverage. The column labeled 'k = 0' in Table 1 summarises the percentage of papers that remained hidden when tag expansion was not used.
As shown, this percentage is approximately 16-18% for popular papers, and it quickly increases up to 40% for unpopular ones. Given that all papers in our dataset have been bookmarked by more than one user, low coverage is an indication that different users bookmark the same resources differently. Searching techniques based on user-specified query tags only are thus unable to uncover unpopular yet relevant resources; in the next section, we demonstrate how coverage can be improved by expanding user-defined query tags to include semantically related ones.

                       Not Found
Category   Queries   k = 0   k = 5   k = 10   k = 20   k = 50
HT/PP        1882     17%      8%      6%       4%       2%
MT/PP        4094     18%      8%      6%       5%       2%
LT/PP        4171     16%      8%      6%       4%       3%
HT/UP        2400     24%     14%     12%       9%       5%
MT/UP        5835     36%     22%     17%      14%       8%
LT/UP       13130     40%     26%     23%      18%      13%

Table 1: Percentage of queries remaining hidden.

3.3.2 Impact of Tags' Similarity on Coverage
The second set of experiments we have conducted aimed at comparing the full Social Ranking model against the basic one. During query expansion, Social Ranking extends each query tag with the top kNN tags. We have been experimenting with different values of k = 5, 10, 20, 50; we have been measuring the impact of the full model on both accuracy and coverage. Our goal was to neatly improve coverage, especially when dealing with unpopular content, without severely impacting on accuracy. Table 1 reports the percentage of queries for which the target paper still remained hidden, across all values of k (including k = 0, that is, no query expansion). As shown, even small values of k cause the number of not-found items to quickly drop.
For example, when k = 5, the number of unpopular items not found falls from 24% for heavy taggers, 36% for medium taggers, and 40% for low taggers, down to 14%, 22%, and 26% for the three users' categories respectively. For k = 10, there is an average 50% reduction of not-found items, with respect to the case of no query expansion (k = 0). Coverage keeps improving, although less dramatically, for higher values of k.

In order to assess the impact of query expansion on accuracy, we report two separate sets of results. For those queries that were uncovered by both Social Ranking and the basic query model, we have computed the percentiles of the ranking of the "hidden" paper. Table 2 shows results across all 6 scenarios (Social Ranking positions on the left of each cell, and basic query model positions on the right). We only report results for k = 10 and k = 20 for space reasons. When both approaches uncover a paper, accuracy is comparable: for instance, 10% of the hidden papers are found in the top 5 positions; half of the hidden papers (50th percentile) are found in the top 10 positions in the case of popular papers, and in the top 40 positions in the case of unpopular papers, by both approaches. This confirms that the improvement obtained on coverage via query expansion does not compromise accuracy for values of k up to 20; this is aligned with our pre-analysis of the dataset, which revealed that the vast majority of papers were tagged with no more than 10 different tags: increasing k much beyond that value increases noise instead (with only a small improvement on coverage, as confirmed by Table 1).

Finally, for the set of queries uncovered by Social Ranking only, we have computed the cumulative distribution of their ranking.
Once again, for space concerns, we only display the results for the critical case of unpopular papers and for k = 10 (Figure 5). As the charts illustrate, more than 40% of the papers that could not be found using the basic model are now returned in the top 100 positions (and between 20% and 30% of them in the top 50). This second set of experiments thus demonstrates that tags’ similarity can indeed be exploited, not just to uncover relevant content, but also to recommend it highly, so as to bring it to the attention of the end user.

4. RELATED WORK

Research in the area of social tagging has proliferated in recent years, due to the increasing popularity of such systems. Studies have been conducted both to understand tag usage and evolution (e.g., [23, 3]), and to learn and exploit their hidden semantics. In [7], a large study of social tagging on the popular del.icio.us bookmarking system is presented, aimed at characterizing users’ activity, pages’ popularity, and tags’ distribution; the knowledge base (in this case, the whole Web) is so large and dynamic that the authors are quite pessimistic about the benefits that social bookmarking can bring to web searches. In [6], the same authors have shown how searches on del.icio.us can be improved if a navigable hierarchical taxonomy of tags is derived from tag usage, to help users broaden or narrow the set of tags that best describe their interests. Our approach takes a different stance: rather than offering users an organised tag navigation system, it aims to transparently improve users’ searches based on emergent tag semantics and query expansion. In [18], tags are related back to a fixed ontology of concepts, thus exploiting both techniques to enhance information retrieval capabilities.
Differently from this approach, our goal is to automatically derive tags’ relationships, which can then be fitted into an effective query search algorithm, without relying on a predefined ontology. In [20], semantics that specifically relate to places and events are inferred for resources within the Flickr dataset; their approach is highly tied to location information, and thus not easily generalizable to other domains. In [25], a probabilistic generative model is proposed to describe users’ annotation behaviour, and to automatically derive tags’ emergent semantics; during searches, their approach is capable of grouping together synonymous tags, while it calls for the user’s intervention when highly ambiguous tags are found. Very early work, but with similar goals, is presented in [26], where a simpler technique, based on an analysis of the relationship between users, tags and resources, is proposed to disambiguate tags.

Tag systems have recently revealed their susceptibility to tag spam, that is, malicious annotations generated to confuse users. The problem has been well analysed in [13], where the authors tried to identify misused tags, and to quantify the extent to which tagging systems are robust against spam. Robust solutions to tag spamming are still being investigated.

Research has also been very active in relating tag activity to users, in order to discover their interests and consequently users’ communities. Work within the Semantic Web domain has tried to classify users into categories and describe the key features of such categories [15].
More recently, users have been classified according to their explicitly stated profile [9], based on a probabilistic model which takes into account users’ interest in topics [28], and based on their level of tagging activity and breadth of interests [12]. In [16], users’ common interests are discovered based on patterns of frequently co-occurring tags, using a classical association rule algorithm, which however does not take into account considerations about users’ activity.

Percentiles (k = 10)
Category   5       10      25       50       75       95
HT/PP      1|1     1|1     2|2      6|5      27|23    102|88
MT/PP      1|1     1|1     1|1      4|3      15|13    70|64
LT/PP      1|1     1|1     1|1      3|3      12|10    82|71
HT/UP      2|2     3|3     10|7     35|25    80|67    186|162
MT/UP      3|2     5|4     12|9     31|27    76|67    207|170
LT/UP      2|2     4|4     9|8      26|23    71|61    209|169

Percentiles (k = 20)
Category   5       10      25       50       75       95
HT/PP      1|1     1|1     2|2      7|5      29|23    112|88
MT/PP      1|1     1|1     2|1      5|3      17|13    80|64
LT/PP      1|1     1|1     1|1      4|3      15|10    88|71
HT/UP      2|2     4|3     12|7     39|25    86|67    215|162
MT/UP      3|2     5|4     13|9     35|27    85|67    245|170
LT/UP      2|2     4|3     10|8     30|23    80|61    257|169

Table 2: Percentiles of the ranking of results, for Social Ranking vs. Basic Model

[Figure 5 comprises three panels, (a) to (c), one each for heavy, medium, and low taggers on unpopular papers (UNHIDDEN). x-axis: Position (5, 10, 25, 50, 100, 200, >200); y-axis: Percentage.]

Figure 5: Accuracy of queries uncovered by Social Ranking (k = 10)
All these works, including our attempt to find similar users, are based on the observation that real world networks exhibit a so-called community structure [21]; defining the set of characteristics that would enable the best fitting and natural clustering of taggers and tags is an open research question.

In this paper, we have been combining the two research streams highlighted above (i.e., automatic learning of tag semantics and users’ interests) in order to improve query searches and ranking. Other research groups have been conducting research in the same area. In [24, 17], the integration of tag information within standard recommender systems’ algorithms has been proposed, in order to give better recommendations to users; although very promising, at present such works do not take into account the ‘activity’ of users, in terms of amount of resources being tagged, and number of tags being used. We believe this information to be crucial to extract users’ interests and thus improve the efficiency of searches.
Tag activity has been combined with a PageRank-like algorithm, in order to improve the ranking mechanism, in situations where resources are not linked together as in a typical web graph structure [8]; their approach, called FolkRank, provides good results when querying the folksonomy for topically related elements, while it is easily subverted if less related/popular tags are being used, due to the size and sparsity of folksonomies on the web. In [10], users’ similarity is exploited first to generate a set of tags of relevance to the target user, then to recommend him items described by such tags; as for FolkRank, this approach is mostly tailored to scenarios with high users’ activity and low tag noise. Social Ranking focuses instead on scenarios where there is only little information shared between users. In these settings, we have demonstrated how a combination of users’ and tags’ similarity can ameliorate the sparsity problem. Further improvements could be achieved by clustering users within better scoped communities; we intend to explore this aspect next.

5. CONCLUSION

In this paper we have presented Social Ranking, a technique that aims to improve content searches in Web 2.0 scenarios, by exploiting users’ similarity and tags’ similarity.
The former is used to improve accuracy: the higher the similarity between the querying user and the user who has bookmarked a paper, the higher the chances that the paper is of relevance, thus reducing the amount of uninteresting content being presented to users. The latter is used to improve coverage: by implicitly learning tags’ similarity from their usage, a larger amount of relevant yet unpopular content can be uncovered, thus ameliorating the problem of heterogeneity, sparsity and lack of structure in folksonomies.

Our ongoing work spans different directions. First, we are refining the techniques we use to find similar users and similar tags. For the former, we have started analysing the impact of a variety of clustering techniques to identify communities of users with shared interests; beyond similarity in the tags’ usage, there exist other parameters of relevance, including level of activity (to distinguish active users who contribute to the knowledge base from passive consumers), variety of tags used (unpopular tags may reveal more about a user’s interests than popular ones), and so on. For the latter, we are enriching query expansion with words that are semantically close, as defined by dictionary-based approaches like WordNet (http://wordnet.princeton.edu/).
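The division of labour summarised above (tags’ similarity for coverage, users’ similarity for accuracy) can be illustrated with a minimal Python sketch. The similarity tables are taken as given inputs, and the names `expand_query` and `rank`, the toy data, and the additive scoring rule are all assumptions made for illustration; they are not the exact formulas of Social Ranking.

```python
from collections import defaultdict


def expand_query(query_tags, tag_sim, k):
    """Give each query tag weight 1.0 and add its k most similar tags, weighted by similarity."""
    weights = {t: 1.0 for t in query_tags}
    for t in query_tags:
        neighbours = sorted(
            tag_sim.get(t, {}).items(), key=lambda kv: kv[1], reverse=True
        )[:k]
        for other, sim in neighbours:
            weights[other] = max(weights.get(other, 0.0), sim)
    return weights


def rank(user, query_tags, annotations, user_sim, tag_sim, k=10):
    """Assumed scoring rule: sum over matching annotations of user similarity x tag weight."""
    weights = expand_query(query_tags, tag_sim, k)
    scores = defaultdict(float)
    for tagger, tag, resource in annotations:
        if tag in weights:
            scores[resource] += user_sim.get((user, tagger), 0.0) * weights[tag]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: (tagger, tag, resource) triples and similarity tables.
annotations = [
    ("bob", "folksonomy", "p1"),
    ("carol", "tagging", "p2"),
    ("dave", "recsys", "p3"),
]
user_sim = {("alice", "bob"): 0.9, ("alice", "carol"): 0.5, ("alice", "dave"): 0.8}
tag_sim = {"tagging": {"folksonomy": 0.7, "recsys": 0.1}}

if __name__ == "__main__":
    print(rank("alice", ["tagging"], annotations, user_sim, tag_sim, k=0))
    print(rank("alice", ["tagging"], annotations, user_sim, tag_sim, k=1))
```

In this toy setting, paper p1 stays hidden with k = 0 because nobody tagged it with the query tag; with k = 1 it is uncovered through the related tag and ranked first, because it was bookmarked by the user most similar to the querying one.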
Further evaluation is also required, using different Web 2.0 datasets (e.g., Last.fm, Bibsonomy, del.icio.us, etc.), different similarity measures (e.g., Pearson, concordance, etc.), and comparing against less naive approaches (e.g., [8]).

Acknowledgments. The authors would like to thank Sonia Ben Mokhtar, Franco Raimondi, Neal Lathia and Matteo Dell’Amico for their continuous help and the useful discussions which led to the publication of this work.

6. REFERENCES

[1] A. Agresti. Analysis of Ordinal Categorical Data. John Wiley and Sons, 1984.
[2] S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, 2006.
[3] H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proc. of the 16th Intl. Conference on World Wide Web, pages 211–220, New York, NY, USA, 2007.
[4] Y. Hassan-Montero and V. Herrero-Solana. Improving tag-clouds as visual information retrieval interfaces. In Intl. Conference on Multidisciplinary Information Sciences and Technologies, Merida, Spain, October 2006.
[5] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An Algorithmic Framework for Performing Collaborative Filtering. In Proc. of the 22nd Intl. ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237, 1999.
[6] P. Heymann and H. Garcia-Molina. Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems. Technical Report 2006-10, Stanford University, April 2006.
[7] P. Heymann, G. Koutrika, and H. Garcia-Molina. Can Social Bookmarking Improve Web Search? Resource Shelf, November 2007.
[8] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information Retrieval in Folksonomies: Search and Ranking. In Proc. of the 3rd European Semantic Web Conference, pages 411–426, 2006.
[9] W. H. Hsu, J. Lancaster, M. S. Paradesi, and T. Weninger. Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach. In Intl. Conference on Weblogs and Social Media, March 2007.
[10] A.-T. Ji, C. Yeon, H.-N. Kim, and G.-S. Jo. Collaborative Tagging in Recommender Systems. In Advances in Artificial Intelligence, March 2007.
[11] O. Kaser and D. Lemire. Tag-Cloud Drawing: Algorithms for Cloud Visualization, Tagging and Metadata for Social Information Organization. In Intl. Conference on the World Wide Web, Alberta, Canada, October 2007.
[12] S. Kelkar, A. John, and D. Seligmann. An Activity-based Perspective of Collaborative Tagging. In Intl. Conference on Weblogs and Social Media, March 2007.
[13] G. Koutrika, F. A. Effendi, Z. Gyöngyi, P. Heymann, and H. Garcia-Molina. Combating spam in tagging systems. In Proc. of the 3rd Intl. Workshop on Adversarial Information Retrieval on the Web, pages 57–64, New York, NY, USA, 2007. ACM Press.
[14] N. Lathia, S. Hailes, and L. Capra. The effect of correlation coefficients on communities of recommenders. In Proc. of 23rd ACM Symposium on Applied Computing, 2008.
[15] K. F. Lawrence and M. C. Schraefel. Bringing Communities to the Semantic Web and the Semantic Web to Communities. In Proc. of the 15th Intl. Conference on World Wide Web, 2006.
[16] X. Li, L. Guo, and Y. E. Zhao. Tag-based Social Interest Discovery. In Proc. of the 17th Intl. World Wide Web Conference, 2008.
[17] R. Nakamoto, S. Nakajima, J. Miyazaki, and S. Uemura. Tag-based Contextual Collaborative Filtering. In 18th IEICE Data Engineering Workshop, 2007.
[18] A. Passant. Using Ontologies to Strengthen Folksonomies and Enrich Information Retrieval in Weblogs. In Proc. of Intl. Conference on Weblogs and Social Media, 2007.
[19] H. Polat and W. Du. Privacy-Preserving Collaborative Filtering using Randomized Perturbation Techniques. In The 3rd IEEE Intl. Conference on Data Mining (ICDM), Melbourne, FL, November 2003.
[20] T. Rattenbury, N. Good, and M. Naaman. Towards automatic extraction of event and place semantics from flickr tags. In Proc. of the 30th Intl. ACM SIGIR Conference on Research and Development in Information Retrieval, pages 103–110, New York, NY, USA, 2007.
[21] J. Ruan and W. Zhang. Identifying network communities with a high resolution. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 77(1), 2008.
[22] J. Schafer, J. A. Konstan, and J. Riedl. Recommender Systems in E-commerce. In ACM Conference on Electronic Commerce, pages 158–166, 1999.
[23] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, M. F. Harper, and J. Riedl. tagging, communities, vocabulary, evolution. In Proc. of the 20th Conference on Computer Supported Cooperative Work, pages 181–190, New York, NY, USA, 2006.
[24] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms. In Proc. of 23rd ACM Symposium on Applied Computing, pages 16–20, 2008.
[25] X. Wu, L. Zhang, and Y. Yu. Exploring social annotations for the semantic web. In Proc. of the 15th Intl. Conference on World Wide Web, pages 417–426, New York, NY, USA, 2006.
[26] C. M. A. Yeung, N. Gibbins, and N. Shadbolt. Mutual Contextualization in Tripartite Graphs of Folksonomies. In Proc. of the 6th Intl. Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC 2007), Busan, South Korea, volume 4825 of LNCS, pages 960–964, November 2007.
[27] V. Zanardi and L. Capra. Social Ranking: Finding Relevant Content in Web 2.0. In Intl. Workshop on Recommender Systems. In conjunction with the 18th European Conference on Artificial Intelligence (ECAI), Patras, Greece, July 2008.
[28] D. Zhou, E. Manavoglu, J. Li, L. C. Giles, and H. Zha. Probabilistic models for discovering e-communities. In Proc. of the 15th Intl. Conference on World Wide Web, pages 173–182, New York, NY, USA, 2006.