Conference PDF - UCL Computer Science

advertisement
Social Ranking: Uncovering Relevant Content
Us ing T ag-b as ed Recom m end er Sy s tem s
Valentina Zanardi
L ic ia C ap ra
D ep t. o f C o m p u ter S c ienc e
U niv ers ity C o lleg e L o ndo n
G o w er S treet, L o ndo n W C 1 E 6 B T , U K
D ep t. o f C o m p u ter S c ienc e
U niv ers ity C o lleg e L o ndo n
G o w er S treet, L o ndo n W C 1 E 6 B T , U K
V.Zanardi@ c s .u c l.ac .u k
L .C ap ra@ c s .u c l.ac .u k
ABSTRACT
1.
Social (or folksonomic) tagging has become a very popular w ay to d escribe, categorise, search, d iscover and navigate content w ithin W eb 2 .0 w ebsites. U nlike tax onomies,
w hich overimpose a hierarchical categorisation of content,
folksonomies empow er end users by enabling them to freely
create and choose the categories (in this case, tags) that best
d escribe some content. H ow ever, as tags are informally d efi ned , continually changing, and ungoverned , social tagging
has often been criticised for low ering, rather than increasing,
the effi ciency of searching, d ue to the number of synonyms,
homonyms, polysemy, as w ell as the heterogeneity of users
and the noise they introd uce. In this paper, w e propose Soc ia l R a n k in g , a method that ex ploits recommend er system
techniq ues to increase the effi ciency of searches w ithin W eb
2 .0 . W e measure users’ similarity based on their past tag
activity. W e infer tags’ relationships based on their association to content. W e then propose a mechanism to answ er a
user’s q uery that ranks (recommend s) content based on the
inferred semantic d istance of the q uery to the tags associated
to such content, w eighted by the similarity of the q uerying
user to the users w ho created those tags. A thorough evaluation cond ucted on the C iteU L ike d ataset d emonstrates that
Social R anking neatly improves coverage, w hile not compromising on accuracy.
T he ad vent of W eb 2 .0 has transformed users from passive
consumers to active prod ucers of content. T his has tremend ously increased the amount of information that is available
to users (from vid eos on sites like Y ouT ube and M ySpace,
to pictures on F lickr, to music on L ast.fm, and so on). T his
content is no longer categorised accord ing to pre-d efi ned tax onomies. R ather, a new trend called soc ia l (or folk son om ic )
ta g g in g has emerged and q uickly become the most popular
w ay to d escribe, categorise, search, d iscover and navigate
content w ithin W eb 2 .0 w ebsites.
U nlike tax onomies, w hich overimpose a hierarchical categorisation of content, folksonomies empow er end users by
enabling them to p e rson a lly and free ly create and choose the
categories (in this case, tags) that best d escribe a piece of
information (a picture, a blog entry, a vid eo clip, etc.). T ag
cloud s are then w id ely used to visualise a set of related tags
that best d escribe either ind ivid ual items or the content of
a w ebsite as a w hole, w ith the most freq uently used tags being given more importance either in font siz e or color. O ther
visualisation techniq ues have been stud ied , in ord er to give
more importance to tags’ relationships rather than popularity [4 , 1 1 ]. W hen users w ant to fi nd content, they navigate,
via hyperlinks, from a tag to a collection of items that are
associated w ith that tag.
H ow ever, as tags are informally d efi ned , continually changing, and ungoverned , social tagging has often been criticiz ed
for low ering, rather than increasing, the effi ciency of searching [2 ]. T his is d ue to the number of synonyms, homonyms,
polysemy, as w ell as the heterogeneity of users, contex ts, and
the noise that they introd uce.
In ord er to ‘connect’ users w ith content that they d eem
relevant w ith respect to their interests, effi cient searching
techniq ues have to be d eveloped for this novel and uniq ue
d omain. B y effi cient, w e mean that the searching techniq ue
should be both a cc u ra te (i.e., the returned content d oes satisfy users’ interests), and com p le te (i.e., if there is relevant
content in the system, this should be found ).
In this paper, w e propose a techniq ue called Soc ia l R a n k in g that aims to effi ciently fi nd , w ithin a potentially huge
d ataset, content that is relevant to a user’s q uery. In typical
W eb 2 .0 fashion, w e assume such content to have been d escribed w ith an arbitrary number of tags and by an arbitrary
number of users. Social R anking answ ers a user’s q uery by
ex ploiting trad itional recommend er system techniq ues (Section 2 ): it measures users’ similarity based on their past tag
activity; it infers tags’ relationships based on their association to content; fi nally, it ranks (recommend ) content based
Ca te g o r ie s a n d Su b je c t D e s c r ip to r s
H .3 .3 [Information Search and Retrieval]: Information
fi ltering; H .3 .3 [Information Search and Retrieval]: Q uery
formulation; H .3 .5 [O nline Information Services]: W ebbased services
G e n e r a l Te r m s
A lgorithms, P erformance
K eyw ord s
T ags, Similarity, W eb 2 .0 , R ecommend er Systems
Permission to make digital or hard copies of all or part of this work for
personal or classroom u se is granted withou t fee prov ided that copies are
not made or distrib u ted for profi t or commercial adv antage and that copies
b ear this notice and the fu ll citation on the fi rst page. T o copy otherwise, to
repu b lish, to post on serv ers or to redistrib u te to lists, req u ires prior specifi c
permission and/or a fee.
RecSys’08, O ctob er 2 3 – 2 5 , 2 0 0 8 , L au sanne, S witz erland.
C opy right 2 0 0 8 A C M 9 7 8 -1 -6 0 5 5 8 -0 9 3 -7 /0 8 /1 0 ...$ 5 .0 0 .
I N TRO D U CTI O N
on the inferred distance of the query to the tags associated to
such content, w eighted b y the sim ilarity of the querying user
to the users w ho created those tags. W e p resent the results
of an ex tensiv e ex p erim ental study w e hav e conducted on
the C iteU L ik e dataset (http :/ / w w w .citeulik e.org/ ), dem onstrating how S ocial R ank ing neatly im p rov es cov erage, w ithout com p rom ising on accuracy (S ection 3 ). W e p osition ourselv es w ith resp ect to other w ork s in the area in S ection 4 ,
b efore p resenting our conclusions and future directions of
research (S ection 5 ).
2.
2.1
MODEL
Da ta s e t A n a ly s is
In order to understand the k ey characteristics of the target
scenario, and thus dev elop a query m odel that is grounded
on its p eculiarities, w e hav e analysed C iteU L ik e, a typ ical
W eb 2 .0 w eb site. C iteU L ik e is a social b ook m ark ing w eb site that aim s to p rom ote and dev elop the sharing of scientifi c references am ongst researchers. S im ilarly to the cataloging of w eb p ages w ithin del.icio.us, and of p hotograp hs
w ithin F lick r, C iteU L ik e enab les scientists to organiz e their
lib raries w ith freely chosen tags w hich p roduce a folk sonom y of academ ic interests. C iteU L ik e runs a daily p rocess
w hich p roduces a snap shot sum m ary of w hat articles hav e
b een p osted b y w hom and w ith w hat tags. W e dow nloaded
one such archiv e in D ecem b er 2 0 0 7 . T he archiv e contained
roughly 2 8 ,0 0 0 users, w ho had tagged 8 2 0 ,0 0 0 p ap ers ov erall, using 2 4 0 ,0 0 0 distinct tags. A p re-analysis of the archiv e
rev ealed the p resence of a v ast am ount of p ap ers and a v ast
am ount of tags b ook m ark ed/ used b y one user only. In order to m ak e the dataset m ore m anageab le, w e p runed it
so to rem ov e those p ap ers and tags that had b een b ook m ark ed/ used only once ov er the entire dataset. W e w ere
thus left w ith roughly 1 0 0 ,0 0 0 p ap ers, 5 5 ,0 0 0 distinct tags,
and 2 8 ,0 0 0 users.
W e then analysed this dataset m ore carefully in term s
of users’ activ ity, p ap ers’ p op ularity, and tags’ usage. D etailed results are rep orted in a p relim inary v ersion of this
p ap er [2 7 ]. W ith resp ect to the p rob lem of fi nding and
recom m ending content in W eb 2 .0 w eb sites, the follow ing
insights can b e draw n:
Long Tail of Tags: a p ow er law distrib ution curv e em erges
for tags’ usage, identifying a sm all p ortion of frequently
used tags, and a long tail (roughly 7 0 % ) of tags b eing
used b y 2 0 users (i.e., 0 .0 8 % of the w hole user set)
or less instead. M oreov er, p ap ers w ere describ ed b y
no m ore than ten diff erent tags (and usually less then
fi v e). T his suggests that fi nding content using standard k eyw ord b ased searches is lik ely to fail, due to
em p ty ov erlap s b etw een the tags used in the query
and those associated to p ap ers.
Long Tail of P ap e rs: a rather steep p ow er law distrib ution curv e em erges for p ap ers p op ularity too, identifying a sm all p ortion of p ap ers b eing b ook m ark ed (and
tagged) b y at least fi v e diff erent users, and a huge tail
(m ore than 8 5 % ) of p ap ers b eing b ook m ark ed b y less
than fi v e users instead (i.e., 0 .0 2 % of the w hole user
set). T his suggests that standard recom m ender system s techniques w ould lik ely p erform p oorly in term s
of accuracy and cov erage, b ecause of alm ost-em p ty
ov erlap s of users’ p rofi les.
A content search/ recom m ender technique for W eb 2 .0 w eb sites should thus b e dev elop ed, tak ing into account these
intrinsic characteristics of the target scenario. W e found
the follow ing tw o p rop erties to b e p rom ising to tack le b oth
accuracy and cov erage:
C lu ste ring of U se rs for Im p rov e d A c c u rac y : although
users v ary a lot in term s of activ ity, ev en the m ost activ e users b ook m ark a rather tiny p ortion of the w hole
p ap er set. T his suggests that users hav e clearly defi ned interests that m ap to a sm all p rop ortion of the
w hole C iteU L ik e content. T his is confi rm ed b y tags’
usage: each user m asters a sm all sub set of the w hole
folk sonom y, and users sharing p art of the folk sonom y
form fairly sm all clusters. W e form ulate the hyp othesis
that, b y look ing at users’ tag activ ity, users’ sim ila rity can b e quantifi ed and ex p loited to answ er content
searches m ore accurately.
C lu ste ring of Tags for Im p rov e d C ov e rage : desp ite the
em ergence of a rather b road folk sonom y, each p ap er
w as describ ed b y just a handful of tags. T his w ould
suggest that there is a core of shared k now ledge ab out
tags w ithin the com m unities w ho use them , and these
are recurrently used to describ e related p ap ers. W e
form ulate the hyp othesis that, b y look ing at w hat tags
w ere associated to w hat p ap ers, ta g s’ sim ila rity (or,
rather, ‘relationship ’) can b e quantifi ed and ex p loited
to uncov er relev ant item s during content searches.
B ased on these ob serv ations, w e hav e dev elop ed a content
search and recom m endation technique called S oc ia l R a nk ing .
2.2
S o c ia l R a n k in g
L et us consider a user u w ho is interested in retriev ing
som e content of interest (in our sp ecifi c case, p ap ers). U ser
u could ex p licitly sub m it a query qu consisting of query tags
t1 , t2 , . . . , tn ; alternativ ely, in a m ore typ ical recom m ender
system fashion, the system could im p licitly run a query, using the set of tags t1 , t2 , . . . , tn associated b y the user to his
latest b ook m ark ed p ap er, or the set of his m ost frequently
used tags ov erall, etc. In b oth cases, the system answ ering the query w ould norm ally rank results according to the
follow ing tw o criteria: the higher the num b er of query tags
associated to the resource, the higher its rank ing; and, the
higher the num b er of users ui w ho tagged the resource using
(som e of the) query tags, the higher its rank ing. Intuitiv ely
sp eak ing, the fi rst criterion caters for accuracy of the result,
the second caters for confi dence in it. T he form ula is:
X
R(p) =
(1 )
(# tx used b y ui on p | tx ∈ qu ) ,
ui
that is, the rank ing of p ap er p is com p uted as the num b er
of tags tx that users ui w ho b ook m ark ed p used and that
b elonged to the query set qu .
A s w e shall dem onstrate ex p erim entally in S ection 3 , w hile
this sim p le technique w ork s w ell to fi nd p op ular content describ ed w ith p op ular tags, it fails to address queries that
look for the v ery long tail of m edium -to-low p op ularity content, as a large am ount of low -score results are returned.
A ccuracy is not the only p rob lem : if the user running the
query also uses tags that b elong to the long tail of tags,
chances are that relev ant content is not found at all, and
cov erage then b ecom es the m ost p ressing issue.
rce
Users
Users
s
Tags
Tags
u
so
Re
Resources
Tags
Tags
Users
Users
Tags
Figure 1: Transformation of the dataset
To address these problems, we propose Social Ranking,
a techniq u e inspired by traditional C ollaborativ e F iltering
mechanisms [2 2 ]: fi rst, we identify the u sers with similar
interests to the q u ery ing u ser u; according to ou r analy sis, su ch commu nity shou ld be easily identifi ed by stu dy ing
u sers’ tag activ ity . C ontent tagged by these u sers shou ld be
scored higher in a way that is proportional to the q u antifi ed
similarity . Second, ev en thou gh tags can be broadly clu stered in domains of knowledge, people tend to u se slightly
diff erent su bsets of them within each domain. W e thu s identify the tags that are similar (or, rather, related) to the q u ery
tags, thu s ex panding the q u ery to this enlarged set. W e believ e, and ou r ev alu ation will confi rm, that users’ similarity
imp ro v es acc urac y of the resu lts, while tag s’ similarity (i.e.,
q uery ex p an sio n ) imp ro v es co v erag e.
In the remainder of this section, we illu strate how we compu te u sers’ similarity (Section 2 .2 .1 ), how we compu te tags’
similarity (Section 2 .2 .2 ), and how we combine these two
techniq u es together (Section 2 .2 .3 ).
2.2.1 Users’ Similarity
Social tagging ty pically prov ides a 3 -dimensional relationship between u sers, resou rces and tags (u sers bookmark resou rces with a certain nu mber of tags). D iff erent defi nitions
of u sers’ similarity can be deriv ed; here we consider a simple
y et eff ectiv e one: the more tags two u sers hav e u sed in common, the more similar they are, regardless of what resou rces
they u sed it on. This defi nition projects ou r 3 -dimensional
space onto a 2 -dimensional one, throwing away information
abou t ‘resou rces’, and keeping only information abou t what
tags a u ser has u sed and how often (F igu re 1 , top). W hile
one may argu e that, in so doing, we discard important information, we believ e that, in scenarios where tags are clu stered
arou nd topics, the information lost is not signifi cant.
W e thu s describe each u ser ui with a v ector vi where
vi [j] cou nts the nu mber of times that u sers ui u sed tag tj .
G iv en two u sers ui and uj , we then q u antify u sers’ similarity
s im(ui , uj ) as the cosine of the angle between their v ectors:
s im(ui , uj ) = c o s (vi , vj ) =
vi · vj
||vi || ∗ ||vj ||
V ariou s similarity measu res can be u sed other than the cosinebased similarity [5 ]. F or ex ample, concordance-based similarity [1 ] cou ld be u sed, so that the more tags two u sers
share, the more similar they are (regardless of how many
times they hav e u sed them). H owev er, we believ e tag freq u ency to be an important piece of information to determine
a u ser’s interests. A lternativ ely , P earson C orrelation (and
its v ariations - e.g., weighted P earson [1 9 , 5 ]) cou ld be u sed;
as shown in [1 4 ], diff erent similarity measu res perform differently , both in terms of accu racy and cov erage; we chose
cosine-based similarity for its constantly good performance,
althou gh we plan to stu dy the impact of other similarity
measu res in the fu tu re.
2.2.2
T ag s’ Similarity
W e defi ne tags’ similarity as follows: the more resou rces
hav e been tagged with the same pair of tags, the more similar (related) these tags are, regardless of the u sers who
u sed them. This defi nition projects ou r 3 -dimensional space
onto a 2 -dimensional one, as shown in F igu re 1 , bottom
part. Similarly to what we said before, in scenarios where
u sers’ interests are a rather small and consistent su bset of
the broader range of topics in the whole website, we believ e
that the information thrown away du ring the projection is
not signifi cant.
W e thu s describe each tag ti with a v ector wi where wi [j]
cou nts the nu mber of times that tag ti was associated to
paper pj . G iv en two tags ti and tj , we then q u antify tags’
similarity s im(ti , tj ) as the cosine of the angle between their
v ectors:
wi · wj
s im(ti , tj ) = c o s (wi , wj ) =
||wi || ∗ ||wj ||
2.2.3
T w o -Step Q u ery M o d el
The q u ery model we propose ex ploits the two similarity
measu res discu ssed abov e (on u sers and on tags) in the following way . W hen u ser u su bmits a q u ery qu = {t1 , t2 , . . . , tn }
to discov er content that can be described by q u ery tags
t1 , t2 , . . . , tn , two steps take place:
1 . Q uery E x p ansion: the set of q u ery tags qu is ex panded so to inclu de, besides {ti | ti ∈ qu } (for which
s im(ti , ti ) = 1 ), those tags tn+1 , . . . , tn+m that are
deemed most similar to the q u ery tags (for which 0 <
s im(ti , tj ) ≤ 1 , with i ∈ [1 , n] and j ∈ [n + 1 , n + m]).
This set, which we call q ∗ , is constru cted so to inclu de,
for each ti ∈ qu , its top k most similar tags, in a fashion
similar to the top k N earest N eighbou r (kN N ) strategy in recommender sy stems. A thorou gh analy sis of
the impact of k on both accu racy and cov erage will be
presented in Section 3 .
2 . R ank ing: all resou rces that hav e been tagged with
at least one tag from the ex tended q u ery set are retriev ed. Their ranking depends on a combination of:
the relev ance of the tags associated to the paper with
respect to the q u ery tags (papers tagged with ti , i ∈
[1 , n] shou ld cou nt more than those tagged with tj , j ∈
[n + 1 , n + m]); and, the similarity of the taggers with
respect to the q u ery ing u ser u (papers tagged by similar u sers shou ld be ranked higher, as these u sers are
more likely to share interests with u than others, and
thu s are in a better position to recommend relev ant
content).
The ranking of a paper p wou ld then be compu ted as:


R(p) =
X


ui
X
{tx |ui tagged p w ith tx },
tj ∈q ∗

s im(tx , tj )
∗(s im(u, ui )+1 )
(2 )
3.
EVALUATION
W e hav e thoroug hly an aly sed the p erform an ce of S ocial
R an k in g on the C iteU L ik e d ataset, b oth in term s of accuracy
an d cov erag e (S ection 3 .3 ). B efore d iscussin g these results,
we b riefl y illustrate the p ortion of the d ataset we hav e b een
ex p erim en tin g with (S ection 3 .1 ), an d d escrib e how we hav e
con d ucted the ex p erim en ts (S ection 3 .2 ).
3.1
Th e D a ta s e t
B ased on our p re-an aly sis of the C iteU L ik e d ataset (S ection 2 .1 ), we hav e p erform ed a cut, in ord er to ob tain a sm all
y et m ean in g ful sub set to ex p erim en t with. In p articular, we
hav e con sid ered on ly those tag s that hav e b een used on at
least 1 5 d iff eren t p ap ers, an d b y at least 2 0 users. T his
has left us with a d ataset con sistin g of roug hly 1 2 ,0 0 0 users,
8 3 ,0 0 0 p ap ers, an d 1 6 ,0 0 0 tag s. N ote that the lon g tail p hen om en on still v astly d om in ates in the p run ed d ataset:
Long tail of users’ similarity: as shown in F ig ure 2 , the
v ast m ajority of users’ p airs hav e v ery low v alue of
sim ilarity (b elow 0 .1 ), while there ex ists a lon g tail of
hig her sim ilarity p airs. T his would sug g est users are
hig hly focused (an d clustered ) aroun d top ics, an d thus
on ly a relativ ely sm all p ortion of users are in d eed g ood
recom m en d ers to each other.
Long tail of tags’ similarity: as shown in F ig ure 3 , each
tag is related to on ly a v ery sm all sub set of other tag s,
ag ain sug g estin g that on ly a relativ ely sm all p ortion of
tag s are used (an d thus n eed to b e learn ed ) to d escrib e
sp ecifi c categ ories of con ten t.
W e b eliev e that the results we are g oin g to p resen t in
this section g en erally hold for d atasets that ex hib it sim ilar
characteristics.
3.2
S im u la tio n S e tu p
In ord er to q uan tify accuracy an d cov erag e of S ocial R an k in g , we hav e con d ucted the followin g b asic ex p erim en t: we
Pairs of Users
2500000
2024024
2000000
1500000
1000000
500000
146978
54481
26263
18509
12715
6464
7295
4792
5832
0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
1
Cosine Similarity
F igure 2 : D istrib ution of users’ similarity
Pairs of Tags
P
where, for each user ui who tag g ed p,
s im (tx , tj ) q uan tifi es how relev an t the tag s tx associated b y ui to p are with
resp ect to the tag s tj b elon g in g to the ex p an d ed q uery set
q ∗ ; n ote that, in the b asic case of form ula 1 , this sim p ly
m ean t coun tin g how m an y tag s from qu user ui associated
to p. M oreov er, the relev an ce is then m ag n ifi ed (i.e., p ap ers are p ushed hig her up in the ran k in g ) in a way that is
p rop ortion al to user’s sim ilarity s im (u, ui ).
A ssum in g that users’ sim ilarity s im (ui , uj ) an d tag s’ sim ilarity s im (ti , tj ) are com p uted offl in e (i.e., d aily , week ly ,
etc.), then the com p lex ity of an swerin g a q uery con tain in g
T tag s is O(k · T · P · N ), where P is the n um b er of p ap ers
in the sy stem an d N is the n um b er of users. H owev er, this
is a g ross ov erestim ation : as our d ataset p re-an aly sis has
shown , each tag is used on av erag e on at m ost 4 0 p ap ers
(with 4 0 < < P ), an d each p ap er has b een tag g ed on av erag e b y less than 5 users (with 5 < < N ), so that the tim e to
an swer a q uery is sim p ly p rop ortion al to the n um b er of tag s
in the ex p an d ed q uery set (i.e., k · T ).
W e call this ap p roach S ocial R an k in g , as it ex p loits in form ation com in g from the em erg en t social n etwork of users
an d social n etwork of tag s to ran k con ten t in a way that
is m ean in g ful to the q uery in g user. In the n ex t section , we
p resen t the results ob tain ed when ev aluatin g this ap p roach.
5000000
4000000
3000000
2000000
1000000
0
4756729
359087
110865
48145
25928
15048
7863
6546
4779
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
2188
1
Cosine Similarity
F igure 3 : D istrib ution of tags’ similarity
p ick ed up a user u, “ hid ” on e of his b ook m ark ed p ap ers p
as well as the tag s that u had associated to p; we then p erform ed a q uery q with such tag s. S in ce p was b ook m ark ed
b y u (b efore we hid it), u is ob v iously in terested in it, so a
recom m en d er sy stem should b e ab le to return p (cov erag e).
N ote that, in our p run ed d ataset, it was alway s the case
that, ev en after hid in g u’s b ook m ark for p, at least an other
b ook m ark m ad e b y a user u0 for p ex isted , as we on ly k ep t
in the d ataset those p ap ers that had b een b ook m ark ed b y
m ore than on e user; it should thus b e p ossib le, in p rin cip le,
to locate an d return p. M oreov er, the hig hest the ran k in g of
p in the list of return ed p ap ers (i.e., the closest to the top ),
the b etter the accuracy of the ran k in g alg orithm .
G iv en the hig h v ariab ility of users’ b ehav iour an d p ap ers’
p op ularity in the d ataset, we hav e id en tifi ed 6 d iff eren t categ ories of ex p erim en ts, b ased on :
- the lev el of activ ity of the q uery in g user, d istin g uishin g heav y tag g ers H T (users who tag g ed m ore than 5 0
p ap ers), m ed ium tag g ers M T (users who tag g ed b etween 1 0 an d 5 0 p ap ers), an d low tag g ers L T (users
who tag g ed less than 1 0 p ap ers);
- the lev el of p op ularity of the hid d en b ook m ark , d istin g uishin g p op ular p ap ers P P (those that had b een
b ook m ark ed b y at least 5 users), an d un p op ular on es
U P (those that had b een b ook m ark ed b y less than 5
users).
F or each user in each g roup (heav y / m ed ium / low tag g ers),
three b ook m ark s where chosen at ran d om within each p ap er categ ory (p op ular/ un p op ular), hid d en an d their corresp on d in g tag s searched . S in ce the n um b er of users in each
g roup v aries, so d oes the total n um b er of q ueries p erform ed
(from 1 ,8 0 0 for the sm all g roup of H T / P P , to 1 3 ,1 0 0 for
the m uch larg er g roup of L T / U P ). R esults are rep orted for
each categ ory . In all ex p erim en ts, we com p are the outp ut
of our S ocial R an k in g alg orithm (form ula 2 ) with the sim p le
b en chm ark p resen ted in S ection 2 .2 (form ula 1 ).
H e a vy t a gge r s, popu la r pa pe r s
D iffe r e n ce of posit ion s
150
100
454
50
1032
0
-50
-100
200
D iffe r e n ce of posit ion s
D iffe r e n ce of posit ion s
150
200
100
50
2750
1012
0
-50
-100
-150
-150
100
50
-100
-150
N u m be r of qu e r ie s ( t ot a l u n cove r e d: 3 3 6 4 )
(a)
N u m be r of qu e r ie s ( t ot a l u n cove r e d: 3 5 2 4 )
(b)
H e a vy t a gge r s, u n popu la r pa pe r s
50
1536
0
-50
-100
-150
-200
-250
Low t a gge r s, u n popu la r pa pe r s
150
D iffe r e n ce of posit ion s
100
D iffe r e n ce of posit ion s
150
100
1174
(c)
M e diu m t a gge r s, u n popu la r pa pe r s
150
50
2414
2954
0
-50
-100
-150
-200
-250
100
50
5092
-50
-100
-150
-200
N u m be r of qu e r ie s ( t ot a l u n cove r e d: 3 7 2 2 )
(d)
N u m be r of qu e r ie s ( t ot a l u n cove r e d: 7 9 2 9 )
(e)
H e a vy t a gge r s, u n popu la r pa pe r s
(f)
Socialranking
Socialranking
M e diu m t a gge r s, u n popu la r pa pe r s
Basic
Socialranking
Low t a gge r s, u n popu la r pa pe r s
Basic
100
80
80
80
40
60
40
20
20
0
0
10
25
50
100
200
> 200
Pe r ce n t a ge
100
Pe r ce n t a ge
100
60
6245
0
-250
N u m be r of qu e r ie s ( t ot a l u n cove r e d: 1 8 0 9 )
5
2971
1084
0
-50
-250
-200
N u m be r of qu e r ie s ( t ot a l u n cove r e d: 1 5 5 9 )
Pe r ce n t a ge
150
-200
-200
-250
D iffe r e n ce of posit ion s
Low t a gge r s, popu la r pa pe r s
M e diu m t a gge r s, popu la r pa pe r s
250
Basic
60
40
20
0
5
10
25
Posit ion
(g)
50
100
Posit ion
200
> 200
5
10
25
50
100
200
> 200
Posit ion
(h)
(i)
Figure 4: Social Ranking (without query expansion) vs. Basic Model
3.3
3.3.1
Results
Impact of Users’ Similarity on Accuracy
The first set of experiments we conducted aimed to analy se the impact of users’ similarity alone on the rank ing of results. W e thus compared the b asic q uery model with the adv anced q uery model where tag expansion had b een disab led.
F or each q uery , the list of returned papers is thus the same
(i.e., the search happens using the same q uery tag s), b ut
ordered diff erently (i.e., users’ similarity in S ocial R ank ing
causes reshuffl es). F or each q uery that uncov ered the hidden paper, we hav e computed the position of such paper in
the rank ed list of results produced b y S ocial R ank ing minus
its position in the rank ed list of results produced b y the b asic model: the lower the diff erence, the b etter the accuracy
of S ocial R ank ing , and v icev ersa. F ig ure 4 plots the results,
sorted b y the measured diff erence, for all six categ ories (first
row for p o p ula r p a p e rs and second row for unp o p ula r p a p e rs);
the two x v alues hig hlig hted in each chart represent the first
and last q uery for which the two approaches perform the
same (i.e., the diff erence in rank ing is z ero).
A s shown, the rank ing of results is slig htly b etter when
using the b asic q uery model in the first scenario: when focusing on mainstream content (i.e., the hidden paper has
b een tag g ed many times b y diff erent users), simple searches
b ased on exact tag matching work well enoug h. H owev er,
in all other scenarios, the adv anced q uery model outperforms the b asic one (i.e., it returns the hidden paper at a
hig her rank ing in the v ast majority of cases). The improv ement is more dramatic when considering unpopular papers
(second row), thus confirming the importance of weig hting
the recommendations coming from similar users more, when
look ing for less ‘mainstream’ content. If we tak e a closer
look at the ‘unpopular papers’ set of results, we can notice
that, on heav y tag g ers (F ig ure 4 (d)), 2 5 % of the hidden
papers are returned at positions that are b etween 1 0 and
2 0 5 positions b etter using S ocial R ank ing than when using
the b asic model, ag ainst only 7 % of cases where the b asic
model result is b etter rank ed (b etween 1 0 and 1 3 0 positions
g ap); on medium tag g ers (F ig ure 4 (e)), 3 1 % of results are
b etter rank ed (with a g ap b etween 1 0 and 2 4 2 ) ag ainst 8 %
(with g ap [1 0 , 1 4 4 ]); finally , on low tag g ers (F ig ure 4 (f)),
the ratio is 2 8 % ag ainst 8 % , and similar rank ing g ap.
In order to b etter appreciate the improv ement ob tained
in terms of accuracy , we hav e also plotted the cumulativ e
distrib ution of the rank ing of the “ hidden” papers, using
the adv anced model without q uery expansion and the b asic
model, for unpopular papers. F ig ures 4 (g ) (h) and (i) (third
row) illustrate the results: as shown, S ocial R ank ing neatly
improv es the ab solute rank ing of the hidden paper, and it
does so more ev idently for heav y and medium tag g ers, that
is, for users whose similarity can b e b etter assessed thank s to
their activ ity within the sy stem. F or example, ab out 3 0 % of
the hidden papers are found in the top 5 positions using S ocial R ank ing on heav y tag g ers, ag ainst 2 0 % using the b asic
model. This first set of experiments thus demonstrates our
hy pothesis that users’ similarity is eff ectiv ely exploited b y
S ocial R ank ing to improv e accuracy , and this is particularly
important when try ing to dig out unpopular content.
L et us now focus on cov erag e. The column lab eled ‘k = 0 ’
in Tab le 1 summarises the percentag e of papers that remained hidden when tag expansion was not used. A s shown,
Category
H T /P P
M T /P P
LT /P P
H T /U P
M T /U P
LT /U P
Q u eries
1882
4 09 4
4171
2 4 00
5835
13130
k=0
17%
18%
16%
24%
36%
4 0%
k=5
8%
8%
8%
14%
22%
26%
N ot F ou n d
k = 10
k = 20
6%
4%
6%
5%
6%
4%
12%
9%
17%
14%
23%
18%
k = 50
2%
2%
3%
5%
8%
13%
Table 1: Percentage of queries remaining hidden.
this percentage is approximately 16-18% for popular papers,
and it q uik cly increases up to 4 0 % for unpopular ones. G iv en
that all papers in our d ataset hav e b een b ook mark ed b y
more than one user, low cov erage is an ind ication that differen t u sers bo o k m a rk th e sa m e reso u rces differen tly . S earching
techniq ues b ased on user-specifi ed q uery-tags only are thus
unab le to uncov er unpopular yet relev ant resources; in the
next section, w e d emonstrate how cov erage can b e improv ed
b y expand ing user-d efi ned q uery tags to includ e semantically
related ones.
3.3.2
Impact of Tags’ Similarity on Coverage
T he second set of experiments w e hav e cond ucted aimed
at comparing the full S ocial R ank ing mod el against the b asic one. D uring q uery expansion, S ocial R ank ing extend s
each q uery tag w ith the top kN N tags. W e hav e b een experimenting w ith d iff erent v alues of k = 5 , 10 , 2 0 , 5 0 ; w e
hav e b een measuring the impact of the full mod el on b oth
accuracy and cov erage. O ur goal w as to neatly improv e
cov erage, especially w hen d ealing w ith unpopular content,
w ithout sev erely impacting on accuracy.
T ab le 1 reports the percentage of q ueries for w hich the
target paper still remained hid d en, across all v alues of k
(includ ing k = 0 , that is, no q uery expansion). A s show n,
ev en small v alues of k cause the numb er of not-found items
to q uick ly d rop. F or example, w hen k = 5 , the numb er of
u n p o p u la r items not found falls from 2 4 % for heav y taggers,
3 6% for med ium taggers, and 4 0 % for low taggers, do w n to
14 %, 2 2 %, and 2 6% for the three users’ categories respectiv ely. F or k = 10 , there is an av erage 5 0 % red uction of
not-found items, w ith respect to the case of no q uery expansion (k = 0 ). C ov erage k eeps improv ing, although less
d ramatically, for higher v alues of k.
In ord er to assess the impact of q uery expansion on accuracy, w e report tw o separate sets of results. F or those q ueries
that w ere uncov ered b y b oth S ocial rank ing and the b asic
q uery mod el, w e hav e computed the p ercen tiles of the rank ing of the “ hid d en” paper. T ab le 2 show s results across all 6
scenarios (S ocial R ank ing positions on the left of each cell,
and b asic q uery mod el positions on the right). W e only report results for k = 10 and k = 2 0 for space reasons. W hen
b oth approaches uncov er a paper, accuracy is comparab le:
for instance, 10 % of the hid d en papers are found in the top
5 positions; half of the hid d en papers (5 0 th percentile) are
found in the top 10 positions in the case of popular papers,
and in the top 4 0 positions in the case of unpopular papers,
b y b oth approaches. T his confi rms that the improv ement
ob tained on cov erage v ia q uery expansion d oes not compromise accuracy for v alues of k up to 2 0 ; this is aligned w ith
our pre-analysis of the d ataset, w hich rev ealed that the v ast
majority of papers w ere tagged w ith no more than 10 d ifferent tags: increasing k much b eyond that v alue increases
noise instead (w ith only a small improv ement on cov erage,
as confi rmed b y T ab le 1).
F inally, for the set of q ueries uncov ered b y S ocial R ank ing
only, w e hav e computed the cumulativ e d istrib ution of their
rank ing. O nce again, for space concerns, w e only d isplay
the results for the critical case of u n p o p u la r p a p ers and for
k = 10 (F igure 5 ). A s the charts illustrate, more than 4 0 %
of the papers that could not b e found using the b asic mod el,
are now returned in the top 10 0 positions (and b etw een 2 0 %
and 3 0 % of them in the top 5 0 ). T his second set of experiments thus d emonstrates that tags’ similarity can ind eed b e
exploited , not just to uncov er relev ant content, b ut also to
recommend it highly, so to b ring it to the attention of the
end user.
4.
RELATED WORK
R esearch in the area of social tagging has proliferated in
recent years, d ue to the increasing popularity of such systems. S tud ies hav e b een cond ucted b oth to und erstand tag
usage and ev olution (e.g., [2 3 , 3 ]), and to learn and exploit
their hid d en semantics. In [7 ], a large stud y of social tagging
on the popular d el.icio.us b ook mark ing system is presented ,
aimed at characteriz ing users’ activ ity, pages’ popularity,
and tags’ d istrib ution; the k now led ge b ase (in this case, the
w hole W eb ) is so large and d ynamic that the authors are
q uite pessimistic on the b enefi ts that social b ook mark ing
can b ring to w eb searches. In [6], the same authors hav e
show n how searches on d el.icio.us can b e improv ed if a nav igab le hierarchical taxonomy of tags is d eriv ed from tag usage, to help users b road ening/ narrow ing the set of tags that
b est d escrib e their interests. O ur approach tak es a d iff erent stance, and rather than off ering users an organised tag
nav igation system, it aims to transparently improv e users’
searches b ased on emergent tags semantics and q uery expansion. In [18], tags are related b ack to a fi xed ontology of
concepts, thus exploiting b oth techniq ues to enhance information retriev al capab ilities. D iff erently from this approach,
our goal is to autonomically d eriv e tags’ relationships, w hich
can then b e fi tted into an eff ectiv e q uery search algorithm,
w ithout relying on a prefi xed ontology. In [2 0 ], semantics
that specifi cally relate to places and ev ents are inferred for
resources w ithin the F lick r d ataset; their approach is highly
tied to location information, and thus not easily generaliz ab le to other d omains. In [2 5 ], a prob ab ilistic generativ e
mod el is proposed to d escrib e users’ annotation b ehav iour,
and to automatically d eriv e tags emergent semantics; d uring searches, their approach is capab le of grouping together
synonymous tags, w hile it calls for user’s interv ention w hen
highly amb iguous tags are found . V ery early w ork , b ut w ith
similar goals, is presented in [2 6], w here a simpler techniq ue,
b ased on an analysis of the relationship b etw een users, tags
and resources, is proposed to d isamb iguate tags. T ag systems hav e recently rev ealed their susceptib ility to tag spam,
that is, malicious annotations generated to confuse users.
T he prob lem has b een w ell analysed in [13 ], w here the authors tried to id entify misused tags, and q uantify the extent
to w hat tagging systems are rob ust against spam. R ob ust
solutions to tag spamming are still b eing inv estigated .
R esearch has b een v ery activ e also in relating tag activ ity
to users, in ord er to d iscov er their interests and conseq uently
users’ communities. W ork w ithin the S emantic W eb d omain
has tried to classify users into categories and d escrib e the
k ey features of such categories [15 ]. M ore recently, users
Category
5
1|1
1|1
1|1
2|2
3|2
2|2
H T /P P
M T /P P
LT /P P
H T /U P
M T /U P
LT /U P
10
1|1
1|1
1|1
3|3
5|4
4|4
P erc en tiles (k = 10 )
25
50
75
2|2
6|5
27 | 23
1|1
4|3
15 | 13
1|1
3|3
12 | 10
10 | 7
35 | 25
80 | 67
12 | 9
31 | 27
76 | 67
9|8
26 | 23
71 | 61
95
10 2 | 88
70 | 64
82 | 71
186 | 162
20 7 | 170
20 9 | 169
5
1|1
1|1
1|1
2|2
3|2
2|2
10
1|1
1|1
1|1
4|3
5|4
4|3
P erc en tiles (k = 20 )
25
50
75
2|2
7|5
29 | 23
2|1
5|3
17 | 13
1|1
4|3
15 | 10
12 | 7
39 | 25
86 | 67
13 | 9
35 | 27
85 | 67
10 | 8
30 | 23
80 | 61
95
112 | 88
80 | 64
88 | 71
215 | 162
245 | 170
257 | 169
Table 2: Percentiles of the ranking of results, for Social Ranking vs. Basic Model
(a)
(b)
100
80
80
80
60
40
Pe r ce n t a ge
100
Pe r ce n t a ge
Pe r ce n t a ge
(c)
Low t a gge r s, u n popu la r pa pe r s ( UN H I D D EN )
M e diu m t a gge r s, u n popu la r pa pe r s ( UN H I D D EN )
H e a vy t a gge r s, u n popu la r pa pe r s ( UN H I D D EN )
100
60
40
20
20
5
10
25
50
100
200
> 200
40
20
0
0
60
0
5
10
25
Posit ion
50
100
200
> 200
5
10
25
50
100
200
> 200
Posit ion
Posit ion
F igure 5 : A ccuracy of q ueries uncovered by Social Ranking (k = 10)
h a v e b e e n c la ssifi e d a c c o rd in g to th e ir e x p lic itly sta te d p ro fi le [9 ], b a se d o n a p ro b a b ilistic m o d e l w h ich ta k e s in to a c c o u n t u se rs’s in te re st to to p ic s [2 8 ], a n d b a se d o n th e ir le v e l
o f ta g g in g a c tiv ity a n d b re a d th o f in te re sts [12 ]. In [16 ],
u se rs’ c o m m o n in te re sts a re d isc o v e re d b a se d o n p a tte rn s
o f fre q u e n tly c o -o c c u rrin g ta g s, u sin g a c la ssic a l a sso c ia tio n
ru le a lg o rith m , w h ich h o w e v e r d o e s n o t ta k e in to a c c o u n t
c o n sid e ra tio n s a b o u t u se r’s a c tiv ity . A ll th e se w o rk s, in c lu d in g o u r a tte m p t to fi n d sim ila r u se rs, a re b a se d o n th e
o b se rv a tio n th a t re a l w o rld n e tw o rk s e x h ib it a so -c a lle d c o m m u n ity stru c tu re [2 1]; d e fi n in g th e se t o f ch a ra c te ristic s th a t
w o u ld e n a b le th e b e st fi ttin g a n d n a tu ra l c lu ste rin g o f ta g g e rs a n d ta g s is a n o p e n re se a rch q u e stio n .
In th is p a p e r, w e h a v e b e e n c o m b in in g th e tw o re se a rch
stre a m s h ig h lig h te d a b o v e (i.e ., a u to m a tic le a rn in g o f ta g
se m a n tic s a n d u se rs’ in te re sts) in o rd e r to im p ro v e q u e ry
se a rch e s a n d ra n k in g . O th e r re se a rch g ro u p s h a v e b e e n
c o n d u c tin g re se a rch in th e sa m e a re a . In [2 4 , 17 ], th e in te g ra tio n o f ta g in fo rm a tio n w ith in sta n d a rd re c o m m e n d e r
sy ste m ’s a lg o rith m s h a s b e e n p ro p o se d , in o rd e r to g iv e b e tte r re c o m m e n d a tio n s to u se rs; a lth o u g h v e ry p ro m isin g , a t
p re se n t su ch w o rk s d o n o t ta k e in to a c c o u n t th e ‘a c tiv ity ’
o f u se rs, in te rm s o f a m o u n t o f re so u rc e s b e in g ta g g e d , a n d
n u m b e r o f ta g s b e in g u se d . W e b e lie v e th is in fo rm a tio n to
b e c ru c ia l to e x tra c t u se rs’ in te re sts a n d th u s im p ro v e th e
e ffi c ie n c y o f se a rch e s. T a g a c tiv ity h a s b e e n c o m b in e d w ith
a P a g e R a n k -lik e a lg o rith m , in o rd e r to im p ro v e th e ra n k in g
m e ch a n ism , in situ a tio n s w h e re re so u rc e s a re n o t lin k e d to g e th e r a s in a ty p ic a l w e b g ra p h stru c tu re [8 ]; th e ir a p p ro a ch ,
c a lle d F o lk R a n k , p ro v id e s g o o d re su lts w h e n q u e ry in g th e
fo lk so n o m y fo r to p ic a lly re la te d e le m e n ts, w h ile it is e a sily
su b v e rte d if le ss re la te d / p o p u la r ta g s a re b e in g u se d , d u e to
th e siz e a n d sp a rsity o f fo lk so n o m ie s o n th e w e b . In [10],
u se rs’ sim ila rity is e x p lo ite d fi rst to g e n e ra te a se t o f ta g s o f
re le v a n c e to th e ta rg e t u se r, th e n to re c o m m e n d h im ite m s
d e sc rib e d b y su ch ta g s; a s fo r F o lk R a n k , th is a p p ro a ch is
m o stly ta ilo re d to sc e n a rio s w ith h ig h u se rs’ a c tiv ity a n d lo w
ta g n o ise . S o c ia l R a n k in g fo c u se s o n sc e n a rio s w h e re th e re
is o n ly little in fo rm a tio n sh a re d b e tw e e n u se rs in ste a d . In
th e se se ttin g s, w e h a v e d e m o n stra te d h o w a c o m b in a tio n o f
u se rs’ a n d ta g s’ sim ila rity c a n a m e lio ra te th e sp a rsity p ro b le m . F u rth e r im p ro v e m e n ts c o u ld b e a ch ie v e d b y c lu ste rin g
u se rs w ith in b e tte r sc o p e d c o m m u n itie s; w e in te n d to e x p lo re th is a sp e c t n e x t.
5.
CONCLUSION
In th is p a p e r w e h a v e p re se n te d S o c ia l R a n k in g , a te ch n iq u e th a t a im s to im p ro v e c o n te n t se a rch e s in W e b 2 .0 sc e n a rio s, b y e x p lo itin g u se rs’ sim ila rity a n d ta g s’ sim ila rity .
T h e fo rm e r is u se d to im p ro v e a c c u ra c y : th e h ig h e r th e
sim ila rity b e tw e e n th e q u e ry in g u se r a n d th e u se r th a t h a s
b o o k m a rk e d it, th e h ig h e r th e ch a n c e s th a t th e p a p e r is o f
re le v a n c e , th u s re d u c in g th e a m o u n t o f u n in te re stin g c o n te n t b e in g p re se n te d to u se rs. T h e la tte r is u se d to im p ro v e
c o v e ra g e : b y im p lic itly le a rn in g ta g s’ sim ila rity fro m th e ir
u sa g e , a la rg e r a m o u n t o f re le v a n t y e t u n p o p u la r c o n te n t
c a n b e u n c o v e re d , th u s a m e lio ra tin g th e p ro b le m o f h e te ro g e n e ity , sp a rsity a n d la ck o f stru c tu re in fo lk so n o m y .
O u r o n g o in g w o rk sp a n s d iff e re n t d ire c tio n s. F irst, w e
a re re fi n in g th e te ch n iq u e s w e u se to fi n d sim ila r u se rs a n d
sim ila r ta g s. F o r th e fo rm e r, w e h a v e sta rte d a n a ly sin g
th e im p a c t o f a v a rie ty o f c lu ste rin g te ch n iq u e s to id e n tify
c o m m u n itie s o f u se rs w ith sh a re d in te re sts; b e y o n d sim ila rity in th e ta g s’ u sa g e , th e re e x ist o th e r p a ra m e te rs o f
re le v a n c e , in c lu d in g le v e l o f a c tiv ity (to d istin g u ish a c tiv e
u se rs w h o c o n trib u te to th e k n o w le d g e b a se , fro m p a ssiv e
c o n su m e rs), v a rie ty o f ta g s u se d (u n p o p u la r ta g s m a y re v e a l
m o re a b o u t a u se r’s in te re sts th a n p o p u la r o n e s), a n d so o n .
F o r th e la tte r, w e a re e n rich in g q u e ry e x p a n sio n w ith w o rd s
th a t a re se m a n tic a lly c lo se , a s d e fi n e d b y d ic tio n a ry -b a se d
a p p ro a ch e s lik e W o rd N e t (h ttp :/ / w o rd n e t.p rin c e to n .e d u / ).
F u rth e r e v a lu a tio n is a lso re q u ire d , u sin g d iff e re n t W e b 2 .0
d a ta se ts (e .g ., L a st.fm , B ib so n o m y , d e l.ic io .u s, e tc .), d iff e re n t sim ila rity m e a su re s (e .g ., P e a rso n , c o n c o rd a n c e , e tc .),
a n d c o m p a rin g a g a in st le ss n a iv e a p p ro a ch e s (e .g , [8 ]).
A cknow ledgm ents. T h e a u th o rs w o u ld lik e to th a n k S o n ia B e n M o k h ta r, F ra n c o R a im o n d i, N e a l L a th ia a n d M a tte o
D e ll’A m ic o fo r th e ir c o n tin u o u s h e lp a n d th e u se fu l d isc u ssio n s w h ich le a d to th e p u b lic a tio n o f th is w o rk .
6.
REFERENCES
[1] A. Agresti. Analysis of Ordinal Categorical Data.
J oh n W iley and S ons, 19 8 4 .
[2 ] S . G older and B . A. H u b erm an. U sage p atterns of
collab orativ e tagging system s. Journal of Information
S c ie nce , 3 2 (2 ):19 8 – 2 0 8 , 2 0 0 6 .
[3 ] H . H alp in, V . R ob u , and H . S h ep h erd. T h e com p lex
dynam ics of collab orativ e tagging. In P roc . of th e 1 6 th
Intl. C onfe re nce on W orld W id e W e b , p ages 2 11– 2 2 0 ,
N ew Y ork , N Y , U S A, 2 0 0 7 .
[4 ] Y . H assan-M ontero and V . H errero-S olana. Im p rov ing
tag-clou ds as v isu al inform ation retriev al interfaces. In
Intl. C onfe re nce on M ultid isc ip linary Information
S c ie nce s and T ec h nolog ie s, M erida, S p ain, Octob er
2006.
[5 ] J . L . H erlock er, J . A. K onstan, A. B orch ers, and
J . R iedl. An Algorith m ic F ram ew ork for P erform ing
Collab orativ e F iltering. In P roc . of th e 2 2 nd Intl.
A C M S IG IR C onfe re nce on R e searc h and
D e v e lop me nt in Information R e trie v al, p ages 2 3 0 – 2 3 7 ,
19 9 9 .
[6 ] P . H eym ann and H . G arcia-M olina. Collab orativ e
Creation of Com m u nal H ierarch ical T ax onom ies in
S ocial T agging S ystem s. T ech nical R ep ort 2 0 0 6 -10 ,
S tanford U niv ersity, Ap ril 2 0 0 6 .
[7 ] P . H eym ann, G . K ou trik a, and H . G arcia-M olina. Can
S ocial B ook m ark ing Im p rov e W eb S earch ? R e source
S h e lf, N ov em b er 2 0 0 7 .
[8 ] A. H oth o, R . J äsch k e, C. S ch m itz , and G . S tu m m e.
Inform ation R etriev al in F olk sonom ies: S earch and
R ank ing In P roc . of th e 3 rd E urop ean S e mantic W e b
C onfe re nce , p ages 4 11– 4 2 6 , 2 0 0 6 .
[9 ] W . H . H su , J . L ancaster, M . S . P aradesi, and
T . W eninger. S tru ctu ral L ink Analysis from U ser
P rofi les and F riends N etw ork s: A F eatu re
Constru ction Ap p roach . In Intl. C onfe re nce on
W e b log s and S oc ial M ed ia, M arch 2 0 0 7 .
[10 ] A.-T . J i, C. Y eon, H .-N . K im and, and G .-S . J o.
Collab orativ e T agging in R ecom m ender S ystem s. In
A d v ance s in A rtifi c ial Inte llig e nce , M arch 2 0 0 7 .
[11] O. K aser and D. L em ire. T ag-Clou d Draw ing:
Algorith m s for Clou d V isu aliz ation, T agging and
M etadata for S ocial Inform ation Organiz ation. In Intl.
C onfe re nce on th e W orld W id e W e b , Alb erta, Canada,
Octob er 2 0 0 7 .
[12 ] S . K elk ar, A. J oh n, and D. S eligm ann. An
Activ ity-b ased P ersp ectiv e of Collab orativ e T agging.
In Intl. C onfe re nce on W e b log s and S oc ial M ed ia,
M arch 2 0 0 7 .
[13 ] G . K ou trik a, F . A. E ff endi, Z . G yöngyi, P . H eym ann,
and H . G arcia-M olina. Com b ating sp am in tagging
system s. In P roc . of th e 3 rd Intl. W ork sh op on
A d v e rsarial Information R e trie v al on th e W e b , p ages
5 7 – 6 4 , N ew Y ork , N Y , U S A, 2 0 0 7 . ACM P ress.
[14 ] N . L ath ia, S . H ailes, and L . Cap ra. T h e eff ect of
correlation coeffi cients on com m u nities of
recom m enders. In P roc . of 2 3 rd A C M S y mp osium on
A p p lied C omp uting , 2 0 0 8 .
[15 ] K . F . L aw rence and M . C. S ch raefel. B ringing
Com m u nities to th e S em antic W eb and th e S em antic
W eb to Com m u nities. In P roc . of th e 1 5 th Intl.
C onfe re nce on W orld W id e W e b , 2 0 0 6 .
[16 ] X . L i, L . G u o, and Y . E . Z h ao. T ag-b ased S ocial
Interest Discov ery. In P roc . of th e 1 7 th Intl. W orld
W id e W e b C onfe re nce , 2 0 0 8 .
[17 ] R . N ak am oto, S . N ak ajim a, J . M iyaz ak i, and
S . U em u ra. T ag-b ased Contex tu al Collab orativ e
F iltering. In 1 8 th IE IC E D ata E ng inee ring W ork sh op ,
2007.
[18 ] A. P assant. U sing Ontologies to S trength en
F olk sonom ies and E nrich Inform ation R etriev al in
W eb logs. In P roc . of Intl. C onfe re nce on W e b log s and
S oc ial M ed ia, 2 0 0 7 .
[19 ] H . P olat and W . Du . P riv acy-P reserv ing Collab orativ e
F iltering u sing R andom iz ed P ertu rb ation T ech niq u es.
In T h e 3 rd IE E E Intl. C onfe re nce on D ata M ining
(IC D M ), M elb ou rne, F L , N ov em b er 2 0 0 3 .
[2 0 ] T . R attenb u ry, N . G ood, and M . N aam an. T ow ards
au tom atic ex traction of ev ent and p lace sem antics
from fl ick r tags. In P roc . of th e 3 0 th Intl. A C M S IG IR
C onfe re nce on R e searc h and D e v e lop me nt in
Information R e trie v al, p ages 10 3 – 110 , N ew Y ork , N Y ,
U S A, 2 0 0 7 .
[2 1] J . R u an and W . Z h ang. Identifying netw ork
com m u nities w ith a h igh resolu tion. P h y sical R e v ie w E
(S tatistical, N onlinear, and S oft M atte r P h y sic s),
7 7 (1), 2 0 0 8 .
[2 2 ] J . S ch afer, J . A. K onstan, and J . R iedl. R ecom m ender
S ystem s in E -com m erce. In A C M C onfe re nce on
E lec tronic C omme rce , p ages 15 8 – 16 6 , 19 9 9 .
[2 3 ] S . S en, S . K . L am , A. M . R ash id, D. Cosley,
D. F rank ow sk i, J . Osterh ou se, M . F . H arp er, and
J . R iedl. tagging, com m u nities, v ocab u lary, ev olu tion.
In P roc . of th e 2 0 th C onfe re nce on C omp ute r
S up p orted C oop e rativ e W ork , p ages 18 1– 19 0 , N ew
Y ork , N Y , U S A, 2 0 0 6 .
[2 4 ] K . H . L . T so-S u tter, L . B . M arinh o, and
L . S ch m idt-T h iem e. T ag-aw are R ecom m ender S ystem s
b y F u sion of Collab orativ e F iltering Algorith m s. In
P roc . of 2 3 rd A C M S y mp osium on A p p lied
C omp uting , p ages 16 – 2 0 , 2 0 0 8 .
[2 5 ] X . W u , L . Z h ang, and Y . Y u . E x p loring social
annotations for th e sem antic w eb . In P roc . of th e 1 5 th
Intl. C onfe re nce on W orld W id e W e b , p ages 4 17 – 4 2 6 ,
N ew Y ork , N Y , U S A, 2 0 0 6 .
[2 6 ] C. M . A. Y eu ng, N . G ib b ins, and N . S h adb olt. M u tu al
Contex tu aliz ation in T rip artite G rap h s of
F olk sonom ies. In P roc . of th e 6 th Intl. S e mantic W e b
C onfe re nce and 2 nd A sian S e mantic W e b C onfe re nce
(IS W C / A S W C 2 0 0 7 ), B usan, S outh K orea, v olu m e
4 8 2 5 of L N C S , p ages 9 6 0 – 9 6 4 , N ov em b er 2 0 0 7 .
[2 7 ] V . Z anardi and L . Cap ra. S ocial R ank ing: F inding
R elev ant Content in W eb 2 .0 . In Intl. W ork sh op on
R ecomme nd e r S y ste ms. In conjunc tion w ith th e 1 8 th
E urop ean C onfe re nce on A rtifi c ial Inte llig e nce
(E C A I), P atras, G reece, J u ly 2 0 0 8 .
[2 8 ] D. Z h ou , E . M anav oglu , J . L i, L . C. G iles, and H . Z h a.
P rob ab ilistic m odels for discov ering e-com m u nities. In
P roc . of th e 1 5 th Intl. C onfe re nce on W orld W id e
W e b , p ages 17 3 – 18 2 , N ew Y ork , N Y , U S A, 2 0 0 6 .
Download