Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University

Introduction

ODP metadata
• 4 million sites, 590,000 categories
• Tree Structure



Categories: inner node
Pages: leaf node, high quality, representative
Using ODP Metadata to Personalize Search
• 4 billion vs. 4 million
• Using ODP Metadata for personalized search
• Is biasing possible in the ODP context?
Extend ODP classifications from its current 4 million to the 4-billion-page Web automatically by biasing
Using ODP Metadata For Personalized Search


User Profile: several topics from ODP selected by user
Personalized Search



Send Q to a search engine S (e.g., Google, ODP Search)
Res = URLs returned by S
For i = 1 to size(Res)
    Dist[i] = Distance(Res[i], Prof)
Re-sort Res based on Dist

Representation
Both the user profile and a URL (50% in Google Directory) can be represented as a set of nodes in the directory tree

Distance(Profile, URL)
Minimum distance between the two sets of nodes.
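
The re-ranking loop and the set-to-set distance can be sketched in Python. This is only an illustration under assumptions that are not in the slides: topics are written as root-to-node paths, the distance is the plain tree (edge-count) distance, and a URL with no ODP annotation is treated as infinitely far from the profile. The names `tree_distance`, `set_distance` and `personalize` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: a topic is a path from the root, e.g. ("Arts", "Music").
Path = Tuple[str, ...]


def tree_distance(a: Path, b: Path) -> int:
    """Number of edges between two nodes of the directory tree."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Walk up from a to the deepest common ancestor, then down to b.
    return (len(a) - common) + (len(b) - common)


def set_distance(profile: Set[Path], topics: Set[Path]) -> float:
    """Minimum tree distance between any profile node and any URL node."""
    if not profile or not topics:
        # Assumption: an unannotated URL is infinitely far from the profile.
        return float("inf")
    return min(tree_distance(p, t) for p in profile for t in topics)


def personalize(results: Sequence[str],
                annotations: Dict[str, Set[Path]],
                profile: Set[Path]) -> List[str]:
    """Re-sort search results by their distance to the user profile."""
    dist = [set_distance(profile, annotations.get(url, set())) for url in results]
    # sorted() is stable, so ties keep the search engine's original order.
    order = sorted(range(len(results)), key=lambda i: dist[i])
    return [results[i] for i in order]
```

Because the sort is stable, results that are equally far from the profile keep the order the search engine gave them.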

Naïve Distances
Minimum tree distance


Intra-topic links
Subsumer
Graph shortest path


Inter-topic links
Complex Distance
The deeper the subsumer, the more related the nodes are
\[ S'(a, b) = \left( (1 - \delta)\, e^{-\alpha l_1} + \delta\, e^{-\alpha l_2} \right) \cdot \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} \]
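
A minimal numeric sketch of the complex distance may help. It reads the formula as a weighted sum of two exponentially decaying path-length terms multiplied by the hyperbolic tangent of the subsumer depth. The interpretation of `l1` and `l2` as the path lengths from each node to the subsumer, the parameter names `alpha`, `beta`, `delta`, and their default values are assumptions made for illustration, not values from the slides.

```python
import math


def complex_similarity(l1: float, l2: float, h: float,
                       alpha: float = 0.2, beta: float = 0.6,
                       delta: float = 0.5) -> float:
    """Sketch of S'(a, b); all parameter defaults are placeholders.

    l1, l2: assumed path lengths from a and b to their subsumer.
    h:      depth of the subsumer in the directory tree.
    """
    # Longer paths to the subsumer make the nodes less related.
    path_term = (1 - delta) * math.exp(-alpha * l1) + delta * math.exp(-alpha * l2)
    # (e^{bh} - e^{-bh}) / (e^{bh} + e^{-bh}) is tanh(b*h): it grows with depth.
    depth_term = math.tanh(beta * h)
    return path_term * depth_term
```

With these placeholder parameters the value grows as the subsumer gets deeper and shrinks as either path to the subsumer gets longer, which matches the statement that a deeper subsumer means more closely related nodes.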
Combining with Google PageRank
Some Google results are not annotated
\[ S''(a, b) = \gamma \cdot \frac{1}{1 + S'(a, b)} + (1 - \gamma) \cdot \mathrm{PageRank}(b) \]
Experimental Results
Extending ODP Annotations To The Web
• Manual annotation of the whole Web is impossible
• Biasing is an implicit way of extending annotations to the Web
• Is biasing possible in the ODP context?
Are ODP entries good biasing sets for obtaining relevant results, i.e., do they generate rankings that differ enough from the non-biased ranking?
• When does biasing make a difference?
Find the characteristics the biasing set has to exhibit in order to obtain relevant results
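
The slides do not spell out how biasing is computed. One common reading is PageRank in which the random jump lands only on pages of the biasing set. The sketch below illustrates that reading on a toy graph; the function name `biased_pagerank`, the damping factor and the handling of dangling pages are assumptions, not details from the talk.

```python
from typing import Dict, List, Set


def biased_pagerank(links: Dict[str, List[str]], biasing_set: Set[str],
                    damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power iteration with the random jump restricted to biasing_set.

    links maps every page to its out-links; every link target must
    itself be a key of links (assumption of this sketch).
    """
    nodes = list(links)
    # Jump distribution: uniform over the biasing set, zero elsewhere.
    bias = {n: (1.0 / len(biasing_set) if n in biasing_set else 0.0) for n in nodes}
    rank = dict(bias)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * bias[n] for n in nodes}
        for n in nodes:
            out = links[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling page: hand its score back to the biasing set.
                for m in nodes:
                    nxt[m] += damping * rank[n] * bias[m]
        rank = nxt
    return rank
```

Passing the set of all pages as `biasing_set` gives the ordinary non-biased ranking, so the two rankings can be compared directly.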
Experimental Setup


Compare the similarity between top 100 non-biased
PageRank results and biased results
Similarity Measure

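OSim: degree of overlap between the top n elements of two ranked lists
\[ \mathrm{OSim}(\tau_1, \tau_2) = \frac{|\mathrm{Top}_n(\tau_1) \cap \mathrm{Top}_n(\tau_2)|}{n} \]

KSim: degree of agreement on ordering between the two ranked lists
\[ \mathrm{KSim}(\tau_1, \tau_2) = \frac{|\{(u, v) : \tau_1', \tau_2' \text{ agree on order of } (u, v),\ u \neq v\}|}{|U| \cdot (|U| - 1)} \]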
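
Both measures are short to compute. The sketch below makes assumptions the slide does not state: U is the union of the two lists, each list is extended by placing the elements it lacks at a shared last rank, and a pair that is tied in one extended list counts as a disagreement. The lists are assumed to contain no duplicates. The function names `osim` and `ksim` are chosen here for illustration.

```python
from itertools import permutations
from typing import Sequence


def osim(list1: Sequence[str], list2: Sequence[str], n: int) -> float:
    """Fraction of the top n elements shared by the two ranked lists."""
    return len(set(list1[:n]) & set(list2[:n])) / n


def ksim(list1: Sequence[str], list2: Sequence[str]) -> float:
    """Fraction of ordered pairs from U that both lists rank the same way."""
    union = set(list1) | set(list2)
    if len(union) < 2:
        return 1.0  # no pairs to disagree on
    rank1 = {u: i for i, u in enumerate(list1)}
    rank2 = {u: i for i, u in enumerate(list2)}

    def sign(x: int) -> int:
        return (x > 0) - (x < 0)

    agree = 0
    for u, v in permutations(union, 2):
        # Elements missing from a list share the rank just past its end.
        d1 = rank1.get(u, len(list1)) - rank1.get(v, len(list1))
        d2 = rank2.get(u, len(list2)) - rank2.get(v, len(list2))
        if sign(d1) == sign(d2):
            agree += 1
    return agree / (len(union) * (len(union) - 1))
```

Identical lists give a KSim of 1 and exactly reversed lists give 0; OSim ignores order and only looks at membership in the top n.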

Choice of Biasing Sets




Top [0-10]% PageRank pages
Top [0-2]% PageRank pages
Randomly selected pages
Low PageRank pages

Varied the sum of scores within the set between 0.000005% and 10% of the total sum over all pages (TOT).

Experiments were done on a crawl of 3 million pages and then applied to the Stanford WebBase crawl.
Biasing set consists of good pages
Biasing set consists of randomly selected pages
According to the random model of biasing, every set with TOT below 0.015% is good for biasing.
• Results are not influenced by the crawl size (3-million-page crawl vs. 120-million-page WebBase crawl)
• Entries in ODP have TOT below 0.015%, thus biasing is possible in the ODP context

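The TOT criterion can be checked with a few lines of Python. This sketch assumes TOT for a candidate set is the sum of the non-biased scores of its pages, expressed as a percentage of the sum over all pages, and it uses the 0.015% threshold quoted for the random model. The function names `tot_share` and `is_good_biasing_set` are hypothetical.

```python
from typing import Dict, Iterable


def tot_share(scores: Dict[str, float], biasing_set: Iterable[str]) -> float:
    """Score mass of the biasing set as a percentage of the total."""
    total = sum(scores.values())
    return 100.0 * sum(scores[p] for p in biasing_set) / total


def is_good_biasing_set(scores: Dict[str, float], biasing_set: Iterable[str],
                        threshold_percent: float = 0.015) -> bool:
    """True when the set's share of the total score is below the threshold."""
    return tot_share(scores, biasing_set) < threshold_percent
```

For example, on 10,000 pages with equal scores a single page holds 0.01% of the total and passes the check, while two pages hold 0.02% and do not.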

Conclusions


A personalized search algorithm that ranks URLs based on the distance between the user profile and the URL in the ODP taxonomy.
Biasing on ODP entries takes effect, thus it is feasible to extend the manual ODP classification to the Web.