Personalized Ontologies for Web Search and Caching

Susan Gauch
Information and Telecommunications
Technology Center
Electrical Engineering and Computer Science
The University of Kansas
Department of Electrical Engineering and Computer Science
ITTC
Outline
Motivation
User profiles
 creation and maintenance
 evaluation
Applications
 re-ranking (and filtering) search results
 Web caching
Conclusions
Professor Susan Gauch
December 1999
Motivation
Decrease access time for Web pages
 Server approaches
- use access logs to decrease access times for popular
pages
- not tailored to individuals
- doesn’t decrease network traffic
 Network approaches
- cache popular pages multiple places in the network
- not tailored to individuals
Personalization
Different information needs for different users
 can we learn user’s interest?
- explicitly?
- implicitly?
 can we use this information?
- improved search
- improved browsing
- faster Web page access
Intelligent Web Caching
Improved (and faster) search results
 pre-caching all search results expensive
- Internet search engines return 50% irrelevant pages
 improved knowledge of user’s likely behavior
- intelligent pre-caching
- use past behaviors to predict future behaviors
- pre-cache “best” pages close to individuals
Context
ProFusion: www.profusion.com
OBIWAN: distributed content-based IR
 Web clustered into regions
 clustering criteria: content, location, company
 search: query brokered to “best” regions; within
region brokered to most promising sites
 browsing a region means browsing its sites
simultaneously
 www.ittc.ukases.edu/obiwan
User Profiles
Applications
 Usenet news filtering
 recommendation services: web browsing, books
 intelligent pre-caching
Should
 accurately reflect actual interests
 require as little feedback as possible
 be dynamic
User profiles: Creation
Obvious and often used: keywords
 not structured (ambiguous)
 static
 have to be explicitly mentioned
Our approach
 watch over a user's shoulder while surfing
 automatically determine documents’ content
 central: large ontology (concept hierarchy)
Document Classification
Documents as weighted keyword vectors:
 n different words
-> n dimensions
 weights based on
word frequency and rarity
Browsing hierarchy: 10 web pages per node
Concatenate them -> keyword vector
Content of a page: most similar vector
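A minimal sketch of the classification step described on this slide, assuming a simple whitespace tokenizer and a TF-IDF style weighting (term frequency times a rarity factor). The slide only says weights are based on word frequency and rarity, so the exact formula, the function names `train` and `classify`, and the toy data are illustrative assumptions, not the system's actual implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, whitespace-split, alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def train(concept_pages):
    """concept_pages: {concept: [page_text, ...]} (the slide uses ~10 pages per node).

    Concatenates each concept's pages and builds one weighted keyword vector.
    """
    counts = {c: Counter(tokenize(" ".join(pages))) for c, pages in concept_pages.items()}
    n = len(counts)
    # Number of concepts in which each term occurs at least once.
    df = Counter(t for cnt in counts.values() for t in cnt)
    # Rarity factor (smoothed inverse document frequency); an assumed formula.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {c: {t: tf * idf[t] for t, tf in cnt.items()} for c, cnt in counts.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(page_text, vectors, idf):
    """Return the concept whose vector is most similar to the page, plus all scores."""
    cnt = Counter(tokenize(page_text))
    page_vec = {t: tf * idf[t] for t, tf in cnt.items() if t in idf}
    scores = {c: cosine(page_vec, v) for c, v in vectors.items()}
    return max(scores, key=scores.get), scores


concept_pages = {
    "Sports": ["football match goal team", "team wins league"],
    "Cooking": ["recipe oven bake flour", "bake bread flour"],
}
vectors, idf = train(concept_pages)
best, scores = classify("the team scored a goal", vectors, idf)
```

Cosine similarity is used here as one common way to pick the "most similar vector"; the slides do not name the similarity measure.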
Updating profiles
Static: document related
 content: weights of top nodes for surfed document
 length of page
Dynamic: time spent
Combine them
 for instance:
weight * (time/length)
 changes in interest in the five categories
User profile: weighted ontology
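The combined adjustment on this slide, weight * (time/length), can be sketched as follows. This is a hedged illustration: the slide gives the formula only "for instance", and accumulating the adjustment by addition, the function name `update_profile`, and the units (seconds, characters) are assumptions.

```python
def update_profile(profile, page_categories, time_spent, page_length):
    """Adjust interest weights for the top categories of a surfed page.

    profile:         {category: interest weight} (the weighted ontology)
    page_categories: {category: match weight} for the page's top categories
    time_spent:      time spent on the page (assumed seconds)
    page_length:     length of the page (assumed characters)
    """
    if page_length <= 0:
        return profile  # nothing sensible to compute for an empty page
    factor = time_spent / page_length
    for category, weight in page_categories.items():
        # Assumption: adjustments accumulate additively over time.
        profile[category] = profile.get(category, 0.0) + weight * factor
    return profile


profile = {}
update_profile(profile, {"Sports": 0.5}, time_spent=30, page_length=1000)
```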
Profile evaluation
Accordance with actual user interests
 10/20 interest categories describe actual interests
 describe interests
“pretty well”: 3.5/5
Convergence
 stabilization of # of
categories over time?
 do converge after 320
surfed pages!
Profiles: Summary
Stored as weighted ontologies
Profiles represent actual interests quite well
Up to 150 top categories
Two adjustment functions make profiles converge
 after 320 pages
 length of page doesn't really matter, but time spent
does
Personalizing Search Results
50% of top 20 results irrelevant
Same search mechanism for 200 million people?
Goal:
 identify relevant documents and put them on top of the
result list
 (pre-fetch relevant results)
Difficult problem: 10% increase is very good
Re-Ranking
Ranking a function of:
 search engine's original ranking
 extent to which the top 5 categories describe the document's content
 personal interest in each of these top categories
“More relevant items on top of result list”:
 recall: system’s ability to present all relevant items
 precision: system’s ability to present only relevant items
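One possible way to combine the three ranking inputs is sketched below. The slides do not give the actual ranking function, so the multiplicative form, the function name `rerank`, and the sample scores are all assumptions made for illustration.

```python
def rerank(results, profile):
    """Re-rank search results using a weighted-ontology user profile.

    results: list of (doc_id, original_score, {category: match weight}),
             where the dict holds the document's top categories
    profile: {category: personal interest weight}
    """
    def personalized(entry):
        _doc_id, original_score, categories = entry
        # Sum over the document's top categories of
        # (how well the category describes the document) * (user interest in it).
        boost = sum(match * profile.get(cat, 0.0) for cat, match in categories.items())
        # Hypothetical combination: scale the engine's score by the boost.
        return original_score * (1.0 + boost)

    return sorted(results, key=personalized, reverse=True)


results = [
    ("a", 0.9, {"Sports": 0.1}),
    ("b", 0.8, {"Cooking": 0.9}),
]
reranked = rerank(results, {"Cooking": 1.0})
order = [doc_id for doc_id, _, _ in reranked]
```

With a profile interested only in "Cooking", document "b" moves ahead of "a" even though the search engine originally scored "a" higher.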
Recall and Precision
Combination: Recall/Precision graphs
Example: ranked documents 1,…,20
 relevant 2,5,10,14,19
 recall points 1/5, 2/5, 3/5, 4/5, 5/5
 precisions 1/2, 2/5, 3/10, 4/14, 5/19
[Figure: recall/precision graph; x-axis: recall (0 to 1), y-axis: precision (0 to 0.6)]
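The recall and precision points in the example can be reproduced with a short sketch. The function name `recall_precision_points` is an assumption; the ranks are those given on the slide.

```python
from fractions import Fraction


def recall_precision_points(relevant_ranks, total_relevant=None):
    """Return (recall, precision) at the rank of each relevant document.

    relevant_ranks: 1-based ranks at which relevant documents appear
    total_relevant: total number of relevant documents; defaults to the
                    number of ranks given (all relevant documents retrieved)
    """
    ranks = sorted(relevant_ranks)
    total = total_relevant if total_relevant is not None else len(ranks)
    # After the i-th relevant document at position `rank`:
    #   recall    = i / total relevant
    #   precision = i / documents seen so far
    return [(Fraction(i, total), Fraction(i, rank)) for i, rank in enumerate(ranks, start=1)]


points = recall_precision_points([2, 5, 10, 14, 19])
```

Exact fractions are used so the values can be compared directly with those on the slide; note that `Fraction` reduces 4/14 to 2/7.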
Re-Ranking: Evaluation
Overall performance increase of up to 8%
 at each recall cutoff, up to 10% more relevant
documents have been retrieved
Browsing Assistance
Analyze current page
 locate links
Identify which links are most likely to be
followed by the user
 popularity of the link overall
 relevance of linked page to user’s interests
Problem
 if you have to download the whole page to analyze
it, you’ve increased the network utilization
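A hedged sketch of the link-ranking idea: locate the links on the current page, then order them by a mix of overall popularity and relevance to the user's interests. The slides do not say how the two signals are combined, so the weighted sum, the `alpha` parameter, and the precomputed `popularity` and `relevance` lookups are hypothetical. Supplying relevance from a lookup sidesteps, rather than solves, the download problem noted above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def rank_links(html, popularity, relevance, alpha=0.5):
    """Order a page's links by likelihood of being followed.

    popularity: {url: score in [0, 1]}, overall popularity of the link
    relevance:  {url: score in [0, 1]}, relevance of the linked page to the user
    alpha:      assumed mixing weight between the two signals
    """
    collector = LinkCollector()
    collector.feed(html)
    unique_links = list(dict.fromkeys(collector.links))  # de-duplicate, keep order

    def score(url):
        return alpha * popularity.get(url, 0.0) + (1 - alpha) * relevance.get(url, 0.0)

    return sorted(unique_links, key=score, reverse=True)


page = '<a href="/a">A</a> <a href="/b">B</a> <a href="/a">A again</a>'
ranked = rank_links(page, {"/a": 0.9, "/b": 0.2}, {"/a": 0.1, "/b": 0.9})
```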
Privacy
Is the user aware that their behavior is being
monitored?
Can users turn it off?
Where are profiles stored?
With whom are profiles shared?
How are profiles protected?
How are profiles used?
Conclusions
Automatic creation of structured user profiles is
possible
Profiles are reasonably accurate
Applications in improving the search quality and
Web page access efficiency
Evaluation of re-ranking search results:
performance increase of up to 8%
Future Work
Incorporating profile generator into browser
Connect system to ProFusion, OBIWAN
Personalize structure of ontology
Re-train classifier
More applications: recommendation service, web
caching, browsing, ...
Explicit user feedback?