An Ontology-based Mining Approach for
User Search Intent Discovery
Yan Shen, Yuefeng Li, Yue Xu, Renato Iannella,
Abdulmohsen Algarni and Xiaohui Tao
ADCS 2011, 2nd Dec, Canberra
Queensland University of Technology
Outline
• Introduction
• Related work
• Proposed Approach
– An overview of the architecture
– World knowledge base
– Personalized ontology construction
– In-levels ontology mining method
a university for the
real world
R
CRICOS No. 00213J
Outline
• Evaluation
– Data collections
– Measures & Baseline model
– Results and findings
– Discussion
• Conclusion and future work
Introduction
• Retrieving the information a user wants is the primary objective of
an effective search engine
• Much effort has been spent on improving search capabilities, e.g….
• These efforts are helpful, but they commonly run into one issue:
information mismatch (ambiguity)
Introduction
• To overcome this issue, more and more researchers have taken
ontologies into account
• Ontologies can organize diverse knowledge in a well-structured way, which helps users assess information items
• Moreover, semantic relations can be used to enhance
information navigation
Introduction
• User search intent is a significant factor in returning the desired
information
• We study two kinds of search intent: Specificity and
Exhaustivity
• A hierarchical concept level-finding technique is proposed to
discover and characterize user search intents
Introduction
• An ontology-based approach is introduced
• Library of Congress Subject Headings (LCSH) is used as the world
knowledge base for learning personalized ontologies
• The in-levels ontology mining method is fully described
• The approach is evaluated on 100 RCV1 topics from the TREC 2002 Filtering Track
• The results indicate that top precision
improves dramatically.
Related work
• Ontology-based techniques
– Zhong proposes a learning approach for task (or domain-specific)
ontologies, which employs various mining techniques and natural-language understanding methods.
– Li and Zhong present an automatic ontology learning method, in which
a class is called a compound concept, assembled by primitive classes
that are the smallest concepts and cannot be divided any further.
– …
Related work
• Ontology-based techniques
– These methods do not aim to discover and characterize
user search intents at the concept level.
– To extend the previous methods, the paper uses the “Is-A” relation to build
a real hierarchical structure as the backbone of personalized
ontologies
Related work
• User information needs
– Jiang and Tan aim to represent and capture users' interests in a target
domain. They then employ a method they call Spreading Activation
Theory (SAT) to provide personalized services.
– Tao et al. propose an ontology-based knowledge retrieval framework
to capture user information needs by considering user knowledge
background and user's local instance repository with association roles
and data mining techniques.
– …
– These approaches are normally either expensive in extraction or inaccurate in
description.
Proposed approach
• The paper starts from the hypothesis that a user search intent
exists somewhere in an ontology.
• The intent can be general or specific, and can be represented to
varying extents
Proposed approach
• An overview of the approach
Proposed approach
• World knowledge base (LCSH)
– In the LCSH, subject headings are the basic semantic units for conveying
domain knowledge and concepts. They have three main types of
references: Broader Term, Narrower Term and Related Term.
– In our approach, Broader Term and Narrower Term are refined into
ancestor and descendant lexical relations, respectively
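As a rough illustration of how Broader Term (BT) and Narrower Term (NT) references could be turned into ancestor and descendant links, here is a minimal Python sketch. The record layout, the sample links, and the function names `build_relations` and `ancestors` are assumptions made for this example and are not taken from the LCSH database used in the paper; Related Term (RT) references are simply skipped.

```python
from collections import defaultdict

# Hypothetical toy records: (heading, reference type, target heading).
# BT = Broader Term, NT = Narrower Term, RT = Related Term.
REFERENCES = [
    ("Information retrieval", "BT", "Information science"),
    ("Information retrieval", "NT", "Web search engines"),
    ("Information retrieval", "RT", "Data mining"),
]


def build_relations(references):
    """Map BT to ancestor links and NT to descendant links; RT is skipped."""
    parents = defaultdict(set)   # subject -> direct ancestors
    children = defaultdict(set)  # subject -> direct descendants
    for subject, ref_type, target in references:
        if ref_type == "BT":
            parents[subject].add(target)
            children[target].add(subject)
        elif ref_type == "NT":
            children[subject].add(target)
            parents[target].add(subject)
    return parents, children


def ancestors(subject, parents):
    """Collect every subject reachable by following ancestor links upward."""
    seen, stack = set(), [subject]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

With these toy records, `ancestors("Web search engines", parents)` walks up through "Information retrieval" to "Information science".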
Proposed approach
• World knowledge base (cont.)
– Definitions
Proposed approach
• Personalized ontology learning
– The concept hierarchy is an essential object of ontology learning
– Here, we create an abstract hierarchical structure
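One way to picture such a hierarchical structure is to group subjects by their depth along Is-A links. The sketch below is only illustrative: it assumes each subject has a single parent, numbers the root as level 1, and uses made-up links and the hypothetical name `levels_from_is_a`; the paper's own level numbering may differ.

```python
from collections import defaultdict, deque

# Hypothetical Is-A links (child -> parent), assuming one parent per subject.
IS_A = {
    "Information retrieval": "Information science",
    "Web search engines": "Information retrieval",
    "Data mining": "Information science",
}


def levels_from_is_a(is_a):
    """Group subjects by depth, with root subjects placed at level 1."""
    children = defaultdict(list)
    for child, parent in is_a.items():
        children[parent].append(child)
    # Roots are parents that never appear as a child.
    roots = [subject for subject in children if subject not in is_a]
    levels = defaultdict(list)
    queue = deque((root, 1) for root in roots)
    while queue:
        subject, depth = queue.popleft()
        levels[depth].append(subject)
        for child in children.get(subject, ()):
            queue.append((child, depth + 1))
    return dict(levels)
```

Here "Information science" lands on level 1, its two children on level 2, and "Web search engines" on level 3.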
Proposed approach
• Personalized ontology learning (cont.)
– Definitions
Proposed approach
• Personalized ontology learning (cont.)
– An example
Proposed approach
• In-Levels ontology mining method
– Represent features in levels (two objectives)
• 1) to decide the subjects and weights for the pilot level;
• 2) to represent the level as a query
After that, perform query expansion. Then, obtain a feature as:
Proposed approach
• In-Levels ontology mining method (cont.)
– Determine the best level for user search intents
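The following Python sketch shows one possible reading of "represent a level as a query, then pick the best level". It is an assumption-laden illustration, not the paper's method: the weighting scheme (each subject's weight spread evenly over its words), the selection criterion (precision in the top k training documents), and the names `level_to_query`, `score`, and `best_level` are all hypothetical.

```python
def level_to_query(subject_weights):
    """Spread each subject's weight evenly over its terms (an assumed scheme)."""
    query = {}
    for subject, weight in subject_weights.items():
        terms = subject.lower().split()
        for term in terms:
            query[term] = query.get(term, 0.0) + weight / len(terms)
    return query


def score(query, document_terms):
    """Sum the weights of the query terms that occur in the document."""
    present = set(document_terms)
    return sum(weight for term, weight in query.items() if term in present)


def best_level(levels, training_docs, k=2):
    """Pick the level whose query yields the highest precision in its top k.

    `levels` maps a level number to {subject: weight}; `training_docs` is a
    list of (document_terms, is_relevant) pairs. Both are assumed inputs.
    """
    def precision_at_k(query):
        ranked = sorted(training_docs, key=lambda doc: score(query, doc[0]), reverse=True)
        return sum(1 for _, relevant in ranked[:k] if relevant) / k

    return max(levels, key=lambda level: precision_at_k(level_to_query(levels[level])))
```

In this sketch a level made of a broad subject will tend to match many non-relevant documents, while a more specific level ranks the relevant ones first, which is the kind of trade-off a level-finding step has to resolve.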
Evaluation
• Data collections
– An LCSH database (QUT Library data, 2008): 719 megabytes stored in a
Microsoft Office Access database (.mdb), with 491,250 subjects in total,
associated by semantic relations
– TREC-11 (2002) Filtering Track on RCV1: 806,791 XML documents in total
across the training and testing sets.
– All of them are pre-processed (stopword
removal, stemming)
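A minimal Python sketch of this kind of pre-processing is shown below. The stopword list is a tiny illustrative sample and `naive_stem` is a crude suffix stripper standing in for a real stemmer; neither is the list or the stemming algorithm used in the experiments, which the slides do not name.

```python
import re

# A tiny illustrative stopword list; real systems use a much longer one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]
```

For example, `preprocess("The engines are retrieving documents")` drops "the" and "are" and reduces the remaining words to `["engin", "retriev", "document"]`.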
Evaluation
• Measures & Baseline model
– Measures: top-20 precision (pr@20), the average precision at the 11 standard
recall levels (11-points), the Mean Average Precision (MAP), and the
F1-Measure.
– Baseline: the ONTO model (Tao et al., 2010)
– Two uniform level settings: upper level 7 and lower level 2.
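For reference, the standard textbook definitions of these measures can be sketched in Python as follows. The input is assumed to be a ranked list of booleans (True for a relevant document) plus the total number of relevant documents for the topic; the function names are illustrative, and the exact averaging used in the experiments may differ. MAP is the mean of `average_precision` over all topics.

```python
def precision_at_k(ranked_relevance, k=20):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance, total_relevant):
    """Mean of the precision values at each rank holding a relevant document."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0


def eleven_point_precision(ranked_relevance, total_relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not total_relevant:
        return [0.0] * 11
    points, hits = [], 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        hits += bool(relevant)
        points.append((hits / total_relevant, hits / rank))  # (recall, precision)
    # Interpolation: best precision at any recall at or above the level.
    return [
        max((p for r, p in points if r >= level / 10), default=0.0)
        for level in range(11)
    ]


def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

For a ranking `[True, False, True, False]` with two relevant documents, precision at 2 is 0.5 and the average precision is (1/1 + 2/3) / 2.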
Evaluation
• Results and Findings
Evaluation
• Results and Findings (cont.)
Evaluation
• Results and Findings (cont.)
Evaluation
• Discussion
– The variant that keeps only the new terms at each level performs better
than the one that keeps all the terms in the levels
– This demonstrates the validity of the hierarchical backbone
– The experimental results are not distinct on every measure, and the
specific terms may reduce recall
– The approach suits situations where precision is considered
more important than the other measures
– LCSH is difficult to keep up to date
Conclusion
• The paper introduces an ontology-based approach to discover user
search intents
• The approach involves a subject-based search model, a world
knowledge base, and an in-levels ontology mining method
• The empirical results indicate that our approach performs remarkably well
on top precision
• The main intellectual contribution is the hierarchical level-finding
technique
Future work
• Investigate the use of the remaining semantic relations in LCSH
• Combine with pattern mining methods
• Test the approach with other world knowledge bases, such as WordNet
or Amazon
• Thank you for listening. Any questions?