A Probabilistic Model for Fine-Grained Expert Search
Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu
June 16--18, 2008, Columbus, Ohio
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Introduction

Expert Search: "who is an expert on X?"
[Figure: a user issues a query, e.g. "Who are experts on Semantic Web Search Engine?", to a search engine, which returns a ranked list of experts.]
Introduction

Pioneering Expert Search Systems
- Log data in software development (Kautz et al., 1996; Mockus and Herbsleb, 2002; McDonald and Ackerman, 1998; etc.)
- Email communications (Campbell et al., 2003; Dom et al., 2003; Sihn and Heeren, 2001; etc.)
- General documents (Yimam, 1996; Davenport and Prusak, 1998; Steer and Lochbaum, 1988; Mattox et al., 1999; Hertzum and Pejtersen, 2000; Craswell et al., 2001; etc.)
Introduction

Expert Search at TREC
- A new task at TREC 2005, 2006, 2007 (Craswell et al., 2005; Soboroff et al., 2006; Bailey et al., 2007)
- Many approaches have been proposed
  - Two generative models (Balog et al., 2006)
  - Prior distribution, relevance feedback (Fang et al., 2006)
  - Hierarchical language model (Petkova and Croft, 2006)
  - Voting and data fusion (Macdonald and Ounis, 2006)
  - …
Introduction

Coarse-grained approach
- Expert search is carried out at the granularity of a whole document.
- Further improvements are hard to achieve: different blocks of electronic documents have different functions and qualities, and therefore different impacts on expert search.
Examples

[Figure: a window around the queried topic separates relevant from irrelevant content -- the Windowed Section Relation.]
Examples

Query: Timed Text
[Figure: the query matches the document Title; the Author field supplies the candidate names -- the Title-Author Relation.]
Examples

[Figure: the Reference Section Relation.]
Examples

Query: W3C Management Team
[Figure: the query matches a section title (<H1>/<H2>); the section body supplies the candidates -- the Section Title-Body Relation.]
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Fine-grained Expert Search -- Evidence Extraction

Fine-grained Evidence: <topic, person, relation, document>

Query: "Who are experts on Semantic Web Search Engine?"

Document-001:
"…a high-level plan of the architecture of the semantic web by Tim Berners-Lee…"
"…later, Berners-Lee describes a semantic web search engine experience…"

Extracted evidence:
E1: <semantic web, Tim Berners-Lee, same-section, document-001>
E2: <semantic web search engine, Berners-Lee, same-section, document-001>
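The tuple extraction above can be sketched as follows. This is a toy same-section matcher under assumed function names, not the paper's actual extractor, which also handles the other relation types:

```python
# Minimal sketch of fine-grained evidence extraction: for each person
# mention that co-occurs with a topic mention inside the same section,
# emit a <topic, person, relation, document> tuple. The function name
# and the substring matcher are illustrative assumptions.

def extract_evidence(doc_id, sections, topics, persons):
    """Yield (topic, person, relation, doc_id) tuples.

    sections: list of section texts from one document
    topics:   candidate topic strings
    persons:  candidate person-name strings
    """
    evidence = []
    for text in sections:
        lowered = text.lower()
        found_topics = [t for t in topics if t.lower() in lowered]
        found_persons = [p for p in persons if p.lower() in lowered]
        for t in found_topics:
            for p in found_persons:
                # both mentions occur in the same section
                evidence.append((t, p, "same-section", doc_id))
    return evidence

sections = [
    "…a high-level plan of the architecture of the semantic web "
    "by Tim Berners-Lee…",
    "…later, Berners-Lee describes a semantic web search engine "
    "experience…",
]
topics = ["semantic web", "semantic web search engine"]
persons = ["Tim Berners-Lee", "Berners-Lee"]
evidence = extract_evidence("document-001", sections, topics, persons)
```

On the slide's example this produces E1 and E2 (plus the other valid same-section co-occurrences).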
Fine-grained Expert Search -- Search Model

Query (q)  -->  Evidence e = <topic, person, relation, document> = (t, p, r, d)  -->  Expert Candidate (c)

P(c|q) = Σ_e P(c, e|q) = Σ_e P(c|e, q) P(e|q) ≈ Σ_e P(c|e) P(e|q)

- P(c|e): Expert Matching Model
- P(e|q): Evidence Matching Model
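The decomposition above turns into a straightforward scoring loop. In this sketch, `expert_match` and `evidence_match` are hypothetical callables standing in for the two component models detailed on the following slides:

```python
from collections import defaultdict

# Sketch of the ranking rule P(c|q) ≈ Σ_e P(c|e) P(e|q): each candidate
# accumulates, over all extracted evidence tuples, the product of an
# expert-matching score and an evidence-matching score.

def rank_candidates(query, evidence, candidates, expert_match, evidence_match):
    scores = defaultdict(float)
    for e in evidence:
        p_e_q = evidence_match(e, query)       # P(e|q)
        if p_e_q == 0.0:
            continue                           # evidence unrelated to q
        for c in candidates:
            scores[c] += expert_match(c, e) * p_e_q   # P(c|e) * P(e|q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A usage example with toy component models:

```python
evidence = [("semantic web", "Tim Berners-Lee", "same-section", "d1"),
            ("semantic web", "J. Doe", "same-section", "d2")]
cands = ["Tim Berners-Lee", "J. Doe"]
em = lambda c, e: 1.0 if c == e[1] else 0.0     # toy P(c|e)
vm = lambda e, q: 0.9 if e[3] == "d1" else 0.4  # toy P(e|q)
ranked = rank_candidates("semantic web", evidence, cands, em, vm)
```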
Fine-grained Expert Search -- Expert Matching

P(c|e) = P(c|p, r, d) ≈ P(c|p) P(p|r, d)
(evidence e = <t, p, r, d> for short)

Name matching: P(c|p) = P_type(c, p), based on name masks:

Mask               Sample
Full Name          Ritu Raj Tiwari
Email Name         rtiwari@nuance.com
Combined Name      Tiwari, Ritu R
Abbr. Name         Ritu Raj; Ritu
Short Name         RRT
Alias, new email   rtiwari@hotmail.com

Person generation: P(p|r, d) = freq(p, r, d) / L(r, d), smoothed with the collection:

P_S(p|r, d) = λ P(p|r, d) + (1 - λ) Σ_{d'∈D} P(p|r, d') / |D|
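A minimal sketch of the person-generation estimate and its smoothing, assuming Jelinek-Mercer-style interpolation with the collection average as the formula above suggests; the λ value here is illustrative, not the paper's tuned setting:

```python
# P(p|r,d): how likely person p is generated from relation r in
# document d, estimated by frequency, then smoothed with the
# collection-wide average over all documents d' in D.

def p_person_given_rd(freq, length):
    """Maximum-likelihood estimate P(p|r,d) = freq(p,r,d) / L(r,d)."""
    return freq / length if length else 0.0

def smoothed_p(freq, length, collection_probs, lam=0.8):
    """P_S(p|r,d) = lam * P(p|r,d) + (1 - lam) * avg of P(p|r,d').

    collection_probs: P(p|r,d') for every d' in D
    lam: interpolation weight (0.8 is an illustrative choice)
    """
    background = sum(collection_probs) / len(collection_probs)
    return lam * p_person_given_rd(freq, length) + (1 - lam) * background
```

Smoothing keeps evidence from documents where a name variant never appears from zeroing out the whole product.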
Fine-grained Expert Search -- Evidence Matching

P(e|q) = P(t, p, r, d|q) ≈ P(t|q) P(p|q) P(r|q) P(d|q)

Topic matching: P(t|q) = P_type(t, q), e.g. for the query "Semantic Web Search Engine":

Type        Sample
Phrase      "Semantic Web Search Engine"
Bi-gram     "Semantic Web" "Search Engine"
Proximity   "Semantic … Web Search Engine"
Fuzzy       "Samentic Web Saerch Engine"
Stemmed     "Semantic Web Search Engin"

Person matching: P(p|q) ≈ P(p)

Relation matching: P(r|q) = P(type(r)), over the relation types:
Same Section, Windowed Section, Reference Section, Title-Author, Section Title-Body

Document matching: P(d|q) = P(q|d) P(d) / P(q) ∝ P(q|d) P(d), with quality types:
Static Quality, Dynamic Quality

Overall: P(e|q) = P_type(t, q) P(p) P(type(r)) P(q|d) P(d)
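The factored score can be sketched as below. The per-type weights are invented placeholders for P_type(t, q) and P(type(r)), which the paper estimates from data rather than hard-codes:

```python
# Sketch of the factored evidence-matching score
# P(e|q) ≈ P_type(t,q) * P(p) * P(type(r)) * P(q|d) * P(d).
# All numeric weights below are illustrative assumptions.

TOPIC_TYPE_WEIGHT = {      # P_type(t, q): how the topic matched the query
    "phrase": 1.0, "bi-gram": 0.8, "proximity": 0.6,
    "fuzzy": 0.4, "stemmed": 0.4,
}
RELATION_TYPE_WEIGHT = {   # P(type(r)): prior on the relation type
    "same-section": 1.0, "windowed-section": 0.8,
    "reference-section": 0.6, "title-author": 0.9,
    "section-title-body": 0.9,
}

def evidence_match(topic_type, relation_type, p_person_prior,
                   p_q_given_d, p_d):
    """Combine the four factors into one evidence score."""
    return (TOPIC_TYPE_WEIGHT[topic_type]
            * p_person_prior
            * RELATION_TYPE_WEIGHT[relation_type]
            * p_q_given_d * p_d)
```

The design point is that exact phrase matches and tight relations contribute more than fuzzy matches and loose relations, while document quality scales the whole tuple.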
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Experimental Results

W3C Corpus
- 331,307 web pages
- 10 training topics of TREC 2005
- 50 test topics of TREC 2005
- 49 test topics of TREC 2006

Evaluation Metrics
- Mean average precision (MAP)
- R-precision (R-P)
- Top N precision (P@N)
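For reference, the three metrics can be computed per topic as follows (MAP is the mean of average precision over all test topics). This is a generic sketch, not the official trec_eval implementation:

```python
# ranked: candidates in ranked order; relevant: set of true experts.

def average_precision(ranked, relevant):
    """AP: mean of precision values at each rank where a relevant
    candidate is retrieved; MAP averages this over topics."""
    hits, total = 0, 0.0
    for i, c in enumerate(ranked, 1):
        if c in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """R-P: precision at rank R, where R = number of relevant experts."""
    r = len(relevant)
    return sum(1 for c in ranked[:r] if c in relevant) / r if r else 0.0

def precision_at(ranked, relevant, n=10):
    """P@N: fraction of the top N candidates that are relevant."""
    return sum(1 for c in ranked[:n] if c in relevant) / n
```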
Experimental Results

Query Matching

                   TREC 2005                 TREC 2006
                   MAP     R-P     P@10     MAP     R-P     P@10
Baseline           0.1840  0.2136  0.3060   0.3752  0.4585  0.5604
+Bi-gram           0.1957  0.2438  0.3320   0.4140  0.4910  0.5799
+Proximity         0.2024  0.2501  0.3360   0.4530  0.5137  0.5922
+Fuzzy, Stemmed    0.2030  0.2501  0.3360   0.4580  0.5112  0.5901
Improv.            10.33%  17.09%  9.80%    22.07%  11.49%  5.30%
T-test (MAP)       0.0084                   0.0000
Experimental Results

Person Matching

                    TREC 2005                 TREC 2006
                    MAP     R-P     P@10     MAP     R-P     P@10
Baseline            0.2030  0.2501  0.3360   0.4580  0.5112  0.5901
+Combined Name      0.2056  0.2539  0.3463   0.4709  0.5152  0.5931
+Abbr. Name         0.2106  0.2545  0.3400   0.5010  0.5181  0.6000
+Short Name         0.2111  0.2578  0.3400   0.5121  0.5192  0.6000
+Alias, new email   0.2156  0.2591  0.3400   0.5221  0.5212  0.6000
Improv.             6.21%   3.60%   1.19%    14.00%  1.96%   1.68%
T-test (MAP)        0.0064                   0.0057
Experimental Results

Multiple Relations

                     TREC 2005                 TREC 2006
                     MAP     R-P     P@10     MAP     R-P     P@10
Baseline             0.2156  0.2591  0.3400   0.5221  0.5212  0.6000
+Windowed Section    0.2158  0.2633  0.3380   0.5255  0.5311  0.6082
+Reference Section   0.2160  0.2630  0.3380   0.5272  0.5314  0.6061
+Title-Author        0.2234  0.2634  0.3580   0.5354  0.5355  0.6245
+Section Title-Body  0.2586  0.3107  0.3740   0.5657  0.5669  0.6510
Improv.              19.94%  19.91%  10.00%   8.35%   8.77%   8.50%
T-test (MAP)         0.0013                   0.0043
Experimental Results

Evidence Quality

                    TREC 2005                 TREC 2006
                    MAP     R-P     P@10     MAP     R-P     P@10
Baseline            0.2586  0.3107  0.3740   0.5657  0.5669  0.6510
+Static quality     0.2711  0.3188  0.3720   0.5900  0.5813  0.6796
+Dynamic quality    0.2755  0.3252  0.3880   0.5943  0.5877  0.7061
Improv.             6.13%   4.67%   3.74%    2.86%   3.67%   8.61%
T-test (MAP)        0.0360                   0.0252
Rank 1 @TREC        0.2749  0.3330  0.4520   0.5947  0.5783  0.7041
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Conclusion
- Fine-grained expert search
- Probabilistic model and its implementation
- Evaluation on the TREC data set