A Probabilistic Model for Fine-Grained Expert Search
Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu
June 16--18, 2008, Columbus, Ohio

Schedule
1 Introduction
2 Fine-grained Expert Search
3 Experimental Results
4 Conclusion

Introduction -- Expert Search
Expert search answers the question "Who is an expert on X?"
User Query -> Search Engine -> Experts
Example: Who are experts on Semantic Web Search Engine?

Introduction -- Pioneering Expert Search Systems
- Log data in software development: Kautz et al., 1996; Mockus and Herbsleb, 2002; McDonald and Ackerman, 1998; etc.
- Email communications: Campbell et al., 2003; Dom et al., 2003; Sihn and Heeren, 2001; etc.
- General documents: Yimam, 1996; Davenport and Prusak, 1998; Steer and Lochbaum, 1988; Mattox et al., 1999; Hertzum and Pejtersen, 2000; Craswell et al., 2001; etc.

Introduction -- Expert Search at TREC
A new task at TREC 2005, 2006, and 2007 (Craswell et al., 2005; Soboroff et al., 2006; Bailey et al., 2007).
Many approaches have been proposed:
- Two generative models (Balog et al., 2006)
- Prior distribution and relevance feedback (Fang et al., 2006)
- Hierarchical language model (Petkova and Croft, 2006)
- Voting and data fusion (Macdonald and Ounis, 2006)
- ...

Introduction -- Limits of the Coarse-Grained Approach
Existing systems carry out expert search at the granularity of whole documents, which makes further improvements hard to achieve.
Different blocks of an electronic document have different functions and qualities, and therefore different impacts on expert search.

Examples -- Windowed Section Relation
[Figure: a text window around the queried topic separates relevant person mentions from irrelevant ones.]

Examples -- Title-Author Relation
Query: Timed Text
[Figure: a document title and the adjacent author block form a Title-Author relation.]

Examples -- Reference Section Relation
[Figure: person mentions appearing in the reference section of a document.]

Examples -- Section Title-Body Relation
Query: W3C Management Team
[Figure: an <H1>/<H2> section title and its section body form a Section Title-Body relation.]

Schedule
1 Introduction
2 Fine-grained Expert Search
3 Experimental Results
4 Conclusion

Fine-grained Expert Search -- Evidence Extraction
Fine-grained evidence: <topic, person, relation, document>
Example: Who are experts on Semantic Web Search Engine?
Document-001: "... a high-level plan of the architecture of the semantic web by Tim Berners-Lee ..."
"... later, Berners-Lee describes a semantic web search engine experience ..."
E1: <semantic web, Tim Berners-Lee, same-section, document-001>
E2: <semantic web search engine, Berners-Lee, same-section, document-001>

Fine-grained Expert Search -- Search Model
Given a query q, evidence e = <topic, person, relation, document> = <t, p, r, d>, and an expert candidate c:

P(c \mid q) = \sum_e P(c, e \mid q) = \sum_e P(c \mid e, q) P(e \mid q) \approx \sum_e P(c \mid e) P(e \mid q)

P(c \mid e) is the expert matching model; P(e \mid q) is the evidence matching model.

Fine-grained Expert Search -- Expert Matching
For evidence e = <t, p, r, d>:

P(c \mid e) = P(c \mid p, r, d) \approx P(c \mid p) P(p \mid r, d)

P(c \mid p) = P_{type(c, p)} acts as a mask over the type of match between candidate c and person occurrence p.

Sample name-match types for the candidate "Ritu Raj Tiwari":
Full Name          Ritu Raj Tiwari
Email Name         rtiwari@nuance.com
Combined Name      Tiwari, Ritu R
Abbr. Name         Ritu Raj; Ritu
Short Name         RRT
Alias, new email   rtiwari@hotmail.com

P(p \mid r, d) = freq(p, r, d) / L(r, d), the frequency of p in the block of d covered by relation r, normalized by the block size L(r, d) and smoothed against the whole collection D:

P_S(p \mid r, d) = (1 - \lambda) P(p \mid r, d) + \lambda \frac{1}{|D|} \sum_{d' \in D} P(p \mid r, d')

(Both estimates are sketched in code after the Evidence Matching slide.)

Fine-grained Expert Search -- Evidence Matching

P(e \mid q) = P(t, p, r, d \mid q) \approx P(t \mid q) P(p \mid q) P(r \mid q) P(d \mid q)

- P(t \mid q) = P_{type(t, q)}: a mask over the type of match between topic t and query q
- P(p \mid q) = P(p): a query-independent person prior
- P(r \mid q) = P(type(r)): a prior over the relation types (Same Section, Windowed Section, Reference Section, Title-Author, Section Title-Body)
- P(d \mid q) = P(q \mid d) P(d) / P(q) \propto P(q \mid d) P(d): P(q \mid d) is the dynamic (query-dependent) quality of d, P(d) its static quality

Query-match types, with samples for "Semantic Web Search Engine":
Phrase      "Semantic Web Search Engine"
Bi-gram     "Semantic Web" "Search Engine"
Proximity   "Semantic ... Web Search Engine"
Fuzzy       "Samentic Web Saerch Engine"
Stemmed     "Semantic Web Search Engin"

Putting the factors together:

P(e \mid q) \propto P_{type(t, q)} P(p) P(type(r)) P(q \mid d) P(d)
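To make the two estimates concrete, here is a minimal Python sketch of the expert matching model. It is an assumption-laden illustration, not the authors' code: the NAME_TYPE_MASK weights are invented (the slides list the name-match types but not their weights), name_match_type() is a toy stand-in for the real name-variant detector, and blocks_by_doc is an assumed layout mapping each document to the person mentions found inside each relation block.

```python
# Sketch of the expert matching model P(c|e) ~= P(c|p) * P_S(p|r,d).
# The mask weights and the toy name matcher below are illustrative
# assumptions; only the factorization itself comes from the slides.

NAME_TYPE_MASK = {  # hypothetical P_type(c,p) mask weights
    "full_name": 1.0, "email_name": 0.9, "combined_name": 0.8,
    "abbr_name": 0.6, "short_name": 0.4, "alias": 0.3,
}

def name_match_type(candidate, person):
    """Toy name-type detector: recognizes exact full-name matches only.
    A real matcher would also cover email, combined, abbreviated, and
    short names plus aliases (e.g. "RRT" for "Ritu Raj Tiwari")."""
    return "full_name" if candidate.lower() == person.lower() else None

def p_person_in_block(person, mentions):
    """Raw estimate P(p|r,d) = freq(p, r, d) / L(r, d); `mentions` lists
    the person occurrences inside the block of d covered by relation r."""
    return mentions.count(person) / len(mentions) if mentions else 0.0

def expert_matching(candidate, evidence, blocks_by_doc, lam=0.2):
    """P(c|e) ~= P(c|p) * P_S(p|r,d), with Jelinek-Mercer-style smoothing
    P_S(p|r,d) = (1-lam)*P(p|r,d) + lam/|D| * sum_{d' in D} P(p|r,d')."""
    _topic, person, relation, doc = evidence
    mtype = name_match_type(candidate, person)
    if mtype is None:
        return 0.0
    raw = p_person_in_block(person, blocks_by_doc[doc].get(relation, []))
    background = sum(
        p_person_in_block(person, blocks.get(relation, []))
        for blocks in blocks_by_doc.values()
    ) / len(blocks_by_doc)
    return NAME_TYPE_MASK[mtype] * ((1 - lam) * raw + lam * background)
```

For the E1 tuple above, blocks_by_doc = {"document-001": {"same-section": ["Tim Berners-Lee", "Berners-Lee"]}} would give the candidate Tim Berners-Lee a nonzero score through the full-name match.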
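The evidence matching side and the final ranking follow the same pattern; the sketch below reuses expert_matching from the previous block. Only the factorization P(e|q) \propto P_{type(t,q)} P(p) P(type(r)) P(q|d) P(d) is from the slides; every numeric weight, the person_prior table, and the two document-quality inputs are placeholders for what a trained system would supply.

```python
# Sketch of P(e|q) and of the final ranking P(c|q) ~= sum_e P(c|e)*P(e|q).
# All numeric values below are illustrative placeholders.

QUERY_MATCH_MASK = {  # hypothetical P_type(t,q): exact phrase trusted most
    "phrase": 1.0, "bigram": 0.7, "proximity": 0.5,
    "fuzzy": 0.3, "stemmed": 0.3,
}
RELATION_PRIOR = {    # hypothetical P(type(r)) over the five relation types
    "same-section": 1.0, "title-author": 0.9, "section-title-body": 0.9,
    "windowed-section": 0.8, "reference-section": 0.6,
}

def evidence_matching(evidence, match_type, person_prior,
                      dynamic_quality, static_quality):
    """P(e|q) ~ P_type(t,q) * P(p) * P(type(r)) * P(q|d) * P(d)."""
    _topic, person, relation, doc = evidence
    return (QUERY_MATCH_MASK.get(match_type, 0.0)
            * person_prior.get(person, 1e-6)      # P(p): person prior
            * RELATION_PRIOR.get(relation, 0.0)   # P(type(r))
            * dynamic_quality.get(doc, 0.0)       # P(q|d): query-dependent quality
            * static_quality.get(doc, 0.0))       # P(d): static quality

def rank_experts(candidates, evidences, blocks_by_doc,
                 person_prior, dynamic_quality, static_quality):
    """Final ranking P(c|q) ~= sum_e P(c|e) * P(e|q); each element of
    `evidences` is a pair ((t, p, r, d), query_match_type)."""
    scores = {c: 0.0 for c in candidates}
    for evidence, match_type in evidences:
        p_e = evidence_matching(evidence, match_type, person_prior,
                                dynamic_quality, static_quality)
        for c in candidates:
            scores[c] += expert_matching(c, evidence, blocks_by_doc) * p_e
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Summing over fine-grained evidence tuples rather than whole documents is the point of the model: each <t, p, r, d> contributes independently, weighted both by how well it matches the query and by how reliably it ties the person occurrence to the candidate.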
Schedule
1 Introduction
2 Fine-grained Expert Search
3 Experimental Results
4 Conclusion

Experimental Results -- Setup
W3C corpus: 331,037 web pages
Topics: 10 training topics of TREC 2005; 50 test topics of TREC 2005; 49 test topics of TREC 2006
Evaluation metrics: mean average precision (MAP), R-precision (R-P), top-N precision (P@N)

Experimental Results -- Query Matching

                     TREC 2005                  TREC 2006
                     MAP     R-P     P@10       MAP     R-P     P@10
Baseline             0.1840  0.2136  0.3060     0.3752  0.4585  0.5604
+Bi-gram             0.1957  0.2438  0.3320     0.4140  0.4910  0.5799
+Proximity           0.2024  0.2501  0.3360     0.4530  0.5137  0.5922
+Fuzzy, Stemmed      0.2030  0.2501  0.3360     0.4580  0.5112  0.5901
Improv.              10.33%  17.09%  9.80%      22.07%  11.49%  5.30%
T-test (MAP)         0.0084                     0.0000

Experimental Results -- Person Matching

                     TREC 2005                  TREC 2006
                     MAP     R-P     P@10       MAP     R-P     P@10
Baseline             0.2030  0.2501  0.3360     0.4580  0.5112  0.5901
+Combined Name       0.2056  0.2539  0.3463     0.4709  0.5152  0.5931
+Abbr. Name          0.2106  0.2545  0.3400     0.5010  0.5181  0.6000
+Short Name          0.2111  0.2578  0.3400     0.5121  0.5192  0.6000
+Alias, new email    0.2156  0.2591  0.3400     0.5221  0.5212  0.6000
Improv.              6.21%   3.60%   1.19%      14.00%  1.96%   1.68%
T-test (MAP)         0.0064                     0.0057

Experimental Results -- Multiple Relations

                     TREC 2005                  TREC 2006
                     MAP     R-P     P@10       MAP     R-P     P@10
Baseline             0.2156  0.2591  0.3400     0.5221  0.5212  0.6000
+Windowed Section    0.2158  0.2633  0.3380     0.5255  0.5311  0.6082
+Reference Section   0.2160  0.2630  0.3380     0.5272  0.5314  0.6061
+Title-Author        0.2234  0.2634  0.3580     0.5354  0.5355  0.6245
+Section Title-Body  0.2586  0.3107  0.3740     0.5657  0.5669  0.6510
Improv.              19.94%  19.91%  10.00%     8.35%   8.77%   8.50%
T-test (MAP)         0.0013                     0.0043

Experimental Results -- Evidence Quality

                     TREC 2005                  TREC 2006
                     MAP     R-P     P@10       MAP     R-P     P@10
Baseline             0.2586  0.3107  0.3740     0.5657  0.5669  0.6510
+Static quality      0.2711  0.3188  0.3720     0.5900  0.5813  0.6796
+Dynamic quality     0.2755  0.3252  0.3880     0.5943  0.5877  0.7061
Improv.              6.13%   4.67%   3.74%      2.86%   3.67%   8.61%
T-test (MAP)         0.0360                     0.0252
Rank 1 @TREC         0.2749  0.3330  0.4520     0.5947  0.5783  0.7041

Schedule
1 Introduction
2 Fine-grained Expert Search
3 Experimental Results
4 Conclusion

Conclusion
- Fine-grained expert search: search over <topic, person, relation, document> evidence instead of whole documents
- A probabilistic model (expert matching and evidence matching) and its implementation
- Evaluation on the TREC 2005 and 2006 expert search data sets