SIGIR 2012 Tutorial


BEYOND BAG-OF-WORDS: MACHINE LEARNING FOR QUERY-DOCUMENT MATCHING IN WEB SEARCH

SUMMARY

Dealing with the mismatch between query and document is one of the most critical research problems in web search.

Recently, researchers have devoted significant effort to addressing this grand challenge. The major approach is to perform deeper query and document understanding and then match the enriched representations of the query and the document.

With the availability of large amounts of log data and advanced machine learning techniques, this has become more feasible, and significant progress has been made. In this tutorial, we give a systematic and detailed survey of newly developed machine learning technologies for query-document matching in web search. We focus on describing the fundamental problems as well as the novel solutions. Matching between query and document is not limited to search: similar problems arise in online advertising, recommender systems, and other applications, wherever objects from two different spaces need to be matched.

BIO:

Hang Li is a senior researcher and research manager at Microsoft Research Asia. He is also an adjunct professor at Peking University, Nanjing University, Xi’an Jiaotong University, and Nankai University. His research areas include information retrieval, natural language processing, statistical machine learning, and data mining. He graduated from Kyoto University in 1988 and earned his PhD from the University of Tokyo in 1998. He worked at the NEC lab in Japan from 1991 to 2001. He joined Microsoft Research Asia in 2001 and has worked there since. Hang has about 100 publications in top international journals and conferences, including SIGIR, WWW, WSDM, ACL, EMNLP, ICML, NIPS, and SIGKDD. Papers by him and his colleagues received the SIGKDD’08 best application paper award and the SIGIR’08 best student paper award. Hang has also worked on the development of several products, including Microsoft SQL Server 2005, Microsoft Office 2007 and Office 2010, Microsoft Live Search 2008, and Microsoft Bing 2009 and Bing 2010. He has also been very active in the research community, having served or currently serving top conferences and journals. For example, in 2011 he was PC co-chair of WSDM’11; area chair of SIGIR’11, AAAI’11, and NIPS’11; PC member of WWW’11, ACL-HLT’11, SIGKDD’11, ICDM’11, and EMNLP’11; and an editorial board member of the Journal of the American Society for Information Science and the Journal of Computer Science & Technology. http://research.microsoft.com/en-us/people/hangli/

Jun Xu is an Associate Researcher at Microsoft Research Asia. He received his PhD in computer science from Nankai University, China, in 2006, and then joined Microsoft Research Asia. His research focuses on information retrieval and text mining. Jun has published extensively in prestigious conferences and journals, including SIGIR, WWW, JMLR, ECML, and ECIR. He is very active in the research community and has served or is serving top conferences and journals. He developed the learning-to-rank algorithms IR-SVM and AdaRank, as well as the LETOR dataset, and released AdaRank and LETOR to the academic community. Jun has also worked on the development of Microsoft products, including Microsoft Bing 2010 and Office 2011. http://research.microsoft.com/en-us/people/junxu/

OUTLINE

1. Learning for Matching between Query and Document
   • Query Document Match in Search
     – Mismatch: Biggest Challenge in Search
     – Matching at Different Levels
     – Matching in Different Ways
   • Learning for Matching between Query and Document
   • Discussions
     – Relation between Ranking and Matching
   • Previous Work
     – Semantic Matching
     – Long Tail Challenge

2. Matching by Query Reformulation
   • Query Reformulation
   • Blending
   • Methods of Query Reformulation
   • Methods of Blending

3. Matching with Dependency Model
   • Matching based on Term Dependency
   • Matching with Markov Random Field Models

4. Matching with Statistical Machine Translation Model
   • Statistical Machine Translation
   • Matching with Translation Model
   • Issues in Matching with Translation Model
   • Methods for Matching with Translation Models

5. Matching with Topic Model
   • Topic Modeling
   • Methods of Matching with Topic Model
   • Two Approaches to Topic Modeling

6. Matching in Latent Space

7. Generalization: Learning to Match

8. Summary and Open Problems

AUDIENCE

Practitioners in industry can get a summary of state-of-the-art methods and consider how to apply them in practice; researchers in academia can get a reference to recent work and leverage the results in their own research.
