Searching Locally-Defined Entities

Zhaohui Wu†∗, Yuanhua Lv‡, Ariel Fuxman‡
†Computer Science and Engineering, Pennsylvania State University, PA, USA
‡Microsoft Research, Mountain View, CA, USA
zzw109@psu.edu, {yuanhual, arielf}@microsoft.com
∗This work was done when the first author was on a summer internship at Microsoft Research.
ABSTRACT
When consuming content, users typically encounter entities
that they are not familiar with. A common scenario is when
users want to find information about entities directly within
the content they are consuming. For example, when reading the book “Adventures of Huckleberry Finn”, a user may
lose track of the character Mary Jane and want to find some
paragraph in the book that gives relevant information about
her. The way this is achieved today is by invoking the ubiquitous Find function (“Ctrl-F”). However, this only returns
exact-matching results without any relevance ranking, leading to a suboptimal user experience.
How can we go beyond the Ctrl-F function? To tackle
this problem, we present algorithms for semantic matching
and relevance ranking that enable users to effectively search
and understand entities that have been defined in the content that they are consuming, which we call locally-defined
entities. We first analyze the limitations of standard information retrieval models when applied to searching locally-defined entities, and then we propose a novel semantic entity retrieval model that addresses these limitations. We also present a ranking model that leverages multiple novel signals to model the relevance of a passage. A thorough experimental evaluation of the approach in the real-world application of searching characters within e-books shows that
it outperforms the baselines by 60%+ in terms of NDCG.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering, retrieval models

General Terms
Algorithms; Experimentation

Keywords
Locally-Defined Entities; Descriptiveness; Within-document Search

1. INTRODUCTION
When consuming content, users typically encounter entities that they are not familiar with. To understand these
entities, they may switch to a search engine to get related
information. However, in addition to being distracting, this
has obvious limitations in common scenarios such as: (1)
an entity is only defined in the document that the user is
reading, and (2) the users prefer to see information about
an entity within the content that they are consuming.
As an illustration, consider the following example. A user is reading a book on an e-reader. He/she finds a mention of a minor character (e.g., “Mary Jane” in Huckleberry Finn) but does not remember who the character is, or he/she finds a mention of another relatively popular character (e.g., “Aunt Sally”) that occurs many times in the book but cannot keep track of what her role is. In the former case, there may be little information on the web (or in available knowledge bases) about this character, while in the latter case, the user may prefer information about the character from within the book itself. As a result, in both cases, the user would want to find some paragraphs in the book that give relevant information about the character, since the book itself contains enough information to provide a good understanding of it.
As another example, consider an enterprise scenario, where
the user is reading a document and finds a mention to the
name of a project. If this is a new project, there may be
no information about it in the company intranet (let alone
the Internet). However, the document itself may contain
enough introductory information about the project. Moreover, in some cases, the user may believe that the content being read is itself among the most relevant sources for such information. In either case, the user would want to find some
paragraphs directly in the current document that give relevant information about the project.
In this paper, we tackle the problem of enabling users to
search and understand entities that have been defined in
the content that they are consuming. We call such entities
locally-defined entities. In the locally-defined entity search
task, the input (query) is the name (surface form) of an entity for which the user would like to find information; and the
output is a list of relevant passages extracted from the actual book/document that the user is reading. The passages
are ranked by how well they describe the input entity.
This work is motivated by one instance of locally-defined
entity search that has important applications: searching for
characters in e-books. This application is of significant practical interest as e-books become ever more prevalent. E-books are typically rather long and may mention hundreds
of characters, most of which (particularly fictional characters) may not appear in any knowledge base. Enabling the reader to understand the characters in an e-book thus results
in a significantly better reading experience. Figure 1 shows
a screenshot of the experience enabled by locally-defined entity search. The reader is reading a book (“Adventures of
Huckleberry Finn”) in an e-reader and reaches a mention of
a character called “Mary Jane”. In order to get more information, the user selects “Mary Jane” and the application
shows, directly within the e-reader, passages that provide
rich descriptive information about this character.

Figure 1: A locally-defined entity search experience.
One might argue that searching within a document has
always been available through the ubiquitous Find (“Ctrl-F”) function [20, 13]. However, a brute-force string-matching approach that returns, say, the first paragraph where the entity name appears may lead to a suboptimal user experience. For
example, in the book “Adventures of Huckleberry Finn”, the
first paragraph mentioning Mary Jane is “Mary Jane Decides to Leave. Huck Parting with Mary Jane.” An arguably much
better passage to describe Mary Jane is “Mary Jane was red-headed, but that don’t make no difference, she was most awful beautiful...” But the latter passage corresponds to the fourth mention of Mary Jane in the book. In fact, our data analysis shows that, in the application of searching for characters in e-books, 75% of the relevant paragraphs are not among the first 10% of matched paragraphs of a book.
There are a few works [11, 12, 20] that have attempted to
address this problem by ranking passages within-document.
However, they only used standard information retrieval models and bag-of-words matching to score the relevance of passages, which has shown only marginal improvements over Ctrl-F. Indeed, in our work, we have analyzed the limitations of standard information retrieval models when applied to locally-defined entity search. To mention but a few: the bag-of-words assumption behind these retrieval models may cause partial matching to be handled inappropriately (e.g., “Jane” is always scored lower than “Mary Jane”, though they may point to the same person); anaphoric expressions are ignored (e.g., “she” may refer to “Mary Jane” in a passage and could be regarded as an additional occurrence of “Mary Jane”, but it does not count toward the score); and the document length hypothesis [29] should be revisited, since the “documents” are passages from the same content (e.g., book) that the user is reading.
The contributions of this work include the following:

• We introduce the problem of locally-defined entity search.

• We present a retrieval model that tackles the limitations of standard retrieval heuristics (such as bag-of-word matching, term frequency, and document length normalization) when applied to locally-defined entity search.

• We present a ranking model that leverages multiple novel signals to model the relevance¹ of a passage. This includes not only the scores from the retrieval model, but also signals that come from mining external sources, analyzing related entities and actions, and modeling entity distribution within the document.

• We perform an evaluation in the context of a real-world application of locally-defined entity search: searching for characters in e-books. We have constructed a dataset of 240 characters over 9 novels, and have obtained relevance judgments for 8088 passages in total.² Using the dataset from the e-book application, we show that our approach outperforms all the baseline systems by more than 60% in terms of NDCG.

¹We define a notion of relevance called “descriptiveness” in later sections, which is interchangeable with “relevance”.
²The dataset is available at http://research.microsoft.com/en-us/people/yuanhual/lde.aspx
2. RELATED WORK
To the best of our knowledge, the problem of locally-defined entity search has not been addressed by previous
work. We now briefly discuss research efforts that are related to our work in the following directions: entity linking and
search, search within a document, and passage retrieval.
Entity linking and search. There is a rich literature
on named entity linking [6, 23], which links the mention of
an entity in a text to a publicly available knowledge base
such as Wikipedia. Since locally-defined entities may not
appear in any available knowledge base, entity linking could
only mark them as NIL to indicate there is no matching
entry. Entity search [26] extracts relevant entities from documents and returns these entities directly (instead of passages or paragraphs) in response to a general query (rather
than a named entity). Existing work focuses mainly on how
to generate an entity-centric “document” representation by
keyword extraction, and how to score an entity based on this
entity representation using standard IR techniques.
Search within a document. We cast our problem as
searching within the actual content that the user is reading.
Despite the tremendous attention attracted by document retrieval (motivated by web search applications), there is limited research on algorithms for retrieval within documents,
with only a few exceptions [11, 12, 20]. A simple within-document search function, known as “Find” (i.e., Ctrl-F), is
available in almost all document readers/processors and Internet browsers. This function locates the occurrences of a
query based on exact string matching, and lets users browse
all matches linearly. As such, it is insufficient to rank paragraphs that are descriptive about an entity (see [20] for a user study). Harper et al. [11, 12] presented a system, SmartSkim, which uses a standard language modeling approach
and bag-of-words matching to score the relevance of passages within a document w.r.t. a user’s query, and then visualizes
the distribution of relevant passages using TileBars [13]. A
related application that searches relevant passages within a
document is the Amazon Kindle X-Ray feature. Although
there is no publication about the details behind this feature,
it appears to work similarly to SmartSkim. In Section 7, we
compare against the “Ctrl-F” function and several standard
IR algorithms including a language modeling approach as in
SmartSkim, and show that our methods perform significantly better for searching locally-defined entities.
Passage retrieval and summarization. Passage retrieval [4] segments documents into small passages, which
are then taken as units for retrieval. Many passage retrieval
techniques have been presented in the context of improving relevance estimation of documents, e.g., [4, 16, 14, 21],
while the research on passage retrieval for question answering aims at returning relevant passages directly, e.g., [5, 32].
Unlike all the existing work, our task is to retrieve entity-centric descriptive passages in response to a named-entity query. As we shall see later, solving the locally-defined entity search problem using a standard passage retrieval algorithm performs poorly, due to its lack of semantic matching and other appropriate ranking heuristics.
Our work is also related to automatic text summarization,
in particular query-based summarization [33, 31] that identifies the most query-relevant fragments of a document to
generate search result snippets. However, standard summarization techniques presumably cannot summarize locally-defined entities well without a clear understanding of the
special notion of relevance. Our proposed techniques are
complementary to general query-based summarization
by potentially enhancing the search result snippets when
queries are locally-defined entities.
3. PROBLEM DEFINITION
The locally-defined entity search problem is an information
retrieval problem with the following elements:
• The corpus C consists of all passages from the content T (e.g., book, article) that the user is reading. A
passage p is a chunk of continuous text from the content that the user is reading. Notice the contrast with
the standard information retrieval setting, where the
corpus consists of a set of documents.
• The query is a mention of a named entity e. We call e
a locally-defined entity because it is not required to be
mentioned or defined anywhere but in the content T
that the user is reading. There may be different ways
of referring to the same entity (called surface forms).
For example, the entity Mary Jane can be referred to
as “Mary Jane”, “Mary” or “Miss Jane”.
• The search results S consist of a set of passages from
T . Clearly, S ⊆ C.
• The relevance function f(e, p) quantifies how well passage p describes entity e, or, in other words, how useful passage p is for understanding entity e. We call this function the descriptiveness function.
We will consider passages that consist of a single paragraph from the content. The use of paragraph-based passages is partially inspired by an interesting observation in
question answering [19] that users generally prefer an answer embedded in a paragraph from which the answer was
extracted over the answer alone, the answer in a sentence, or
the answer in a longer context (e.g., a document-size chunk).
Furthermore, a paragraph is a discourse unit segmented by the author and tends to be a more natural unit for presenting results to users.
A necessary condition for a passage p to be descriptive for
entity e is that it must mention the entity. This suggests
the following retrieval and ranking models. Given content
T and entity e, the goal of the retrieval model is to produce
a candidate set of passages S ⊆ C such that every passage
p in S mentions e. Given the retrieval set S, the goal of
the ranking model is to rank the passages in S according to
their descriptiveness with respect to e.
Both retrieval and ranking models pose singular challenges
in the locally-defined entity search problem. In the next
section, we describe our solution for the retrieval model, and
in Section 5 we focus on the ranking model.
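To make this decomposition concrete, the following Python sketch shows the overall two-stage process under simplified assumptions; the functions retrieval_score and rank_score stand in for the models developed in Sections 4 and 5 and are illustrative names, not part of the paper.

    def search_locally_defined_entity(entity, passages, retrieval_score, rank_score):
        # Retrieval: keep only passages that (semantically) mention the entity.
        candidates = [p for p in passages if retrieval_score(entity, p) > 0]
        # Ranking: order the candidates by descriptiveness f(e, p).
        return sorted(candidates, key=lambda p: rank_score(entity, p), reverse=True)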
4. RETRIEVAL MODEL
Previous work [8] has shown that the good performance
of a retrieval model, such as BM25, is mainly determined
by the way in which various retrieval heuristics are used.
These include term frequency (TF), inverse document frequency (IDF), and document length normalization. TF rewards a document containing more distinct query terms and
high frequency of a query term, IDF rewards the matching
of a more discriminative query term, and document length
normalization penalizes long document because a long document tends to match more query terms.
Take the well-known retrieval model BM25 [29] as an example. BM25 is widely accepted as a state-of-the-art model in IR. Its formula, as presented in [8], scores a document D with respect to query Q as follows:
$$\mathrm{BM25}(Q, D) = \sum_{q \in Q \cap D} \frac{(k_1 + 1) \cdot c(q, D)}{k_1 \cdot \left(1 - b + b\,\frac{|D|}{avdl}\right) + c(q, D)} \cdot \log\frac{N + 1}{df(q)}$$
where c(q, D) is the raw TF of query term q in D; |D| is the document length; avdl is the average document length in the whole corpus; N is the corpus size; df(q) is the document frequency of q in the corpus, so that log((N+1)/df(q)) is the IDF component; and k1 and b are free parameters that require manual tuning.
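For concreteness, a direct transcription of this formula in Python might look as follows; this is a sketch only, and the tokenization, parameter defaults, and data structures are illustrative rather than taken from the paper.

    import math

    def bm25_score(query_terms, doc_terms, df, N, avdl, k1=1.5, b=0.75):
        # df: dict mapping term -> document frequency in the corpus.
        # N: number of documents in the corpus; avdl: average document length.
        D = len(doc_terms)
        score = 0.0
        for q in set(query_terms):
            c = doc_terms.count(q)          # raw term frequency c(q, D)
            if c == 0 or q not in df:
                continue
            idf = math.log((N + 1) / df[q])
            tf_norm = (k1 + 1) * c / (k1 * (1 - b + b * D / avdl) + c)
            score += tf_norm * idf
        return score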
We now discuss the limitations of these retrieval heuristics
when applied to locally-defined entity search, and present a
retrieval model that addresses these limitations.
4.1 Semantic Matching: Addressing Limitations of TF
Standard retrieval models are based on the bag-of-words
assumption that takes both a query and a document as an
unordered collection of terms for TF computation. This
works well in most IR applications but has limitations for
locally-defined entity search. We now discuss these limitations and then present Entity Frequency (EF), a retrieval
heuristic that addresses these limitations.
The first limitation is that the bag-of-words assumption
leads to an underestimation of partial matches. As an illustration, suppose that the query is “Mary Jane” and we have
the following candidate passages:
(1) Mary Jane was red-headed, but that don’t make no difference, she was most awful beautiful...
(2) Mary was red-headed, but that don’t make no difference,
she was most awful beautiful....
The TF heuristic in practically all existing retrieval models is that a document is preferred if it contains more distinct
query terms. As a result, if we use standard retrieval models, the first passage above would be scored higher than the
the second because it matches two query terms (“Mary” and
“Jane”) while the second passage matches just one (“Mary”).
However, if both passages are indeed referring to the same
person, they should get the same score. We tackle this problem with our proposed EF retrieval heuristic by considering
semantic matching of the entire entity, rather than counting
how many query words are matched.
Another limitation of standard TF is that it may produce either false positives or false negatives due to lack of
semantic information. False positives can occur when a passage is rewarded by the standard retrieval model because
it matches some “term” of a query, without considering if
the matching term in the passage refers to a different entity
(or even a different type of entity). For example, a passage
mentioning “Mary Ann” or the place “Highland Mary” could
be mistakenly retrieved for query “Mary Jane”. We tackle
this problem with the EF retrieval heuristic by incorporating Natural Language Processing notions of semantic type
alignment and coreference analysis.
False negatives can occur because standard TF does not
consider anaphoric resolution. For example, in the second
passage above, “she” is used to refer to “Mary Jane” but this
cannot be captured unless we employ anaphora resolution.
In the standard IR setting, where the search results consist
of entire documents, anaphora resolution does not necessarily play an important role. In contrast, anaphoric mentions
are more important in locally-defined entity search. One
reason is that the retrieved “documents” are actually short
passages and it is good writing style to avoid redundant
mentions within the same passage (anaphora being one way
of avoiding redundancy). We tackle this problem by incorporating anaphora resolution into the EF retrieval heuristic. Anaphora resolution has been extensively studied in the
NLP literature [24], but relatively few prior research efforts have applied it to IR tasks [1, 7, 25].
Our experimental results of Section 7 show that anaphora
resolution clearly helps for locally-defined entity search.
We are now ready to define the Entity Frequency (EF)
retrieval heuristic.
Definition 1 (Entity Frequency). Given an entity
name e, and a passage p containing N named entities e1 , ..., eN
that match e based on the standard bag-of-words matching,
the entity frequency of e in passage p is
$$EF(e, p) = \sum_{i=1}^{N} E(e_i, e) \cdot \big(1 + r \cdot CR(e_i)\big) \qquad (1)$$
where CR(ei) represents the number of anaphoric mentions that refer to ei; r ∈ [0, 1] controls the relative importance of an anaphoric mention compared to ei itself; and E(ei, e) is a variable indicating how likely it is that ei refers to e.
E(ei, e) depends on the type of entities being considered. We now present the function used in our experiments (Section 7), which is geared towards matching person names. While some aspects of the function are general,
an evaluation beyond person names is left as future work.

$$E(e_i, e) = \begin{cases}
1 & \text{if } e_i = e \\
0 & \text{if } e \text{ and } e_i \text{ are of different types} \\
1 & \text{if } \big(e_i.\mathrm{has}(e) \text{ and } |e| > 1\big) \text{ or } \big(e.\mathrm{has}(e_i) \text{ and } |e_i| > 1\big) \\
0 & \text{if } \big(\lnot\, e_i.\mathrm{has}(e) \text{ and } \lnot\, e.\mathrm{has}(e_i)\big) \\
\mathrm{Coref}(e_i, e) & \text{otherwise}
\end{cases}$$
where |e| stands for the number of words contained in e, and
e.has(ei) is an indicator of whether ei is a word-based substring of e.
For example, “Mary Jane”.has(“Jane”) is true while “Mary
Jane”.has(“Mar”) is false. Coref(ei ,e) is a score obtained by
performing coreference resolution. Since off-the-shelf tools for coreference resolution are computationally expensive
and have low accuracy (about 70%) [18], we propose the
heuristics that we describe next.
We consider two heuristics: local and global coreference.
To compute local coreference, we first do a “local” lookup: within a fixed window before the passage (e.g., ten passages), we find the nearest entity whose surface name is a superstring of ei, inspired by the term proximity heuristic in information retrieval [22, 21]. If that entity is the query entity e, then Coref(ei, e) = 1. Otherwise, we
compute global coreference based on statistics gathered from
the entire content. The intuition is that if there are overlapping mentions beyond the fixed local window (of, say, ten
passages), the most likely entity assignment is the one that
appears the most in the content. Suppose Q(e_i) includes all possible entity names that contain e_i; formally, we have

$$E(e_i, e) = p(e \mid e_i) = \frac{p(e_i \mid e)\, p(e)}{\sum_{q \in Q(e_i)} p(e_i \mid q)\, p(q)} = \frac{p(e)}{\sum_{q \in Q(e_i)} p(q)}$$
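As a rough illustration of this estimate (not the authors' implementation), the global coreference score can be computed from entity-name frequencies gathered over the whole content; with the frequencies {“Mary Ann”: 50, “Mary Williams”: 40, “Mary Jane”: 10} used in the example discussed below, it returns 10/(50+40+10) = 0.1. The containment test here only handles single-word mentions.

    def global_coref_score(entity, mention, entity_counts):
        # Entity names in the content whose surface form contains the mention word.
        candidates = [q for q in entity_counts if mention in q.split() or mention == q]
        total = sum(entity_counts[q] for q in candidates)
        # p(e) / sum of p(q) over q in Q(e_i), estimated from raw frequencies.
        return entity_counts.get(entity, 0) / total if total else 0.0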
To illustrate the computation of EF , suppose the query is
“Mary Jane” and there are five candidate passages as listed
in Table 1. It is easy to see that only the last passage is
clearly related to the query. The first three are unrelated
to the query: the first one mentions the place “Highland
Mary”; the second and third mention different persons. It
is unclear whether the fourth paragraph is relevant, and it
would be necessary to look at the rest of the content (i.e.,
neighboring passages) to make a determination.
We show in Table 1 the EF scores: the first three get a
score of zero (due to type mismatch and partial matching
rules), which matches the intuition that they are irrelevant
to the query. The last gets the highest score (due to exact
match and anaphora resolution) which matches the intuition
that it is relevant to the query.
To determine the EF score of the fourth passage, we must compute the coreference score between the term “Mary” and mentions of “Mary Jane” in the rest of the content. We first determine local coreference by looking at a window of neighboring passages. Suppose that the nearest mention overlapping “Mary” in that window is “Mary Williams”. Since this is different from “Mary Jane”, the local coreference score is zero (denoted by “L: 0” in Table 1). To compute the global coreference, we consider all the named entities in the entire content that overlap with “Mary”. Suppose that this set is {“Mary Ann”, “Mary Williams”, “Mary Jane”}, and their frequencies in the entire content are 50, 40, and 10. Then, the global coreference score is 10/(50+40+10) = 0.1 (denoted by “G: 0.1” in Table 1).

Passage | TF | EF
Washingtons and Lafayettes, and battles, and Highland Mary, and ... | 1 | 0
He’s sick -- and so is mam and Mary Ann | 1 | 0
Sarah Mary Williams ... some calls me Mary. | 2 | 0
I said it was Mary before, so I ... | 1 | L: 0, G: 0.1
Mary Jane was red-headed, ... she was most awful beautiful, and ... | 1 | 1 + r·1

Table 1: Entity Frequency vs Term Frequency
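Putting the pieces together, a minimal sketch of the EF heuristic (Eq. 1) could look as follows. The helper names and input format are hypothetical: entity mentions, their types, anaphora counts, and coreference scores are assumed to have been extracted beforehand (e.g., with an NLP toolkit), which is where the real engineering effort lies.

    def has(a, b):
        # True if b is a word-based substring of a, e.g. has("Mary Jane", "Jane").
        a_words, b_words = a.split(), b.split()
        return any(a_words[i:i + len(b_words)] == b_words
                   for i in range(len(a_words) - len(b_words) + 1))

    def E(ei, e, same_type=True, coref=0.0):
        if ei == e:
            return 1.0
        if not same_type:
            return 0.0
        if (has(ei, e) and len(e.split()) > 1) or (has(e, ei) and len(ei.split()) > 1):
            return 1.0
        if not has(ei, e) and not has(e, ei):
            return 0.0
        return coref  # local coreference if it resolves to e, otherwise the global estimate

    def entity_frequency(e, passage_entities, r=0.5):
        # passage_entities: list of (mention, same_type, coref_score, num_anaphora).
        return sum(E(m, e, t, c) * (1 + r * n) for m, t, c, n in passage_entities)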
4.2 Addressing Limitations of Document Length Normalization
Document length normalization plays an important role
to fairly retrieve documents of all lengths. As highlighted
by Robertson and Walker [28], the relationship between document relevance and length can be explained by either: (1) the scope hypothesis, where the likelihood of a document’s relevance increases with length due to the increase in covered material; or (2) the verbosity hypothesis, where a longer document may cover a similar scope to a shorter one but simply uses more words and phrases.
The verbosity hypothesis prevails over the scope hypothesis in most current IR models since they tend to penalize
document length [30]. However, this may not be a reasonable assumption in locally-defined entity search since all passages of the same content (e.g., book) are likely to be written
by the same author, and thus the verbosity of different passages may not vary significantly.
We hypothesize that the scope hypothesis dominates over
the verbosity hypothesis in locally-defined entity search. That
is, document length should be rewarded. To validate this
hypothesis, we followed a procedure inspired by Singhal et
al.’s finding [30] that a good retrieval function should retrieve
documents of all lengths with similar likelihood of relevance.
We compared the likelihood of relevance/retrieval of several
BM25 models in the dataset that we will present in Section
6, where the content consists of e-books and the queries are
characters appearing in the books.
The considered BM25 models differ only in the setting
of the parameter b: a smaller b means less penalization of
document length. We followed the binning analysis strategy proposed in [30] and plotted those likelihoods against document length for each book. We observed that the
retrieval likelihood gets closer to the relevance likelihood as
b decreases. As an example, Figure 2a presents the results
on “Adventures of Huckleberry Finn” (similar results on the other books are omitted due to space limitations), where the bin size is 100 (varying it gives similar results). This shows that BM25
works the best when document length is not penalized, i.e.,
b = 0. However, we can see that even when b = 0, there
is still a large gap between the retrieval and the relevance
likelihoods, confirming our hypothesis that document length
should be rewarded instead of being penalized.
Now we explore how to reward document length. In keeping with the terminology in the literature, we will talk about
document length, but notice that the “document” in locally-defined entity search is actually a passage. Given an ideal function g(|D|) for document length rewarding and a function BM25_{b=0} that ignores document length, we assume that
the true relevance function R with appropriate document
length normalization can be written as
$$R = \mathrm{BM25}_{b=0} \cdot g(|D|)$$
where R can be regarded as the golden retrieval function as represented by the relevance likelihood in Figure 2a. Then g(|D|) can be approximated as R / BM25_{b=0}, which is plotted against document length in Figure 2c. We can see that
g(|D|) is roughly a monotonically increasing function with
|D|, which is consistent with the scope hypothesis [30] that
the likelihood of a document’s relevance increases with document length. Another interesting observation is that there is a “pivot” point of document length, formally defined as |D0|.
When a passage D is shorter than this pivot value, g(|D|)
will be 0, causing the relevance score of this document to be
0; this is likely because a passage that is too short is unlikely
to be descriptive. When a passage D is longer than this pivot value, g(|D|) increases monotonically but the rate of increase decreases with document length |D|; this is also reasonable, since a relatively longer passage may contain more
information, but if a document is already very long, increasing it further may not add much information. Our analysis
shows that a logarithm function of document length fits the
curve in Figure 2c very well. Therefore, we approximate the
document length rewarding function as follows:
$$g(|D|) \propto \begin{cases} \log\big(|D| / |D_0|\big) & \text{if } |D_{max}| \ge |D| > |D_0| \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
where |D0| is the pivot length of a passage and |Dmax| is the size of the longest passage (we set |D0| = 8 for all books in our experiments).
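A direct reading of Eq. (2) in code, with the pivot |D0| = 8 mentioned above; the optional max_len cap corresponds to |Dmax| and is our own simplification.

    import math

    def length_reward(doc_len, pivot=8, max_len=None):
        # Zero reward below the pivot, logarithmic growth above it.
        if doc_len <= pivot:
            return 0.0
        if max_len is not None and doc_len > max_len:
            return 0.0
        return math.log(doc_len / pivot)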
One may argue that traditional window-based passage retrieval does not have this document length issue. However, our document length analysis shows that document length is indeed a useful signal for predicting whether or not a passage that contains the entity is a good descriptive passage for the entity. We thus hypothesize that traditional window-based passage retrieval would not work well for searching descriptive passages for locally-defined entities, though it works well for capturing the traditional relevance notion for general queries; this has also been verified by our experiments.
4.3 The LRM Retrieval Function
We are now ready to present the retrieval function for
locally-defined entity search, which we call LRM. LRM combines EF and document length rewarding using BM25F’s TF
saturation function.
$$\mathrm{LRM}(c, p) = \frac{(k_1 + 1) \cdot EF(c, p)}{k_1 + EF(c, p)} \cdot \log\big(|D| / |D_0|\big) \qquad (3)$$
where k1 plays the same role as in BM25. Note that the relative LRM scores are not very sensitive to k1, since there is only one single “term” in all queries; we only keep it to adjust the relative importance of EF and document length. In experiments, we set k1 = 1.5 empirically.
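Combining the two components, a sketch of the LRM score in Eq. (3), reusing the entity_frequency and length_reward sketches above; this is illustrative rather than the authors' code.

    def lrm_score(entity, passage_entities, doc_len, k1=1.5, r=0.5, pivot=8):
        ef = entity_frequency(entity, passage_entities, r=r)
        if ef == 0.0:
            return 0.0  # passage does not (semantically) mention the entity
        saturation = (k1 + 1) * ef / (k1 + ef)
        return saturation * length_reward(doc_len, pivot=pivot)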
The retrieval likelihood of LRM is also compared with the
relevance likelihood in Figure 2b. Obviously, LRM obtains
the closest retrieval likelihood to the relevance likelihood
as compared to any BM25 likelihoods shown in Figure 2a
(other books have similar results). This shows analytically
that LRM would be more effective than BM25 for our tasks.
Notice that LRM does not have an IDF component. The
IDF heuristic rewards the matching of a more discriminative
query term if there are multiple query terms. That is, IDF
is used to balance multiple query terms, and does not affect
a retrieval model for queries that have only a single term,
which is precisely the case in LRM, which uses the entire
entity query as opposed to a bag-of-words assumption.
5. DESCRIPTIVENESS RANKING MODEL
We now turn our attention to the ranking of passages
based on how descriptive they are about the entity. We
introduce three classes of “descriptiveness” features.
Figure 2: Study of the retrieval and relevance likelihoods of BM25 and LRM: (a) BM25 vs. relevance; (b) LRM vs. relevance; (c) the gap between BM25(b=0) and the relevance likelihood as a function of document length (x-axis in log scale).
5.1 Entity-centric Descriptiveness
Entity-centric descriptiveness directly captures how well a passage profiles an entity. Intuitively, to introduce
or profile an entity, an author would often describe some typical aspects of the entity. For person entities, those aspects
might cover biographical information, social status and relationships, personality, appearance, career experience, etc.
This suggests that there might be some useful “patterns” to
recognize an informative passage. Let us look at an entity-centric descriptive passage for “Jean Valjean”:
“Jean Valjean was of that thoughtful but not gloomy disposition ...
He had lost his father and mother at a very early age. His mother
had died of a milk fever ... His father, a tree-pruner, like himself,
had been killed by a fall from a tree. All that remained to Jean
Valjean was a sister older than himself ...”
There are quite a few noticeable keywords in the passage (e.g., “father”, “mother”, “died”, “sister”) that reflect an entity’s social status and family relationships. We name them “informative keywords”
and we argue that they are potential indicators of entity-centric descriptiveness. The question now is how to mine such informative keywords. To answer this question, we observe that, although local entities do not exist elsewhere, the vocabulary used to introduce them may be similar to the vocabulary used to introduce more popular entities for which there is information available in knowledge bases. In the case of characters in e-books, we resorted
to summaries and descriptions available in Wikipedia. We
crawled descriptions/summaries of famous fictional entities
with Wikipedia pages, and then mined informative keywords
based on the ratio of the likelihood of a word occurring in this crawled corpus over the likelihood of the word occurring in all Wikipedia articles (we did not include the books used in the evaluation dataset). We used a small dictionary of the top-50 extracted keywords in our work, which works
well empirically. For illustration, the top 15 keywords are:
“ill, son, age, friend, old, young, war, mother, love, father,
child, die, daughter, family, and wife”.
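A sketch of this mining step; the corpus names and the add-one smoothing are our own choices and are not specified in the paper.

    from collections import Counter

    def mine_informative_keywords(character_descriptions, background_articles, top_k=50):
        # Rank words by the ratio of their likelihood in a corpus of character
        # descriptions to their likelihood in a background corpus.
        # Both corpora are lists of tokenized documents.
        fg = Counter(w for doc in character_descriptions for w in doc)
        bg = Counter(w for doc in background_articles for w in doc)
        fg_total, bg_total = sum(fg.values()), sum(bg.values())
        ratios = {
            w: (fg[w] / fg_total) / ((bg[w] + 1) / (bg_total + 1))
            for w in fg
        }
        return [w for w, _ in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top_k]]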
Finally, we design two features based on the informative
keywords: (1) whether a passage contains an informative
keyword, and (2) the frequency of informative keywords occurring in a passage. It is possible that some non-popular
entities may not have those keywords in their contexts. We
thus propose two more categories of features.
5.2 Relational Descriptiveness
Related Entities. Time, place, and people are three key
elements to describe a story. Intuitively, the fact that a passage describes the interaction of an entity with other entities
at some time in some place may indicate this is an informative passage about the entity. Following this intuition, we
argue that the occurrence of related named entities might
imply the descriptiveness of the passage.
We consider three types of related named entities in this
work: person, time, and location/organization. We use the frequency of each type of named entity as a feature. In addition, we design another feature that indicates whether any entity appears for the first time in the book. It is reasonable to believe that the first occurrence of an entity often
indicates some new and sometimes important information.
Related Actions. We also consider how the query entity is connected to other entities through actions/events.
The intuitive idea is that unusual actions tend to be more
informative. For example, actions such as “A kills B” or “A
was born in place B” should be more informative than “A
says something to B” or “A walked in place B”.
From the aforementioned examples, we see that an important/unusual action is often associated with some unusual verbs, such as “died”, “killed”, or “born”, while a minor/common action is often associated with some common
verbs, such as “walked”, “said”, or “saw”. This suggests that
the “unusualness” (more technically, the IDF) of verbs related to an entity may imply the importance of the action.
We thus encode the descriptiveness of actions using the IDF of verbs related to an entity, formally log(N/DF(v)), where v is a verb, N is the total number of passages in the book, and DF(v) is the number of passages containing the verb. We consider the related verbs of an entity to be the verbs co-occurring in the
same sentence with the entity. Three features are instantiated from this signal, including the average, the maximum,
and the minimum IDF of all verbs related to the entity.
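These three features can be computed as follows; this is a sketch in which `passages` is assumed to be a list of token sets and verb extraction is left to an NLP toolkit.

    import math

    def verb_idf_features(related_verbs, passages):
        # Average, maximum, and minimum IDF of the verbs co-occurring with the
        # entity, with IDF(v) = log(N / DF(v)) as above.
        N = len(passages)
        idfs = []
        for v in related_verbs:
            df = sum(1 for p in passages if v in p)   # passages containing the verb
            if df > 0:
                idfs.append(math.log(N / df))
        if not idfs:
            return 0.0, 0.0, 0.0
        return sum(idfs) / len(idfs), max(idfs), min(idfs)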
5.3 Positional Descriptiveness
Positional descriptiveness includes the order of a passage in the corpus (book) and the occurrence position of the entity within the passage.
We use the “passage order” to indicate the relative sequential order of a passage among all passages where an entity appears. The passage order is perhaps the most widely-used ranking signal in existing within-document retrieval such as the “Ctrl-F” function. Our data analysis in Figure 3 shows the distribution of descriptive passages w.r.t. the passage order, where the passage order is normalized into 10 bins. It shows that the passage order indeed matters, and that passages in the first bin are more likely to be descriptive. However, only about 25% or fewer of the descriptive
passages are covered by the first bin, and the remaining 75% are distributed over the other bins. This suggests that passage order, although useful, may not be a dominant signal.

Figure 3: Distribution of descriptive passages w.r.t. an entity’s lifetime in the book (x-axis: passage order normalized into 10 bins; y-axis: probability of a descriptive passage).
We design two features based on this signal: (1) the passage order normalized to an integer value in [0, 10], and (2) the sequential order of a passage in the whole book, normalized in the same way.
Besides passage order, we find that the occurrence position of an entity in the passage is also helpful: in a passage containing multiple entities, an entity that occurs earlier tends to be more important. So we design another two features based on the entity position: (1) whether the entity appears at the beginning of the passage, and (2) the first occurrence position of the entity in the passage.
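A sketch of these four positional features with a hypothetical input format; passage indices within the book and the entity's first token offset in the passage are assumed to be known.

    def positional_features(entity_passage_ids, passage_id, total_passages,
                            first_token_offset, num_bins=10):
        # Normalized order among the entity's passages, normalized order in the
        # whole book, whether the entity appears at the passage beginning, and
        # the entity's first occurrence offset.
        order_in_entity = entity_passage_ids.index(passage_id)
        norm_entity_order = int(num_bins * order_in_entity / max(len(entity_passage_ids), 1))
        norm_book_order = int(num_bins * passage_id / max(total_passages, 1))
        at_beginning = 1 if first_token_offset == 0 else 0
        return norm_entity_order, norm_book_order, at_beginning, first_token_offset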
6. DATASETS
We now describe the datasets used to learn the ranking function and to evaluate our methods. The datasets correspond to a
setting where a user is reading e-books and seeks to obtain
information about a character in the book.
6.1 Dataset Construction
We used e-books obtained from the Gutenberg collection (http://www.gutenberg.org). The list of books is given in Table 3. For each book, we obtained fairly exhaustive lists of characters from Wikipedia and used the character names as queries. (Most characters, particularly the least popular ones, do not have their own Wikipedia pages; the nine books do have Wikipedia pages, from which we obtained the character lists.)
We adopt the widely-accepted pooling technique [15] to
generate candidates to be labeled. Several standard IR models were used as the “runs”, including BM25 (k1 = 1.5,
b = 1) [29], language models with Dirichlet prior smoothing (µ = 100) [27, 34], and pseudo-relevance feedback based
on relevance models [17]. A key difference of our task from
traditional IR tasks is that we only care about top-ranked
passages. In fact, in the real-world scenario of e-book search, only the top few results will be returned to users due to device screen size limitations. Therefore, for each entity, the top-ranked 50 passages from each run were pooled together for
acquiring judgments, resulting in a total of 8088 passages for
the nine books. The number of entities and passages from
each book can be found in Table 3.
For each book, each labeler was given a list of entities, each
of which was associated with a set of candidate passages
in random order, and was asked to assign each candidate
passage one of the 4 labels: Perfect, Good, Fair, and Bad.
Perfect: The passage is helpful for understanding what the character is about, and provides enough information to write a self-explanatory summary about the character. A perfect passage should contain (but is not limited to) significant biographical information, social status, social and family relationships, personality, traits, appearance, career experience, etc. Examples 1 and 2 in Table 2 show perfect passages for “Mary Jane” in “Adventures of Huckleberry Finn” and “Jean Valjean” in “Les Misérables”, respectively.
Good: The passage contains some information about the
character, but it is not enough to construct a self-explanatory
summary describing it. It may still contain detailed information such as whether the character is interacting with others,
performing activities, etc. A “good” passage for “Mary Jane”
is Example 3 in Table 2.
Fair: While the passage mentions the character, it does
not provide any noticeable information about it. Basically,
our understanding of the character would not change after
reading the passage. Example 4 in Table 2 shows a “fair”
passage of “Mary Jane”.
Bad: The passage is not related to the entity. For example, Example 5 in Table 2 is not related to “Mary Jane” because “Mary” in the passage refers to “Sarah Mary Williams”.

ID | Label | Entity | Passage
1 | Perfect | Mary Jane | Mary Jane was red-headed, but that don’t make no difference, she was most awful beautiful, and her face and her eyes was all lit up like glory...
2 | Perfect | Jean Valjean | Jean Valjean came from a poor peasant family of Brie. He had not learned to read in his childhood. When he reached man’s estate, he became a tree-pruner at Faverolles. His mother was named Jeanne Mathieu ...
3 | Good | Mary Jane | So Mary Jane took us up, and she showed them their rooms, which was plain but nice. She said she’d have her frocks and a lot of other traps took out of her room if they was in Uncle Harvey’s way ...
4 | Fair | Mary Jane | Miss Mary Jane, is there any place out of town a little ways where you could go and stay three or four days?
5 | Bad | Mary Jane | Oh, yes’m, I did. Sarah Mary Williams. Sarah’s my first name. Some calls me Sarah, some calls me Mary.

Table 2: Examples of passages and their judgments
6.2 Analysis of Judgments
We obtained editorial judgments from two labelers (one
of them is a professional editor and the other is an author of
this paper). The labelers chose books that they had read in
the past and were intimately familiar with, and labeled all
the characters and passages for such books. There was an
overlap of three books between the labelers (the overlapping
books are shown in Table 3 with a “*” symbol). The number
and proportion of judgments for each book are reported in
Table 3. We can see that the largest proportion of passages gets a Fair label (60.2%), followed by Bad, Good, and Perfect.

ID | Book name | N | #e | #p | pl | P. | G. | F. | B.
1 | Adventures of Huckleberry Finn* | 2475 | 42 | 1032 | 49 | 6.3 | 12.2 | 48.8 | 32.6
2 | A Doll House | 1313 | 8 | 260 | 22 | 2.3 | 7.0 | 88.4 | 2.3
3 | A Princess of Mars | 1128 | 6 | 234 | 60 | 2.0 | 3.0 | 90.0 | 5.0
4 | A Tale of Two Cities | 3327 | 21 | 653 | 42 | 2.0 | 3.2 | 62.8 | 32.2
5 | David Copperfield | 7194 | 37 | 1313 | 52 | 2.5 | 5.3 | 72.9 | 19.2
6 | Les Misérables* | 13626 | 41 | 1966 | 43 | 5.1 | 6.9 | 63.6 | 24.4
7 | Pride and Prejudice* | 2127 | 16 | 424 | 59 | 9.2 | 26.8 | 12.6 | 51.4
8 | Rover Boys on the Farm | 2233 | 43 | 1408 | 23 | 4.4 | 10.1 | 61.2 | 24.3
9 | War and Peace | 12108 | 26 | 798 | 48 | 1.3 | 4.1 | 48.6 | 46.0
Total | | | 240 | 8088 | 45 | 4.1 | 8.2 | 60.2 | 27.5

Table 3: Summary of the constructed data set (N: total number of passages in the book; #e: number of entities; #p: number of judged passages; pl: average passage length; P./G./F./B.: judgment distribution in %; * marks books judged by both labelers)
The three books judged by both labelers contain 99 entities with 3422 passages. We show the labeling consistency in Figure 4. We can see that consistency decreases as
the judgment level goes from “bad” to “perfect”. The “bad”
judgments have an extremely high consistency, with few exceptions where the other labeler might give “fair” (3.9%) or
“good” (0.5%), but never “perfect”. This indicates that our
assessors did a good job of labeling. We see the highest
disagreement rate between “good” and “fair” labels, followed
by “perfect” and “good”, while “perfect” and “fair” have the
lowest. This is mainly because “perfect”, “good”, and “fair”
labels appear to be relatively subjective labels, in particular
the “good” label, and the assignment of these labels may be
affected by the labelers’ familiarity with the entity/book and
personal judgment criteria. We can see that the overall agreement ratio is as high as 86% for a 4-level relevance judgment task; this shows the high quality of the judgments.
We use a 4-point rating scale, where the “perfect”, “good”, “fair” and “bad” labels correspond to rating scores 4, 2, 1, and 0, respectively. This strategy is chosen because “perfect” passages are the most desirable ones for locally-defined entity search. For passages on which the two assessors disagree, we select the judgment with the higher agreement ratio according to Figure 4 as the final judgment. For example, if a passage is labeled as “good” and “fair” by the two assessors, then “fair” will be chosen as its final label.

Figure 4: Distribution of judgments given by one labeler when the other labeler’s label is “bad”, “fair”, “good”, or “perfect” (shown on the x-axis).
7. EXPERIMENTAL EVALUATION
Each passage is first preprocessed by applying the Porter
Stemmer and removing stopwords using a standard stopword list. The Stanford CoreNLP [9, 18] is used to provide tokenization, sentence splitting, NER, coreference, and
anaphora resolution. We use widely-used IR metrics including NDCG and precision as our main evaluation measures
and report the scores at top 1, 2, 3, and 5 passages. For
NDCG, we use the relevance labels presented in the previous section. For precision we use binary labels. A result is
considered to be related to the query if it is given one of the
labels “perfect”, “good”, and “fair”; and unrelated otherwise.
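For reference, NDCG@k with the graded labels of Section 6.2 and binary precision@k can be computed as follows. This is one common formulation shown only to make the evaluation setup explicit; using the rating scores 4/2/1/0 directly as gains is our assumption, not a detail stated in the paper.

    import math

    GAIN = {"perfect": 4, "good": 2, "fair": 1, "bad": 0}

    def dcg_at_k(labels, k):
        return sum(GAIN[l] / math.log2(i + 2) for i, l in enumerate(labels[:k]))

    def ndcg_at_k(ranked_labels, k):
        ideal = sorted(ranked_labels, key=lambda l: GAIN[l], reverse=True)
        idcg = dcg_at_k(ideal, k)
        return dcg_at_k(ranked_labels, k) / idcg if idcg > 0 else 0.0

    def precision_at_k(ranked_labels, k):
        # "perfect", "good", and "fair" count as related; "bad" as unrelated.
        return sum(1 for l in ranked_labels[:k] if l != "bad") / k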
We use a state-of-the-art learning-to-rank algorithm called LambdaMART [2] to train a descriptiveness function, namely LDM (learned descriptiveness model). LambdaMART is a boosted-tree version of LambdaRank [3] using MART [10]. We use as features the descriptiveness signals described in Section 5, together with the LRM retrieval model scores, to train LDM. We use a leave-one-book-out strategy to evaluate our algorithms: each time, we train a model using eight books and test it on the remaining book.
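As an illustration of this training setup (not the authors' code), a leave-one-book-out loop with a LambdaMART-style learner can be written with LightGBM's lambdarank objective; feature extraction and the grouping of candidate passages by query are assumed to be done elsewhere.

    import numpy as np
    import lightgbm as lgb

    def leave_one_book_out(features_by_book, labels_by_book, groups_by_book):
        # features/labels/groups are dicts keyed by book id; `groups` gives the
        # number of candidate passages per entity (query), as LightGBM expects.
        scores = {}
        for held_out in features_by_book:
            train_books = [b for b in features_by_book if b != held_out]
            X_train = np.vstack([features_by_book[b] for b in train_books])
            y_train = np.concatenate([labels_by_book[b] for b in train_books])
            group_train = np.concatenate([groups_by_book[b] for b in train_books])
            ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=200)
            ranker.fit(X_train, y_train, group=group_train)
            scores[held_out] = ranker.predict(features_by_book[held_out])
        return scores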
7.1 Effectiveness Comparison

For our approach, we will report results of the machine-learned ranking model LDM. We will also separately report results of the LRM retrieval model presented in Section 4. We compare our approaches against five baselines:

• Standard Find, the within-document Find function (“Ctrl-F”), which is used in arguably every application.

• Entity Find, which extends the standard Find function by finding passages that contain an entity of the same entity type with a single word matched.

• BM25, which is one of the state-of-the-art IR models.

• Semantic Match, which extends the standard Find function by using the semantic entity matching techniques presented in Section 4.1.
• PLM (positional language model [21]), which is not
only a state-of-the-art window-based passage retrieval
model, but can also serve as an improved SmartSkim
algorithm [11, 12], since it has been shown to be more
effective than the standard language modeling approach
[21]. Following [21], we first apply PLM to score every
position, and then use the position with the maximum
score in a passage as the score of the passage.
We report the average NDCG scores of LDM, LRM, and the baselines in Table 4. We can see that LDM dramatically outperforms all five baselines: its average NDCG scores over all books are more than 60% higher than those of all baselines at almost all top-k (k = 1, 2, 3, 5) positions. LDM also performs noticeably better than LRM, showing that the proposed descriptiveness signals effectively contribute to accurately modeling the notion of descriptiveness. The detailed NDCG scores for each book are reported in Table 6, whose first column shows the book IDs referring to Table 3. We can see that LDM consistently outperforms the other baselines on all books.

Model | @1 | @2 | @3 | @5
Standard Find | 30.0 | 32.1 | 31.7 | 32.1
Entity Find | 32.4 | 33.9 | 33.2 | 35.5
BM25 | 33.2 | 37.0 | 37.4 | 39.8
PLM | 33.9 | 37.2 | 37.7 | 39.4
Semantic Match | 35.7 | 37.8 | 38.0 | 38.5
LRM | 51.8* | 52.8* | 51.4* | 55.1*
  ↑ over Semantic Match | 45.1% | 39.7% | 35.3% | 43.1%
  ↑ over BM25 | 56% | 42.7% | 37.4% | 38.4%
LDM | 57.6† | 60.2† | 62.7† | 65.5†
  ↑ over Semantic Match | 61.3% | 59.3% | 65% | 70.1%
  ↑ over BM25 | 73.5% | 62.7% | 67.7% | 64.6%

Table 4: Comparison of average NDCG scores (%) on all books; * and † indicate that the improvements over the baselines and over LRM, respectively, are statistically significant (p < 0.05) using Student’s t-test.

ID | Semantic Match @1 @2 @3 @5 | BM25 @1 @2 @3 @5 | LRM @1 @2 @3 @5 | LDM @1 @2 @3 @5
1 | 36.6 39.5 33.7 30.4 | 47.4 45.8 47.8 49.5 | 57.3 59.6 53.4 55.2 | 60.6 61.1 62.6 67.3
2 | 25.0 25.0 28.7 31.3 | 25.0 21.9 26.4 27.3 | 35.0 38.8 41.7 43.2 | 48.6 56.7 57.2 60.3
3 | 61.7 60.1 58.6 59.3 | 25.0 33.3 36.2 54.3 | 66.7 66.7 65.3 66.7 | 70.4 68.5 70.6 71.0
4 | 50.0 43.0 42.0 45.5 | 29.7 30.6 32.7 32.3 | 50.0 48.0 47.0 47.5 | 57.9 64.0 63.3 60.1
5 | 35.2 36.8 37.9 37.2 | 25.0 30.6 32.0 36.8 | 44.1 45.8 47.1 49.0 | 46.6 44.5 51.1 57.8
6 | 39.3 40.1 44.2 46.5 | 50.9 53.6 52.2 54.2 | 61.8 63.6 60.7 60.5 | 65.5 67.5 64.2 64.8
7 | 5.2 6.5 6.8 6.3 | 5.6 7.7 9.3 12.7 | 33.3 27.1 29.1 32.5 | 45.2 45.8 47.1 48.4
8 | 35.2 41.9 42.7 45.2 | 36.6 38.9 39.1 40.5 | 53.2 57.6 53.8 60.7 | 54.3 59.4 60.8 65.5
9 | 35.0 39.5 41.7 41.4 | 12.5 31.3 26.8 27.4 | 50.0 47.5 52.5 66.5 | 68.4 76.3 88.9 89.7

Table 6: Comparison of NDCG scores (%) on all books (book IDs refer to Table 3)
The average precision scores for different methods are
shown in Table 5. We can see that LDM significantly outperforms all of the five baselines, and that Precision@5 of
LDM reaches about 90%. It is interesting to see that the
precision of LDM is just slightly better than the precision of LRM. This is to be expected because precision conflates the labels “perfect”, “good” and “fair” as correct. Thus, a passage is considered to be correct if it mentions the entity query, which is precisely what LRM is designed for.

Model | Prec@1 | Prec@2 | Prec@3 | Prec@5
Standard Find | 77.8 | 75.4 | 74.5 | 70.2
Entity Find | 76.4 | 76.3 | 77.8 | 78.5
BM25 | 76.1 | 78.8 | 76.9 | 76.7
PLM | 75.0 | 79.8 | 80.8 | 80.2
Semantic Match | 91.1 | 89.8 | 88.5 | 83.4
LRM | 96.3* | 94.3* | 92.4* | 89.9*
LDM | 96.4* | 95.0*† | 93.0*† | 90.0*

Table 5: Comparison of average precision of LDM and LRM with baselines over all books; * and † indicate that the improvements over the baselines and over LRM, respectively, are statistically significant (p < 0.05) using Student’s t-test.
7.2 Analysis of the LRM Retrieval Model
As we can see in Tables 4 and 5, LRM by itself outperforms all baselines significantly, with the average Precision@5 scores reaching as high as 89.9%, and NDCG@5 being over
35% above all baselines. We now turn our attention to the
analysis of the contribution of the individual components
of LRM (namely entity frequency (EF) and rewarded document length) to the performance of LRM. To this end, we
report the results of two modified versions of LRM in Figures
5a and 5b. In LRM-EF, we replace EF with the traditional
TF, and in LRM-DL, we remove the document length rewarding function g(|D|). The precision results show that
LRM-DL works significantly better than LRM-EF, and performs just slightly worse than LRM, showing that the entity
frequency (EF) retrieval heuristic plays a dominant role over
rewarded document length. On the other hand, the NDCG scores demonstrate that both retrieval heuristics may have similar capability for encoding “descriptiveness”, though LRM-EF works slightly better. However, combining both EF and document length rewarding leads to a significant boost in NDCG, suggesting that the two components complement each other. For sensitivity analysis, we extensively tested various values of the parameters k1 and r, and found that LRM works stably when k1 ∈ [1, 2] and r ∈ [0.3, 1].

Figure 5: Comparison of Precision and NDCG results for LRM, LRM-EF, LRM-DL, and BM25: (a) Precision; (b) NDCG.
7.3 Feature Analysis
We further analyze the importance of different heuristics
and features in contributing to the LDM ranking model. The
ten most influential features and their corresponding importance scores are reported in Figure 6, which are based on the
relative importance measure proposed in [10] to examine feature weights. It shows that the LRM score, number of keywords, normalized location of lifetime and number of person
entities are the top four most important features. Nonetheless, all the proposed descriptiveness signals contribute to
the descriptiveness model, and each of them contributes at
least two features to the top-10 most important features.

Figure 6: Top 10 most important features (in decreasing order of importance): LRM, Number of Keywords, Normalized Location of Lifetime, Num of PER, Num of LOC, At Beginning, Contain Keywords, Average IDF of Main Verb, Max IDF of Main Verb, Contain New Entities.
We conclude by showing anecdotal results of the application of the LDM model and two of the baselines (Semantic
Match and BM25) to the query “Mary Jane” in the book
“Adventures of Huckleberry Finn”. We list the top-3 results
returned by two baselines and LDM in Table 7. We can see
that LDM shows passages that have rich information about
Mary Jane, such as her age, physical appearance, etc. In
contrast, the passages retrieved by the baselines are either
not related to Mary Jane (e.g., the first passage returned by BM25) or provide little information about her.

Rank 1
  Semantic Match: The Trip to England.--“The Brute!”--Mary Jane Decides to Leave.--Huck Parting with Mary Jane.--Mumps.--The Opposition Line.
  BM25: “M--Mary Williams.”
  LDM: “Mary Jane’s nineteen, Susan’s fifteen, and Joanna’s about fourteen--that’s the one that gives herself to good works and has a hare-lip.”

Rank 2
  Semantic Match: I never seen anybody but lied one time or another, without it was Aunt Polly, or the widow, or maybe Mary. Aunt Polly--Tom’s Aunt Polly ...
  BM25: “That’s what Miss Mary Jane said.”
  LDM: Mary Jane was red-headed, but that don’t make no difference, she was most awful beautiful, and her face and her eyes was all lit up like glory, she was so glad her uncles was come...

Rank 3
  Semantic Match: Most everybody was on the boat. Pap, and Judge Thatcher, and Bessie Thatcher, and Jo Harper, and Tom Sawyer, and his old Aunt Polly, and Sid and Mary, and plenty more...
  BM25: Mary Jane straightened herself up, and my, but she was handsome! She says:...
  LDM: So Mary Jane took us up, and she showed them their rooms, which was plain but nice. She said she’d have her frocks and a lot of other traps took out of her room if they was in Uncle Harvey’s way

Table 7: Results (paragraphs) returned by Semantic Match, BM25, and LDM for “Mary Jane”
8. CONCLUSIONS AND FUTURE WORK
We introduced the problem of locally-defined entity search.
The problem has important practical applications (such as
search within e-books) and poses significant information retrieval challenges. In particular, we showed the limitations
of standard information retrieval heuristics (such as TF and document length normalization) when applied to locally-defined entity search, and we designed a novel retrieval model that addresses these limitations. We also presented a ranking model that leverages multiple novel signals to model the
descriptiveness of a passage. A thorough experimental evaluation of the approach shows that it outperforms the baselines by 60%+ in terms of NDCG.
Our ultimate goal is that all applications (word processors,
Internet browsers, etc.) replace their Ctrl-F functions with
locally-defined entity search functions. This entails many directions for future work, including studying the use of these
techniques on other entity types beyond person names (locations, organizations, projects, etc.) and the impact of this
functionality on different types of applications.
9. ACKNOWLEDGMENTS
We gratefully acknowledge helpful discussions with Ashok
Chandra and useful comments from referees.
10. REFERENCES
[1] S. Bonzi and E. Liddy. The use of anaphoric resolution
for document description in information retrieval. Inf.
Process. Manage., 25(4):429–441, 1989.
[2] C. J. Burges. From ranknet to lambdarank to
lambdamart: An overview. MSR-TR-2010-82, 2010.
[3] C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to
rank with nonsmooth cost functions. In NIPS ’06,
pages 193–200, 2006.
[4] J. P. Callan. Passage-level evidence in document
retrieval. In SIGIR ’94, pages 302–310, 1994.
[5] C. L. A. Clarke, G. V. Cormack, and T. R. Lynam.
Exploiting redundancy in question answering. In SIGIR
’01, pages 358–365, 2001.
[6] S. Cucerzan. Large-scale named entity disambiguation
based on Wikipedia data. In EMNLP-CoNLL, pages
708–716, 2007.
[7] R. J. Edens, H. L. Gaylard, G. J. Jones, and A. M.
Lam-Adesina. An investigation of broad coverage
automatic pronoun resolution for information retrieval.
In SIGIR ’03, pages 381–382, 2003.
[8] H. Fang, T. Tao, and C. Zhai. A formal study of
information retrieval heuristics. In SIGIR ’04, pages
49–56, 2004.
[9] J. R. Finkel, T. Grenager, and C. Manning.
Incorporating non-local information into information
extraction systems by gibbs sampling. In ACL ’05,
pages 363–370, 2005.
[10] J. H. Friedman. Greedy function approximation: A
gradient boosting machine. Annals of Statistics,
29:1189–1232, 2000.
[11] D. J. Harper, S. Coulthard, and S. Yixing. A language
modelling approach to relevance profiling for document
browsing. In JCDL ’02, pages 76–83, 2002.
[12] D. J. Harper, I. Koychev, Y. Sun, and I. Pirie.
Within-document retrieval: A user-centred evaluation of
relevance profiling. Inf. Retr., 7(3-4):265–290, 2004.
[13] M. A. Hearst. Tilebars: Visualization of term
distribution information in full text information access.
In CHI ’95, pages 59–66, 1995.
[14] J. Jiang and C. Zhai. Extraction of coherent relevant
passages using hidden markov models. ACM Trans.
Inf. Syst., 24(3):295–319, 2006.
[15] K. S. Jones and C. J. van Rijsbergen. Report on the
need for and the provision of an ’ideal’ information
retrieval test collection. Tech. Rep., University of
Cambridge, 1975.
[16] M. Kaszkiel and J. Zobel. Effective ranking with
arbitrary passages. JASIST, 52(4):344–364, 2001.
[17] V. Lavrenko and W. B. Croft. Relevance-based
language models. In SIGIR ’01, pages 120–127, 2001.
[18] H. Lee, A. Chang, Y. Peirsman, N. Chambers,
M. Surdeanu, and D. Jurafsky. Deterministic coreference
resolution based on entity-centric, precision-ranked
rules. Computational Linguistics, 39(4):885–916, 2013.
[19] J. Lin, D. Quan, V. Sinha, K. Bakshi, D. Huynh,
B. Katz, and D. R. Karger. What makes a good
answer? the role of context in question answering. In
INTERACT ’03, pages 25–32, 2003.
[20] F. Loizides and G. R. Buchanan. The myth of find: user
behaviour and attitudes towards the basic search
feature. In JCDL ’08, pages 48–51, 2008.
[21] Y. Lv and C. Zhai. Positional language models for
information retrieval. In SIGIR ’09, pages 299–306,
2009.
[22] D. Metzler and W. B. Croft. A markov random field
model for term dependencies. In SIGIR ’05, pages
472–479, 2005.
[23] R. Mihalcea and A. Csomai. Wikify!: linking documents
to encyclopedic knowledge. In CIKM ’07, pages
233–242, 2007.
[24] R. Mitkov. Anaphora resolution, volume 134. Longman
London, 2002.
[25] S.-H. Na and H. T. Ng. A 2-poisson model for
probabilistic coreference of named entities for improved
text retrieval. In SIGIR ’09, pages 275–282, 2009.
[26] D. Petkova and W. B. Croft. Proximity-based document
representation for named entity retrieval. In CIKM ’07,
pages 731–740, 2007.
[27] J. M. Ponte and W. B. Croft. A language modeling
approach to information retrieval. In SIGIR ’98, pages
275–281, 1998.
[28] S. E. Robertson and S. Walker. Some simple effective
approximations to the 2-poisson model for probabilistic
weighted retrieval. In SIGIR ’94, pages 232–241, 1994.
[29] S. E. Robertson, S. Walker, S. Jones,
M. Hancock-Beaulieu, and M. Gatford. Okapi at trec-3.
In TREC ’94, pages 109–126, 1994.
[30] A. Singhal, C. Buckley, and M. Mitra. Pivoted
document length normalization. In SIGIR ’96, pages
21–29, 1996.
[31] K. Spärck Jones. Automatic summarising: The state of
the art. Inf. Process. Manage., 43(6):1449–1481, 2007.
[32] S. Tellex, B. Katz, J. Lin, A. Fernandes, and G. Marton.
Quantitative evaluation of passage retrieval algorithms
for question answering. In SIGIR ’03, pages 41–47.
[33] A. Tombros and M. Sanderson. Advantages of query
biased summaries in information retrieval. In SIGIR
’98, pages 2–10, 1998.
[34] C. Zhai and J. D. Lafferty. A study of smoothing
methods for language models applied to ad hoc
information retrieval. In SIGIR ’01, pages 334–342,
2001.