Learning Techniques for Information Retrieval

We cover:
1. Perceptron algorithm
2. Least mean square algorithm
3. Chapter 5.2: User relevance feedback (pp. 118-123)
4. Chapter 5.3.1: Query expansion through local clustering (pp. 124-126)
Adaptive linear model
• Let X1, X2, …, Xn be the vectors of n documents.
• D1 ∪ D2 = {X1, X2, …, Xn}, where D1 is the set of relevant documents and D2 is the set of irrelevant documents.
• D1 and D2 are obtained from the user's feedback.
• Question: find a vector W = (W1, W2, …, Wm) such that
  ∑_{i=1 to m} Wi·Xij + 1 > 0 for each Xj ∈ D1, and
  ∑_{i=1 to m} Wi·Xij + 1 < 0 for each Xj ∈ D2.
[Figure: a linear threshold unit. Inputs X1, X2, …, Xn are weighted by W1, W2, …, Wn, a threshold input X0 carries weight W0, and the output is sign(y), i.e. +1 or -1.]
Remarks:
• W is the new query vector.
• W is computed from the feedback, i.e., from D1 and D2.
• The following is a hyper-plane:
  ∑_{i=1 to m} wi·Xi + d = 0, where W = (w1, w2, …, wm).
• The hyper-plane cuts the whole space into two parts; hopefully one part contains the relevant docs and the other contains the non-relevant docs (a small sketch of this decision rule follows).
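In Python, this decision rule is just the sign of the score; a minimal sketch (the function name classify is illustrative, not from the slides):

def classify(w, x, d):
    # +1 means x falls on the relevant side of the hyper-plane, -1 on the other side
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + d > 0 else -1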
Perceptron Algorithm
(1) For each X ∈ D1, if X·W + d < 0, then increase the weight vector at the next iteration:
    W = Wold + C·X, d = d + C.
(2) For each X ∈ D2, if X·W + d > 0, then decrease the weight vector at the next iteration:
    W = Wold − C·X, d = d − C.
C is a positive constant.
Repeat (1) and (2) until X·W + d > 0 for each X ∈ D1 and X·W + d < 0 for each X ∈ D2 (a Python sketch of this procedure follows).
Perceptron Convergence Theorem
• The perceptron algorithm finds such a W in a finite number of iterations if the training set {X1, X2, …, Xn} is linearly separable.
• References:
• Wong, S.K.M., Yao, Y.Y., Salton, G., and Buckley, C., "Evaluation of an adaptive linear model," Journal of the American Society for Information Science, Vol. 42, No. 10, pp. 723-730, 1991.
• Wong, S.K.M. and Yao, Y.Y., "Query formulation in linear retrieval models," Journal of the American Society for Information Science, Vol. 41, No. 5, pp. 334-341, 1990.
An Example of the Perceptron Algorithm
X1=(2,0), X2=(2,2), X3=(-1,2), X4=(-2,1), X5=(-1,-1), X6=(1/2,-3/4)
D1={X1, X2, X3}, D2={X4, X5, X6}. Start with W=(-1,0) and set d=0.5 (C=1; d is held fixed at 0.5 throughout this trace).
[Figure: four plots of the six points, showing W after each update: W=(1,0), W=(0,2), W=(2,1), and W=(3/2,7/4).]
First pass:
• W·X1 + 0.5 = -1.5 < 0, so W = Wold + X1 = (1,0).
• W·X2 + 0.5 = 2.5 > 0, no change.
• W·X3 + 0.5 = -0.5 < 0, so W = Wold + X3 = (0,2).
• W·X4 + 0.5 = 2.5 > 0, so W = Wold − X4 = (2,1).
• W·X5 + 0.5 = -2.5 < 0, no change.
• W·X6 + 0.5 = 3/4 > 0, so W = Wold − X6 = (3/2, 7/4).
Second pass, with W = (3/2, 7/4):
• W·X1 + 0.5 = 3.5, W·X2 + 0.5 = 7, W·X3 + 0.5 = 2.5 (all > 0);
• W·X4 + 0.5 = -3/4, W·X5 + 0.5 = -11/4, W·X6 + 0.5 = -1/16 (all < 0).
The algorithm stops here (a quick numeric check follows).
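The final vector from this trace can be checked with a few lines of Python (the values simply restate the example):

W, d = (1.5, 1.75), 0.5
D1 = [(2, 0), (2, 2), (-1, 2)]            # relevant: scores should be positive
D2 = [(-2, 1), (-1, -1), (0.5, -0.75)]    # non-relevant: scores should be negative
for x in D1 + D2:
    print(x, W[0] * x[0] + W[1] * x[1] + d)

The printed scores are 3.5, 7.0, 2.5 for D1 and -0.75, -2.75, -0.0625 for D2, matching the second pass above.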
LMS Learning Algorithm
Given a set of input vectors {x1, x2, …, xL}, each with its own desired output dk, for k = 1, 2, …, L, find a vector w such that
  ∑_{k=1 to L} (dk − w·xk)² is minimized.
For IR, dk is just the rank order the user gives.
From "Neural Networks: Algorithms, Applications, and Programming Techniques," by James A. Freeman and David M. Skapura, Addison-Wesley, 1991.
The algorithm
1. Choose an initial vector w(1) = (1, 1, …, 1).
2. For each xk, compute the error εk(t) = dk − w(t)·xk, so the squared error is εk²(t) = (dk − w(t)·xk)².
3. Update w(t+1) = w(t) + 2μ·εk·xk.
4. Repeat steps 2-3 until the error is reduced to an acceptable level.
μ is a learning-rate parameter. If μ is too large, the algorithm will never converge; if μ is too small, convergence is slow. In practice, choose a value between 0.1 and 1.0; you can start with a larger value and reduce it gradually. (A Python sketch of these steps follows.)
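A minimal sketch of these steps in Python; the function name lms, the tolerance tol, and the max_iters cap are illustrative additions:

def lms(xs, ds, mu=0.1, tol=1e-3, max_iters=1000):
    """LMS learning: find w approximately minimizing sum_k (d_k - w·x_k)^2.
    xs: input vectors x_1..x_L, ds: their desired outputs d_1..d_L, mu: learning rate."""
    w = [1.0] * len(xs[0])                           # step 1: w(1) = (1, ..., 1)
    for _ in range(max_iters):
        total_sq_error = 0.0
        for x, dk in zip(xs, ds):
            err = dk - sum(wi * xi for wi, xi in zip(w, x))       # eps_k(t)
            total_sq_error += err * err
            w = [wi + 2 * mu * err * xi for wi, xi in zip(w, x)]  # step 3 update
        if total_sq_error < tol:                     # step 4: error acceptable
            break
    return w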
Query Expansion and Term Reweighting for the Vector Model
• Dr : set of relevant documents, as identified by the user, among the retrieved documents;
• Dn : set of non-relevant documents among the retrieved documents;
• Cr : set of relevant documents among all documents in the collection;
• |Dr|, |Dn|, |Cr| : number of documents in the sets Dr, Dn, Cr, respectively;
• N : total number of documents in the collection;
• α, β, γ : tuning constants.
If the complete set Cr were known in advance, the optimal query would be
  q_opt = (1/|Cr|) ∑_{dj ∈ Cr} dj − (1/(N − |Cr|)) ∑_{dj ∉ Cr} dj
Query Expansion and Term Reweighting for the Vector Model
Standard_Rocchio:
  qm = α·q + (β/|Dr|) ∑_{dj ∈ Dr} dj − (γ/|Dn|) ∑_{dj ∈ Dn} dj
Ide_Regular:
  qm = α·q + β ∑_{dj ∈ Dr} dj − γ ∑_{dj ∈ Dn} dj
Ide_Dec_Hi:
  qm = α·q + β ∑_{dj ∈ Dr} dj − γ·max_non-relevant(dj)
where max_non-relevant(dj) is a reference to the highest ranked non-relevant document. (A small code sketch of the three variants follows.)
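A sketch of the three variants in Python over simple term-weight dictionaries (term → weight); the helper names and the default values of α, β, γ are illustrative, not prescribed by the slides:

def add(q, d, scale):
    # accumulate scale * d into q (both are dicts mapping term -> weight)
    for term, w in d.items():
        q[term] = q.get(term, 0.0) + scale * w

def standard_rocchio(q, Dr, Dn, alpha=1.0, beta=0.75, gamma=0.15):
    # q_m = alpha*q + beta/|Dr| * sum(Dr) - gamma/|Dn| * sum(Dn)
    qm = {t: alpha * w for t, w in q.items()}
    for dj in Dr:
        add(qm, dj, beta / len(Dr))
    for dj in Dn:
        add(qm, dj, -gamma / len(Dn))
    return qm

def ide_regular(q, Dr, Dn, alpha=1.0, beta=1.0, gamma=1.0):
    # q_m = alpha*q + beta * sum(Dr) - gamma * sum(Dn)
    qm = {t: alpha * w for t, w in q.items()}
    for dj in Dr:
        add(qm, dj, beta)
    for dj in Dn:
        add(qm, dj, -gamma)
    return qm

def ide_dec_hi(q, Dr, top_nonrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    # like Ide_Regular, but subtract only the highest ranked non-relevant document
    qm = {t: alpha * w for t, w in q.items()}
    for dj in Dr:
        add(qm, dj, beta)
    add(qm, top_nonrelevant, -gamma)
    return qm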
Evaluation of Relevance Feedback Strategies (Chapter 5)
• Simple way: run the modified query against the database and recompute the results.
• Problem: the documents used for feedback tend to be ranked highly anyway, so this comparison is not fair.
• Better way: consider only the documents not used in the feedback (the residual collection); a small sketch follows.
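One way to realize the "better way" is to drop the documents already judged during feedback before measuring precision; a small sketch under that assumption (names are illustrative):

def residual_precision(ranking, relevant, judged, k=10):
    # precision at k over the residual collection, ignoring documents
    # the user already judged during feedback
    residual = [doc for doc in ranking if doc not in judged]
    top = residual[:k]
    return sum(1 for doc in top if doc in relevant) / max(len(top), 1)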
Query Expansion Through Local Clustering
• Definition: Let V(s) be a non-empty subset of words which are grammatical variants of each other. A canonical form s of V(s) is called a stem. For instance, if V(s) = {polish, polishing, polished}, then s = polish.
• Definition: For a given query q, the set Dl of documents retrieved is called the local document set. Further, the set Vl of all distinct words in the local document set is called the local vocabulary. The set of all distinct stems derived from the set Vl is referred to as Sl.
Association Clusters
• Definition: The frequency of a stem si in a document dj, dj ∈ Dl, is referred to as f_{si,j}. Let m = (m_{i,j}) be an association matrix with |Sl| rows and |Dl| columns, where m_{i,j} = f_{si,j}. Let m^t be the transpose of m. The matrix s = m·m^t is a local stem-stem association matrix. Each element s_{u,v} in s expresses a correlation between the stems su and sv, namely
  c_{u,v} = ∑_{dj ∈ Dl} f_{su,j} × f_{sv,j}    (5.5)
  s_{u,v} = c_{u,v}    (5.6)
Association Clusters
• Normalize:
  s_{u,v} = c_{u,v} / (c_{u,u} + c_{v,v} − c_{u,v})    (5.7)
• Definition: Consider the u-th row in the association matrix s (i.e., the row with all the associations for the stem su). Let Su(n) be a function which takes the u-th row and returns the set of n largest values s_{u,v}, where v varies over the set of local stems and v ≠ u. Then Su(n) defines a local association cluster around the stem su. If s_{u,v} is given by equation (5.6), the association cluster is said to be unnormalized. If s_{u,v} is given by equation (5.7), the association cluster is said to be normalized. (A code sketch follows.)
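A sketch of equations (5.5)-(5.7) and of Su(n) in Python, assuming each local document is given as the list of its stems; the function names are illustrative:

from collections import Counter

def association_matrix(local_docs, stems):
    # c[u][v] = sum over d_j in Dl of f_{su,j} * f_{sv,j}   (eq. 5.5)
    freqs = [Counter(doc) for doc in local_docs]        # f_{si,j}
    c = {u: {v: 0.0 for v in stems} for u in stems}
    for f in freqs:
        for u in stems:
            for v in stems:
                c[u][v] += f[u] * f[v]
    return c

def normalized(c, u, v):
    # s_{u,v} = c_{u,v} / (c_{u,u} + c_{v,v} - c_{u,v})   (eq. 5.7)
    denom = c[u][u] + c[v][v] - c[u][v]
    return c[u][v] / denom if denom else 0.0

def cluster(c, u, n, use_normalized=True):
    # S_u(n): the n stems v != u with the largest s_{u,v}
    score = (lambda v: normalized(c, u, v)) if use_normalized else (lambda v: c[u][v])
    return sorted((v for v in c if v != u), key=score, reverse=True)[:n]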
Interactive Search Formulation
• A stem su that belongs to a cluster associated with another stem sv is said to be a neighbor of sv.
• Reformulation of the query: for each stem sv in the query, select m neighbor stems from the cluster Sv(n) and add them to the query (see the sketch below).
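Using the cluster helper from the sketch above, query reformulation might look like this (the parameters n and m are illustrative):

def expand_query(query_stems, c, n=10, m=3):
    # for each query stem s_v, add m neighbor stems from its cluster S_v(n)
    expanded = list(query_stems)
    for sv in query_stems:
        if sv not in c:                 # stem not in the local vocabulary
            continue
        for neighbor in cluster(c, sv, n)[:m]:
            if neighbor not in expanded:
                expanded.append(neighbor)
    return expanded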