Automata Modeling for Cognitive Interference in Users' Relevance Judgment

Quantum Informatics for Cognitive, Social, and Semantic Processes: Papers from the AAAI Fall Symposium (FS-10-08)

Peng Zhang1, Dawei Song1, Yuexian Hou1,2, Jun Wang1, Peter Bruza3
1 School of Computing, The Robert Gordon University, UK.
2 School of Computer Sci. & Tec., Tianjin University, China.
3 Queensland University of Technology, Australia
Email: p.zhang1@rgu.ac.uk

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Quantum theory has recently been employed to further advance the theory of information retrieval (IR). A challenging research topic is to investigate the so-called quantum-like interference in users' relevance judgment process, where users judge the relevance degree of each document with respect to a given query. In this process, users' relevance judgment for the current document is often interfered with by the judgment for previous documents, due to the interference on users' cognitive status. Research from cognitive science has demonstrated some initial evidence of quantum-like cognitive interference in human decision making, which underpins the user's relevance judgment process. This motivates us to model such cognitive interference in the relevance judgment process, which in our belief will lead to a better modeling and explanation of user behaviors in the relevance judgment process for IR, and eventually to more user-centric IR models. In this paper, we propose to use the probabilistic automaton (PA) and the quantum finite automaton (QFA), which are suitable to represent the transition of user judgment states, to dynamically model the cognitive interference when the user is judging a list of documents.

Introduction

Increasingly, information retrieval (IR) techniques are underpinning daily information seeking and management tools, such as the popular search engines. Despite the success of IR applications, there has been an urging demand for advanced theoretical frameworks, especially to support the emerging context-sensitive and user-centric IR systems which are compatible with human information processing and cognition. Following van Rijsbergen's pioneering work (2004), which has shown the potential of quantum theory (QT) to subsume the major IR models into a single mathematical formalism in Hilbert vector spaces, researchers have proposed many QT-based models in a number of areas, e.g., contextual IR (Melucci 2008), interactive IR (Piwowarski and Lalmas 2009a, 2009b), lexical semantic spaces (Bruza and Cole 2005; Hou and Song 2009), and quantum-like interference in IR (Zuccon et al. 2009, 2010; Melucci 2010).

This paper is a step further along the direction of investigating quantum-like interference in a central IR process, the user's relevance judgment, where users judge the relevance degree of documents with respect to a given query. In this process, users' relevance judgment for the current document is often interfered with by the judgment for previously seen documents, as a result of cognitive interference. A typical scenario is that, after finding a more relevant document, a user may lower the relevance degree of a previously judged document. For example, when a user judges whether a document d0 about "the theory of relativity" is relevant to the query "Albert Einstein", the user might initially think it is 90% relevant. However, after viewing another document d1 introducing all the aspects of Einstein, the user is likely to consider d1 as almost 100% relevant, yet the relevance of d0 may be lowered accordingly. Capturing such cognitive interference, in our belief, will lead to more user-centric IR models.

Research from cognitive science (Khrennikov 2004; Busemeyer et al. 2007; 2009) has demonstrated initial evidence about the quantum (or quantum-like) nature of the cognitive interference involved in human decision making, which underpins the relevance judgment process in IR. The cognitive experiment showed that the participants' probability of making a certain decision could be interfered with by a previous categorization task in a non-classical manner: the classical (Kolmogorov) law of total probability (CLTP) was violated. It has been argued that a classical Markov model with transition probabilities from category states to decision states was insufficient to model the probability differences, while the quantum model provided a natural way to bridge the gap between non-interfered probabilities and interfered probabilities.

This sheds light for us on the formal modeling of the cognitive interference in the users' relevance judgment process. However, different from the approach in cognitive science (Busemeyer et al. 2009), we will present how an alternative Markov approach, in which transitions are between decision states, rather than from category states to decision states, is able to model the above cognitive interference. Similarly, the transition between users' judgment states can model the corresponding cognitive interference in the relevance judgment process. The judgment states correspond to the relevance degrees of documents, e.g., relevant
(R), partially relevant (P), and irrelevant (I).
Formally, we propose to use a probabilistic automaton
(PA) (Tzeng 1992), which is a generalization of Markov
models, to dynamically model the cognitive interference
when a user is judging a list of documents. Specifically, the
states in PA represent the judgment states of users; different input symbols of PA represent different kinds of cognitive interference generated by viewing other documents; and
each symbol is associated with a (Markov) transition matrix
to encode the transition probabilities among judgment states.
Moreover, PA can accept a finite string of symbols, making
it possible to dynamically model the cognitive interference
when users are judging a ranking of documents.
In addition to PA, we also present its quantum counterpart, namely quantum finite automaton (QFA) (Kondacs and
Watrous 1997), to model the cognitive interference and show
its potential advantages over the PA.
To the best of our knowledge, this is the first attempt to
formally model the cognitive interference in the document
relevance judgement process. In the IR literature, the dynamic nature of relevance and the effects of document presentation order on relevance judgment have been addressed
in (Mizzaro 1996; Eisenberg and Barry 1988). Recently,
Zuccon et al. (2009; 2010) proposed a quantum probability
ranking principle (QPRP), which incorporates the approximation (i.e., the interference term estimated by the similarity between a non-ranked document and ranked documents)
of the quantum interference term into the classical probability ranking principle (PRP) to revise the document ranking
scores. QPRP is the first successful application of quantum-like IR methods and has achieved promising performance
on large scale collections. However, QPRP can be regarded
as a system-centric approach, since it is concerned about
the document ranking by the system. Different from QPRP,
our approach focuses on the user's relevance judgment process: having judged the relevance of other documents will lead to a change in the user's cognitive status and in turn interfere with his/her judgment of the current document. In our
opinion, a system-centric approach implicitly assumes that the
relevance is a reality feature (or an objective property) of
documents, while the user-centric approach is to investigate
the non-reality feature (or subjective property) of relevance,
which coincides with the non-reality principle of quantum
theory.
Cognitive Interference in Human Decision Making

In this section, we start with a description of Busemeyer et al.'s cognitive experiment in human decision making and their formulation of the cognitive interference. Then, we present an alternative Markov approach to model the cognitive interference.

In (Busemeyer and Wang 2007; Busemeyer, Wang, and Lambert-Mogiliansky 2009), researchers presented experimental results that demonstrated the non-classical nature of the cognitive interference, evidenced by the violation of the classical law of total probability (CLTP), in a human decision making task. Participants were shown pictures of human faces and asked to make a decision about how they would react to the faces. Two experimental settings were predefined. In setting 1, the participants were asked to decide (D-only) whether to take an 'attack (A)' or 'withdraw (W)' action. In setting 2, the participants were asked to first categorize (C step) the face as belonging to either a 'good (G)' or a 'bad (B)' guy, and then decide (C-then-D) their actions (A or W). The experiment was carried out in a well-controlled environment to ensure less distraction to the participants and more randomness in assigning participants to different experimental settings.

Violation of CLTP in Human Decision

Through this experiment, various probabilities1, including P(A) and P(W) in setting 1, and P(G), P(A|G), P(W|G), P(B), P(A|B) and P(W|B) in setting 2, can be obtained. These probabilities are denoted for short as p_A and p_W in setting 1, and p_G, T_AG, T_WG, p_B, T_AB and T_WB in setting 2, respectively. The following observation from the experiment indicates the violation of CLTP:

[p_A  p_W]^T ≠ [p_G · T_AG + p_B · T_AB    p_G · T_WG + p_B · T_WB]^T    (1)

The above formula shows that the non-interfered probabilities on the left-hand side are different from the interfered probabilities on the right-hand side.

1 In this section, for better readability, some notations and formulations in Sections 2.1 and 2.2 of (Busemeyer, Wang, and Lambert-Mogiliansky 2009) are slightly modified.

A Markov Model Explanation

To model and explain the phenomenon, Busemeyer et al. (2009) considered the right-hand side of Formula 1 as a Markov model. They defined two category states, G and B, and two decision states, A and W. The 1 × 2 row vector P_C = [p_G  p_B] represents the initial probabilities of the two category states in setting 2. The 2 × 2 matrix

T = [ T_AG  T_WG
      T_AB  T_WB ]    (2)

can represent the probabilities of transiting from each category state to each decision state. Specifically, T_{j,i} represents the probability of transiting from category state j (G or B) to decision state i (A or W). T is a transition matrix (a right stochastic matrix), which means each row of T sums to one. Then, the right-hand side of Formula 1 equals P_C · T, a standard Markov process. Let P_F = [p_FA  p_FW] = P_C · T, where p_FA and p_FW are the probabilities predicted by the Markov model. Hence, Formula 1 can be rewritten as:

[p_A  p_W]^T ≠ [p_FA  p_FW]^T = [p_G · T_AG + p_B · T_AB    p_G · T_WG + p_B · T_WB]^T    (3)

It turns out that the predicted probabilities of the Markov model are still different from the probabilities in setting 1. Moreover, this Markov approach still cannot bridge the probability differences in the two different settings.
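To make the Markov prediction concrete, here is a minimal sketch. The numbers below are hypothetical placeholders (not the experiment's actual data); the code only illustrates how P_F = P_C · T is computed and compared against the decision-only probabilities [p_A, p_W]:

```python
# Hypothetical illustration of Formulas 1-3 (numbers are NOT the
# experiment's data): compute the Markov prediction P_F = P_C * T
# and compare it with the decision-only probabilities [p_A, p_W].
p_C = [0.7, 0.3]          # [p_G, p_B]: categorization probabilities (setting 2)
T = [[0.6, 0.4],          # row G: [T_AG, T_WG], transitions to decisions A, W
     [0.3, 0.7]]          # row B: [T_AB, T_WB]

# p_F[i] = sum_j p_C[j] * T[j][i]  (a standard Markov step)
p_F = [round(sum(p_C[j] * T[j][i] for j in range(2)), 2) for i in range(2)]
print(p_F)                # [0.51, 0.49]

p_observed = [0.6, 0.4]   # hypothetical decision-only [p_A, p_W] (setting 1)
# A violation of the CLTP means the two vectors differ:
print(p_observed != p_F)  # True under these hypothetical numbers
```

Each row of T sums to one, so P_C · T is again a probability vector; the point of Formula 3 is that no choice of such a T over category-to-decision transitions reproduces the observed [p_A, p_W].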
Quantum Model Explanation

Later, Busemeyer et al. (2009) proposed to use quantum theory to explain the violation of CLTP. In quantum theory, the probabilities are derived from the squared magnitudes of the amplitudes. For example, an amplitude q is a complex number q = r · [cos θ + i sin θ], where i = √−1, r ∈ [0, 1] is the magnitude, and θ ∈ [0, 2π) is the phase. The squared magnitude q · q* = r² ≤ 1.

Two category states, |G⟩ and |B⟩, as well as two decision states, |A⟩ and |W⟩, were defined. The 1 × 2 row vector Q_C = [q_G  q_B] represents the initial amplitudes for the category states, where |q_G|² = p_G and |q_B|² = p_B. The 2 × 2 matrix

U = [ U_AG  U_WG
      U_AB  U_WB ]    (4)

represents the amplitudes for transiting from each category state to each decision state, where U is a unitary matrix2 and |U_{i,j}|² = T_{i,j}. Then, the predicted amplitudes by the quantum model can be represented as

Q_F = [Q_FA  Q_FW]^T = (Q_C · U)^T = [q_G · U_AG + q_B · U_AB    q_G · U_WG + q_B · U_WB]^T    (5)

The corresponding probabilities by the quantum model are

[ |Q_FA|²  |Q_FW|² ]^T
= [ |q_G|²|U_AG|² + |q_B|²|U_AB|² + 2 · |q_G||U_AG||q_B||U_AB| cos(θ)
    |q_G|²|U_WG|² + |q_B|²|U_WB|² + 2 · |q_G||U_WG||q_B||U_WB| cos(τ) ]^T
≈ [ p_G · T_AG + p_B · T_AB + Interference(θ)
    p_G · T_WG + p_B · T_WB + Interference(τ) ]^T    (6)

It shows that the quantum probabilities do not necessarily obey the CLTP due to the existence of the interference term, e.g., 2 · |q_G||U_AG||q_B||U_AB| cos(θ). As a result, these probabilities [|Q_FA|²  |Q_FW|²] of the quantum model are different from the predicted probabilities [p_FA  p_FW] of the Markov model in Formula 3. In (Busemeyer, Wang, and Lambert-Mogiliansky 2009), they have shown that using well-trained cos(θ), cos(τ) and fitted unitary matrices, [|Q_FA|²  |Q_FW|²] can predict [p_A  p_W] more exactly.

2 To make sure that U is a unitary matrix, |U_{i,j}|² just needs to be fitted to approximate T_{i,j}.

An Alternative Markov Approach

We think that the cognitive interference observed in the above human decision task can be modeled by an alternative Markov approach.

First, we assume in general that if there does not exist cognitive interference, the CLTP should hold; otherwise, the CLTP can be violated. For the experiment in (Busemeyer, Wang, and Lambert-Mogiliansky 2009), this assumption means that if there was no cognitive interference in setting 2 (i.e., the participants' C-step does not interfere with the D-step), the CLTP should hold. Otherwise, the probability difference between both sides can be generated.

If no cognitive interference exists in setting 2, we let T^N_AG, T^N_AB, T^N_WG and T^N_WB be the non-interfered conditional probabilities in setting 2, which correspond to the interfered ones T_AG, T_AB, T_WG and T_WB in Formula 1, respectively. Notice that, different from the T_(·), the T^N_(·) are just for theoretical analysis and are actually not observed from the experimental data. We let

P_N = [p_NA  p_NW]^T = [p_G · T^N_AG + p_B · T^N_AB    p_G · T^N_WG + p_B · T^N_WB]^T    (7)

where p_NA and p_NW denote the non-interfered probabilities with respect to actions A and W, respectively, in setting 2. According to our assumption, the CLTP should hold, which means

[p_A  p_W]^T = [p_NA  p_NW]^T    (8)

where the left-hand side is actually the left-hand side of Formula 1, while the right-hand side is just for theoretical analysis and not observed from the data.

If there exists cognitive interference in setting 2, the CLTP does not hold, which gives us

P_I = [p_IA  p_IW]^T = [p_G · T_AG + p_B · T_AB    p_G · T_WG + p_B · T_WB]^T    (9)

where P_I = [p_IA  p_IW] represents the interfered probabilities in setting 2, which are actually equivalent to the right-hand side of Formula 1. Hence, the violation of CLTP in Formula 1 can be rewritten as:

[p_A  p_W]^T ≠ [p_IA  p_IW]^T    (10)

From the above four formulae, it turns out that the problem of explaining the violation of the CLTP in Formula 10 is equivalent to the problem of explaining the difference between P_N and P_I (i.e., the two right-hand sides of Formulas 8 and 10, respectively) and how to bridge the difference. Notice that, under our assumption, the quantum explanation in Formula 6 can also be regarded as an approach that explains the difference between P_N and P_I and uses an interference term to bridge the difference.

The difference between P_N and P_I is generated by the difference between T^N_(·) and T_(·) (see Formulas 7 and 9). We think this is due to the transition between the decision states (A or W), which can be formulated by a standard Markov transition matrix (also called a right stochastic matrix):

M(α) = [ M_AA  M_AW
         M_WA  M_WW ]    (11)

where M(α)_{i,j} (i, j ∈ {A, W}) represents the probability of transiting from a decision state i to another decision state j, and different M(α) represent different cognitive interference effects of the category state α (α ∈ {G, B}). (Note that M(α) is different from T in Formula 2, in the sense that M(α) represents the transition probabilities between decision states, while T represents the transition probabilities from category states to decision states.) Hence,
the cognitive interference process can be formulated as follows:

[T_AG  T_WG] = [T^N_AG  T^N_WG] × M(G)
[T_AB  T_WB] = [T^N_AB  T^N_WB] × M(B)    (12)

If each M(i) is an identity matrix, it means that categorizing the face does not interfere with the participants' decisions on their actions. Otherwise, according to Formula 12, the C-step interferes with the D-step, making the difference between T^N_(·) and T_(·), and then the difference between P_N and P_I. The transition matrices M(G) and M(B) can be regarded as a connection between T^N_(·) and T_(·), and then bridge the difference between P_N and P_I.

For example, in Formula 12, [T^N_AG  T^N_WG] = [0.8  0.2] means that without interference in setting 2, for the participants who categorized the face as belonging to a 'good' (G) guy, 80% of them decided to 'attack' (A) and 20% decided to 'withdraw' (W). If the transition matrix M(G) is an identity matrix, then [T_AG  T_WG] = [T^N_AG  T^N_WG], which indicates that the category state G does not interfere with the decision states. Otherwise, suppose the transition matrix

M(G) = [ 0.4  0.6
         0.7  0.3 ]

which means that for the participants (see the transition probabilities in the first row of M(G)) who first decided to attack, 40% would stick to their initial decisions while 60% would change their decisions to withdraw; for the participants (see the second row) who first decided to withdraw, 70% would change their decisions to attack while 30% would stay with their initial decisions. After the cognitive interference represented by [T^N_AG  T^N_WG] × M(G), the interfered probabilities are

[T_AG  T_WG] = [0.8  0.2] × [ 0.4  0.6
                              0.7  0.3 ] = [0.46  0.54]

which is different from the non-interfered probabilities.

Further Comparison between the Above Models

From Formula 12, one may think that we are using transition matrices and non-interfered probabilities to predict the interfered probabilities. This is different from the quantum model (Formula 6), where an interference term and interfered probabilities are used to predict the non-interfered ones. We should mention that: firstly, through the inverse operation of Formula 12, one can also predict the non-interfered probabilities; secondly, predicting the interfered decisions is actually more important in the user relevance judgment process.

Automata Modeling for Cognitive Interference in Relevance Judgment

In the previous section, we have shown the possibility of using a Markov approach to model the cognitive interference in the human decision making context. Now, we investigate the cognitive interference during the user relevance judgment process4. Here, the cognitive interference means that user relevance judgment for some documents often interferes with user judgment for other documents. Our general aim is to dynamically model the cognitive interference while users are scanning a ranking of documents.

The above aim is highly related to interactive IR tasks. For example, if the users have read and explicitly or implicitly judged some of the top-ranking documents, the relevance scores of other documents should be revised accordingly. The score revision process should consider certain groups/types of users' cognitive interference, which can be learned from the model proposed in this paper.

4 We admit that relevance judgment can be affected by many factors, such as dynamic information needs, contents, genre and even the retrieval interface. Here, we pay more attention to the change of user relevance judgment results of documents with respect to a given query. Actually, we also studied the cognitive interference in the relevance judgment of potential queries (indicating user information needs) with respect to a given document, which is out of the scope of this paper.

Basic Idea

Now, we describe the basic idea of modeling the cognitive interference between users judging two documents d0 and d1, using the standard Markov way described in the previous section.

Assume there are three relevance degrees, R, P, and I, which represent the relevance degrees relevant, partially relevant and irrelevant, respectively, of a document with respect to the given query q. We let the relevance degrees be judgment states of users, which correspond to the decision states in the previous section. Let a 1 × 3 row vector w = [p_R, p_P, p_I] be the judgment states distribution (states distribution for short) to represent the judgment result. Specifically, p_R, p_P and p_I represent the percentages of users who judge the document as relevant (R), partially relevant (P), or irrelevant (I). Let w0 be the non-interfered states distribution of document d0, and w0^(α) be the interfered states distribution of d0. The cognitive interference is generated by judging another document d1, and the interference is encoded in a transition matrix

M(α) = [ M_RR  M_RP  M_RI
         M_PR  M_PP  M_PI
         M_IR  M_IP  M_II ]    (13)

where M_{i,j} (i, j ∈ {R, P, I}) represents the probability of transiting from judgment state i to judgment state j. Different interference by document d1 has a different M(α), which will be discussed later.

This M(α), which represents the cognitive interference among judgment states, borrows the same idea from the M in Formula 11. Furthermore, M(α) can bridge the gap between the non-interfered states distribution w0 and its interfered counterpart w0^(α) by w0^(α) = w0 × M(α)5. This is similar to the idea that M(G) and M(B) can connect the difference between P_N and P_I.

5 Here, in order to keep the formulation simpler, we do not include the conditional probabilities.
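The worked example above (Formula 12 with the paper's own numbers) can be checked directly; a minimal sketch:

```python
# The paper's worked example for Formula 12: push the non-interfered
# probabilities [T_AG^N, T_WG^N] = [0.8, 0.2] through the decision-state
# transition matrix M(G) to obtain the interfered [T_AG, T_WG].
t_non_interfered = [0.8, 0.2]
M_G = [[0.4, 0.6],        # row 1: initial 'attack'   -> stay A / switch to W
       [0.7, 0.3]]        # row 2: initial 'withdraw' -> switch to A / stay W

t_interfered = [round(sum(t_non_interfered[j] * M_G[j][i] for j in range(2)), 2)
                for i in range(2)]
print(t_interfered)       # [0.46, 0.54], matching the text

# Sanity check: a right stochastic matrix keeps the vector stochastic.
print(round(sum(t_interfered), 2))   # 1.0
```

Because M(G) is right stochastic, the interfered vector remains a probability distribution; only its mass is redistributed between the decision states.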
In the next section, we will adopt the probabilistic automaton (PA), which is a generalization of Markov models, to model the cognitive interference in the relevance judgment. The reason why we adopt PA is that: firstly, PA offers different input symbols, which can represent different kinds of cognitive interference generated by viewing other documents; secondly, PA can accept a finite string of symbols, making it possible and natural to dynamically model the cognitive interference when users are judging a ranking of documents.

In addition, the cognitive interference modeled by the quantum finite automaton (QFA) will also be presented. Here, we clarify that modeling cognitive interference with a classical Markov model, e.g., PA, may only be able to explain the violation of CLTP described by Formula 1 or other similar phenomena. In the following, we will pay more attention to the PA, since we think it is easier to implement in IR and it is powerful enough for our task. On the other hand, we will further show the potential advantages of QFA, which is more general and powerful.

We propose to use the probabilistic automaton (PA), which is a generalization of the concept of Markov model, to model the cognitive interference. We are more focused on the model rather than on examples or observations, since the difference between non-interfered judgment and interfered judgment is a widely accepted fact in IR, but less attention has been paid to its formal modeling. We first give the formal definition of the probabilistic automaton (PA).

Definition of PA  Before giving Definition 1 of PA, we should mention that a (row) vector is stochastic if all its entries are greater than or equal to 0 and sum to 1. Accordingly, a matrix is stochastic if all its row vectors are stochastic. We let M(m,n) be the set of all m × n stochastic matrices.

Definition 1  A probabilistic automaton (PA) U is a 5-tuple (S, Σ, M, w, F), where S = {s1, ..., sm} is a finite set of states, Σ is an input alphabet, M is a function from Σ into M(m,m), w is an m-dimensional stochastic row vector, and F ⊆ S is a set of final states.

The vector w denotes the initial states distribution, whose k-th component is the initial probability of the state sk (sk ∈ S). For each symbol α ∈ Σ, M(α) is an m × m stochastic matrix. The value M(α)_{k,t} = Pr(sk → st on α) is the probability that the state sk transits to st after reading α. For the PA, after reading α, the new states distribution becomes wM(α). If then another symbol β ∈ Σ is read in, the next states distribution of the PA will be wM(α)M(β). Therefore, we can extend the domain of the function M from Σ to Σ* in a standard way, i.e., M(αβ) = M(α)M(β), where αβ is a string of symbols.

An indicative example of a 4-state PA is given in Figure 1. Suppose the initial distribution w = [0.5, 0.3, 0.2, 0], and

M(α) = [ 0.6  0.35  0.05  0
         0.1  0.9   0     0
         0    0     1     0
         0    0     0     1 ]

After reading α, the states distribution of this PA will be wM(α) = [0.330, 0.445, 0.225, 0]. The values 0.6, 0.35, 0.05 and 0 in the first row of this M(α) are the probabilities that s1 transits to s1, s2, s3 and s4, respectively. In general, the k-th row in M(α) represents the probabilities that the state sk transits to every st (st ∈ S).

Figure 1: An example of a Probabilistic Automaton (PA). In this PA, the states set is S = {s1, s2, s3, s4}, where s4 is the final state. Each PA has an alphabet Σ. For each symbol α ∈ Σ, M(α) is the transition matrix representing the probabilities of the transitions among states. [Figure omitted: only the caption is recoverable from the source.]

PA Modeling for Cognitive Interference

PA Representation for Users' Judgment Results  The states s1, ..., sm and their distribution in PA can represent the users' relevance judgment of a document. Recall the example illustrated in the "Basic Idea" section. Now, we are going to reuse the same example in the PA framework. As shown in Figure 2, the PA states are {s1, s2, s3, s4} = {R, P, I, F}. R, P, and I represent the relevance degrees (also called judgment states) relevant, partially relevant and irrelevant, respectively, of a document with respect to the given query q. The state F denotes that the user quits or finishes the judging process. For the states distribution w, the probability p1, p2, p3 or p4 corresponds to the percentage of users who judge the document as relevant (R), partially relevant (P), irrelevant (I), or finish the judgment (F), respectively.

Figure 2: Illustration of cognitive interference modeling by PA. The states distribution of PA can represent users' relevance judgment. The transition matrices of PA can be used to model the cognitive interference when users are judging documents. [Figure omitted: only the caption is recoverable from the source.]

In general, PA can represent a finite number of relevance degrees. Let the first m − 1 PA states s1, ..., sm−1 represent the relevance degrees, which are monotonically decreasing. That means s1 denotes totally relevant and sm−1 denotes absolutely irrelevant. The final state sm indicates that the user quits or finishes the relevance judgment. Then, we let

wi = [p_{i,1}, ..., p_{i,m}]    (14)

represent the initial relevance judgments of users with respect to di, where p_{i,k} (k = 1, ..., m) denotes the probability (e.g., the percentage of users) of judging di as relevance degree sk. Note that the initial judgment means that the users only judge document di, i.e., without interference.
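As a sketch, the 4-state PA example from Figure 1 can be run directly, including the extension from a single symbol to a string via M(αβ) = M(α)M(β). The matrix M_beta below is a hypothetical second transition matrix, not one given in the paper:

```python
# The 4-state PA example from the text: w = [0.5, 0.3, 0.2, 0] and M(alpha).
w = [0.5, 0.3, 0.2, 0.0]
M_alpha = [[0.6, 0.35, 0.05, 0.0],
           [0.1, 0.9,  0.0,  0.0],
           [0.0, 0.0,  1.0,  0.0],
           [0.0, 0.0,  0.0,  1.0]]

def step(w, M):
    """One PA step: the new states distribution is the row vector w * M."""
    m = len(w)
    return [sum(w[k] * M[k][t] for k in range(m)) for t in range(m)]

w1 = step(w, M_alpha)
print([round(x, 3) for x in w1])      # [0.33, 0.445, 0.225, 0.0], as in the text

# Reading a string of symbols: w * M(alpha) * M(beta), with a
# hypothetical stochastic matrix M(beta) (not from the paper).
M_beta = [[0.5, 0.3, 0.1, 0.1],
          [0.2, 0.6, 0.1, 0.1],
          [0.1, 0.1, 0.7, 0.1],
          [0.0, 0.0, 0.0, 1.0]]
w2 = step(w1, M_beta)
print(round(sum(w2), 6))              # 1.0: the distribution stays stochastic
```

The `step` helper is exactly the wM(α) product from the definition; chaining calls reproduces the extension of M from Σ to Σ*.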
When we investigate the interference 7 on wi (w.r.t. di )
imposed by wj (w.r.t. dj ), the initial states distribution w
of PA is set as wi , and the input symbol is generated by
judging dj using the following formula:
⎧
a,
ψ(dj ) > ψ(di )
⎪
⎪
⎨ b,
ψ(dj ) < ψ(di )
α(dj ) =
(18)
⎪
c,
ψ(dj ) = ψ(di )
⎪
⎩
e,
Judgment f inished
Cognitive Interference Modeling when Judging two Documents Now, we describe how this PA can be used to
model the process that judging a document d0 is interfered
by judging another document d1 . We first map the initial
relevance judgment result to the ranking score, in order to
quantify the topical relevance of documents. Specifically,
this mapping is as follows:
ψ(di ) = wi × Λ
To model the process that having judged d1 d2 . . . dj
interferes with judging di (1 ≤ i ≤ n), we extend
the domain of the function M from Σ to Σ∗ , as described in first paragraph below Definition 1. After judging a ranking of documents d1 d2 . . . dj , a string x =
α(d1 )α(d2 ) . . . α(dj ) is generated. Accordingly, we have
j
M (x) = M (α(d1 )α(d2 ) . . . α(dj )) =
k=1 M (α(dk )).
(x)
Then, the interfered states distribution wi of di imposed
by the judgment of d1 d2 . . . dj can be obtained by
(15)
where ψ(di ) is the initial relevance score of di , and Λ
is
a column vector [m, m − 1, . . . , 1]T . Then, ψ(di ) =
m
k=1 pi,k × (m − k + 1). This is to ensure that the more
users judged higher relevance degrees, the higher the overall
relevance score would be6 .
Recall that PA has several input symbols in the alphabet Σ
and different symbols can represent different kinds of interference. Specifically, we define three input symbols, a, b and
c, which correspond to the following three conditions, i.e.,
1) ψ(d1 ) > ψ(d0 ), 2) ψ(d1 ) < ψ(d0 ) and 3) ψ(d1 ) = ψ(d0 ).
Obviously, these three conditions reflect the relative relevance of two documents d1 and d0 . Let another input symbol
e denote that the user will finish the relevance judgment task.
Formally, the input symbol generated by judging document
d1 is formulated as:
⎧
a,
ψ(d1 ) > ψ(d0 )
⎪
⎪
⎨ b,
ψ(d1 ) < ψ(d0 )
(16)
α(d1 ) =
⎪
c,
ψ(d1 ) = ψ(d0 )
⎪
⎩
e,
Judgment f inished
(x)
wi
= wi × M (x).
(19)
Note that the above formulation considers the presentation order of documents. Different document order generates different string x of symbols, leading to different tran(x)
sition matrix M (x) and wi .
Cognitive Interference Modeling when Judging a Ranking of Documents Generally, our aim is to dynamically
model the cognitive interference of users when they are
judging a list of documents d1 d2 . . . dn . This means for each
j (1 ≤ j ≤ n), PA is expected to model how the judgment
for d1 d2 . . . dj interferes with the judgment for document di
(1 ≤ i ≤ n).
Comparing PA with other Classical Models Probabilistic automaton (PA) is a non-deterministic finite automaton
(NDFA). In this section, we will justify why we choose PA,
instead of deterministic finite automaton (DFA), or other
graphical models (e.g., hidden Markov model (HMM) or
Bayesian network (BN)), to model the cognitive interference.
Firstly, we compare PA with the deterministic finite automaton (DFA). The definition (Tzeng 1992) of DFA is as
follows:
Definition 2 A deterministic finite automaton (DFA) A is a
5-tuple (S, Σ, τ , s1 , F), where S is a finite set of states, Σ is
an input alphabet, τ is a function from S ×Σ into S, s1 is the
initial state, and F ⊆ S is a set of final states.
Recall that in PA, each sate can have several possible next
states after reading a symbol α. From the DFA Definition 2,
however, we can see that after reading a symbol, for every
state in DFA, the next possible state is uniquely determined.
This means that after judging another document, users will
change their judgment from one relevance degree (e.g., absolutely relevant) to another unique relevance degree (e.g.,
absolutely irrelevant). Therefore, DFA cannot model the
probabilistic transition from one relevance degree to many
possible relevance degrees, which is necessary if the cognitive interference modeling consider a large number of users
with different behavioral trends.
Secondly, compared to other graphical models, such
as Hidden Markov Model (HMM) and Bayesian Network
(BN), PA is also more suitable for our task. Both PA and
6
There may be other choice for Λ in Equation 15, which is left
as our future work.
7
Here, we say wj interferes with wi , since we think the cognitive interference is on users relevance judgments.
To model the process in which judging d0 is interfered with by judging d1, the initial states distribution of PA is set to w0 and the input symbol is generated by judging d1. For each symbol, a transition matrix M(α) will be learned from user judgment data and reflects the states transition probabilities. These probabilities actually reflect users' cognitive interference. For example, M(α)k,t denotes the probability that the user will change their relevance judgment from relevance degree qk to degree qt, after reading another document that generates an input symbol α. The whole interference process can be formulated as

w0^(α) = w0 × M(α)    (17)

where w0^(α) is the interfered relevance judgment of d0 and α = α(d1) is generated by judging d1.
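Equation 17 is a single row-vector-times-matrix product. A small Python sketch follows; the initial distribution w0 and the transition matrix M(α) below are illustrative values we made up, not matrices learned from the study's data.

```python
import numpy as np

# Illustrative initial judgment distribution w0 of d0 over 5 relevance
# degrees A..E (values invented for this sketch).
w0 = np.array([0.1, 0.5, 0.2, 0.1, 0.1])

# Hypothetical transition matrix M(alpha) for one input symbol:
# row k holds Pr{degree k -> degree t | alpha}; each row sums to 1.
M_alpha = np.array([
    [0.7, 0.2, 0.1, 0.0, 0.0],
    [0.3, 0.5, 0.1, 0.1, 0.0],
    [0.1, 0.2, 0.5, 0.1, 0.1],
    [0.0, 0.1, 0.2, 0.5, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.7],
])

# Equation 17: the interfered judgment is the row vector times the matrix.
w0_alpha = w0 @ M_alpha
print(w0_alpha)            # interfered distribution, still sums to 1
```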
documents. In this study, five queries were selected. They
are (1) “Albert Einstein”, (2) “Probability”, (3) “Semantic
Web”, (4) “Graphical Model”, and (5) “Quantum Mechanics”. The number of queries is kept small, since users tend to be more careful when the task does not take them too much time. For each query, three Wikipedia pages with different relevance degrees were chosen. Recall that PA can model a finite number of relevance degrees. Here, the relevance degrees range from A to E, where A means “totally relevant” and E means “absolutely irrelevant”; thus S is {A, B, C, D, E}. We fixed one page as d0 (which is most likely to be partially relevant, i.e., B, C, or D), and d1 is randomly selected from the other two pages before the user judges d0.
The users were divided into two distinct groups. For each query, we let users in group 1 judge d0 only (to obtain w0), and users in group 2 judge d1 first (to obtain w1) and then d0 (to obtain w0^(α)). We split the users into two groups to avoid any user reading d0 twice. Users were required to mark the relevance degree from A to E.
The study was carried out through Amazon Mechanical Turk, an Amazon Web service that enables online user studies to be “crowdsourced” to a large number of users quickly and economically. To gain better control over the user experiments, it is important to detect suspicious responses (Kittur, Chi, and Suh 2008). Therefore, in our experiment, we removed the data of users who were considered unreliable, e.g., those who did not complete most of the queries or did not view any document for more than a certain time interval. Eventually, for each group, the data of 30 Mechanical Turk users were collected.
At first, we would like to test whether there exists cognitive interference in group 2. We denote the initial judgment result (i.e., without interference) of d0 in group 1 as w0, and the judgment result of d0 in group 2 as w̃0 to indicate the potential interference. We then use the Euclidean distance d(w0, w̃0) to measure the difference between w0 and w̃0. The difference for each query is summarized in Table 1. We can observe that the difference is obvious for every query, which indicates the existence of the cognitive interference. More specifically, we found that users tended to lower their relevance degrees of the current document after they viewed a more relevant document, and vice versa. Generally, the initial relevance judgment w0 on a document d0 does not hold anymore, once the interference from judging other documents takes effect, leading to the interfered judgment w̃0.
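As a sketch of this measurement, the snippet below builds judgment distributions from two hypothetical groups of A–E labels (the labels are invented for illustration, not the study's responses) and computes the Euclidean distance d(w0, w̃0).

```python
import numpy as np

def judgment_distribution(labels, degrees="ABCDE"):
    """Turn a list of users' A..E judgments into a probability vector."""
    counts = np.array([labels.count(d) for d in degrees], dtype=float)
    return counts / counts.sum()

# Hypothetical judgments: group 1 saw d0 alone; group 2 judged d0
# after d1 (labels invented for this sketch).
w0       = judgment_distribution(list("BBBCBCCBDB"))
w0_tilde = judgment_distribution(list("CCDCBDCCDC"))

# Euclidean distance d(w0, w~0), the paper's measure of interference.
dist = np.linalg.norm(w0 - w0_tilde)
print(dist)
```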
This has practical implications in scenarios such as: after users have judged some documents, the retrieval system is expected to be able to dynamically re-rank other documents according to the interfered relevance. Hence, it is necessary to derive a more reasonable estimation, rather than w0, of the real interfered relevance result w̃0.
It is natural to use w0^(α) from Equation 17 to estimate w̃0. Then, we need to test whether d(w0^(α), w̃0) is smaller than d(w0, w̃0), where w0 is used as a baseline for w̃0. If this is true, we can conclude that w0^(α), computed from the proposed PA approach, is a better estimation of w̃0.
HMM are probabilistic finite-state machines (Vidal et al. 2005), and links between them have been found (Dupont, Denis, and Esposito 2005; Vidal et al. 2005). However, it is not straightforward to define the hidden states in our problem. A classical Bayesian network (BN) cannot model the sequential task. A dynamic Bayesian network (DBN) (Ghahramani 1998) can model sequences of variables, but it could be too complex to be directly applied to our problem. Nonetheless, we would like to investigate how to use DBN for our problem in the future.
QFA Modeling for Cognitive Interference
Quantum finite automata (QFA) are a quantum analog of
probabilistic automata. In this section, we describe the main
idea of QFA modeling of the cognitive interference, by providing the quantum version of the components in the PA
modeling.
Using QFA, the states are denoted as |s1⟩, |s2⟩, . . . , |sm⟩, where |sk⟩ (1 ≤ k ≤ m) denotes the k-th judgment state. The states distribution for the document di is denoted as |wi⟩ = [qi,1, qi,2, . . . , qi,m], where qi,k denotes the amplitude of the k-th judgment state, and |qi,k|^2 = pi,k in Formula 14. We adopt the same mapping function ψ(di) in Formula 15 and the interference symbol generation function in Formula 18 for QFA modeling. For each input symbol α, the corresponding quantum transition matrix is a unitary matrix U(α). We denote the interfered states distribution as |wi^α⟩, where

|wi^α⟩ = |wi⟩ U(α)    (20)

After judging a ranking of documents that generates a sequence of interference input symbols, denoted as x = α(d1)α(d2) . . . α(dj), the interfered states distribution for document di is:

|wi^x⟩ = |wi⟩ U(x)    (21)

where U(x) = ∏_{k=1}^{j} U(α(dk)). Note that the above quantum Markov process is different from that proposed by (Busemeyer, Wang, and Lambert-Mogiliansky 2009), in the sense that the former is the transition of judgment states, while the latter is the transition from category to decision states.
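Equations 20 and 21 can be illustrated numerically. The sketch below uses an invented 3-state amplitude vector and a simple rotation as the unitary U(α) (both ours, for illustration only); it shows that, unlike in the PA case, probabilities come from squared amplitudes and unitarity preserves their total.

```python
import numpy as np

# Amplitude vector |wi> over 3 judgment states (illustrative);
# squared moduli give the probabilities p_{i,k} = |q_{i,k}|^2.
w = np.array([0.8, 0.6, 0.0], dtype=complex)

# A hypothetical unitary transition U(alpha): a rotation mixing
# the first two judgment states.
theta = np.pi / 6
U = np.array([
    [np.cos(theta), -np.sin(theta), 0],
    [np.sin(theta),  np.cos(theta), 0],
    [0,              0,             1],
], dtype=complex)
assert np.allclose(U @ U.conj().T, np.eye(3))  # check unitarity

# Equation 20: |wi^alpha> = |wi> U(alpha)  (row vector times matrix).
w_alpha = w @ U

# Equation 21: a symbol sequence x = alpha(d1)...alpha(dj) composes
# unitaries; here two identical symbols, for illustration.
w_x = w @ U @ U

print(np.abs(w_alpha) ** 2)        # probabilities after one symbol
print((np.abs(w_x) ** 2).sum())    # total probability is preserved
```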
The above quantum formulation has its advantage in that the quantum probabilities naturally do not need to obey classical probability laws, e.g., CLTP. On the other hand, PA may only be able to explain the violation of CLTP in the sense of the difference between the non-interfered (or initial) and interfered states distributions. Moreover, we consider QFA stronger than PA in that QFA can naturally offer (potentially) infinite memory and the corresponding processing ability. Note that, aided by (potentially) infinite external storage and dominated by inductive and abstract ability, the human cognitive process can handle, at least to some extent, (potentially) infinite information. Hence, QFA would potentially make a more general and powerful approach to modeling the cognitive process.
Case Study
We designed and carried out a pilot user study to investigate the cognitive interference when users are judging two
Table 1: The Difference between Initial and Interfered Judgment of d0

Query       | #1     | #2     | #3     | #4     | #5
d(w0, w̃0)  | 0.3713 | 0.2062 | 0.2962 | 0.2708 | 0.1499

Table 2: The Difference between (Estimated) Interfered and (Real) Interfered Judgment of d0

Query           | #1     | #2     | #3     | #4     | #5
d(w0^(α), w̃0)  | 0.2323 | 0.1404 | 0.2306 | 0.1833 | 0.1266

Acknowledgments
We would like to thank Leszek Kaliciak and anonymous reviewers for their constructive comments. This work is supported in part by the UK’s EPSRC grant (No.: EP/F014708/2).

References
In order to do this test, we assumed that 75% of the user data were available and used them for training the transition matrices M(α). The other 25% of the user data were used to test whether w0^(α) is a better estimation of w̃0. 4-fold cross validation was carried out. For training the transition matrix M(α), we simply calculated the probabilities Pr{sk −α→ st} by simulating states transitions from the user data between the two groups. Then, we used the trained M(α) to compute w0^(α) = w0 M(α). The difference between w0^(α) and w̃0 for each query is given in Table 2 (see footnote 8). It can be observed that d(w0^(α), w̃0) is much smaller than d(w0, w̃0) for almost every query. This implies the potential usefulness of our model for predicting the interfered judgment of users. To the best of our knowledge, there is yet no other result to compare with; we report our first results on real user data.
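A minimal sketch of this training step is given below. The (before, after) judgment pairs are hypothetical stand-ins for the simulated transitions between the two groups, and the helper name `train_transition_matrix` is ours; the sketch just counts transitions and row-normalizes the counts into M(α).

```python
import numpy as np

degrees = "ABCDE"
idx = {d: i for i, d in enumerate(degrees)}

def train_transition_matrix(pairs, n=5, smoothing=1e-6):
    """Estimate M(alpha) from simulated transitions s_k -> s_t.

    `pairs` are hypothetical (before, after) judgment pairs observed
    for one input symbol alpha; a tiny smoothing term keeps rows
    well-defined for unseen states.  Rows are normalized to
    probability distributions.
    """
    counts = np.full((n, n), smoothing)
    for before, after in pairs:
        counts[idx[before], idx[after]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Illustrative training pairs (not the study's actual data).
pairs = [("B", "C"), ("B", "C"), ("B", "B"),
         ("C", "D"), ("C", "C"), ("A", "B")]
M_alpha = train_transition_matrix(pairs)

w0 = np.array([0.0, 0.6, 0.3, 0.1, 0.0])
w0_alpha = w0 @ M_alpha       # estimated interfered judgment (Eq. 17)
print(M_alpha[idx["B"]])      # learned transition row for degree B
print(w0_alpha.sum())         # still a probability distribution
```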
Conclusions and Future Work
In this paper, we have proposed an alternative Markov approach to explain the quantum-like cognitive interference
(in the form of violation of the classical law of total probability) in human decision making which underpins users’
relevance judgment process in information retrieval. Then,
we proposed to use probabilistic automata (PA) and quantum finite automata (QFA) to dynamically model the cognitive interference of users when they are judging a list of documents. A quantitative case study on real users' relevance judgment data, collected through a task-based user study, demonstrated the suitability and feasibility of the PA modeling for cognitive interference.
In the future, we will investigate in-depth the difference
between PA and QFA for cognitive interference modeling.
We also need to consider the exact quantity of relevance
score differences, and the statistical dependency among documents in the modeling. Moreover, we will refine our experimental set-up according to our general aim, and recruit
more users in the relevance judgment experiment.
Additional Authors
Massimo Melucci (University of Padua, email: melo@dei.unipd.it), and John McCall (The Robert Gordon University, email: jm@comp.rgu.ac.uk)
8 Note that in a real application, it is not necessary to have one transition matrix for only one query. Instead, it is reasonable to learn a transition matrix for a number of queries.
Bruza, P., and Cole, R. J. 2005. Quantum logic of semantic space: An exploratory investigation of context effects in
practical reasoning. We will Show Them! Essays in Honour
of Dov Gabbay 1:339–362, College Publications.
Busemeyer, J. R., and Wang, Z. 2007. Quantum information
processing explanation for interactions between inferences
and decisions. In Proceedings of the AAAI Symposium on
Quantum Interaction. AAAI Press.
Busemeyer, J. R.; Wang, Z.; and Lambert-Mogiliansky, A.
2009. Empirical comparison of markov and quantum models
of decision making. Journal of Mathematical Psychology
53:423–433.
Dupont, P.; Denis, F.; and Esposito, Y. 2005. Links between
probabilistic automata and hidden markov models: probability distributions, learning models and induction algorithms.
Pattern Recognition 38(9):1349–1371.
Eisenberg, M., and Barry, C. 1988. Order effects: A study
of the possible influence of presentation order on user judgments of document relevance. Journal of the American Society for Information Science 39(5):293 – 300.
Ghahramani, Z. 1998. Learning dynamic bayesian networks. In Adaptive Processing of Sequences and Data Structures, 168–197. Springer-Verlag.
Hou, Y., and Song, D. 2009. Characterizing pure high-order
entanglements in lexical semantic spaces via information geometry. In Proceedings of the Third Quantum Interaction
Symposium, 237–250.
Khrennikov, A. 2004. On quantum-like probabilistic structure of mental information. Open Systems and Information
Dynamics 11(3):267–275.
Kittur, A.; Chi, E. H.; and Suh, B. 2008. Crowdsourcing for
usability: Using micro-task markets for rapid, remote, and
low-cost user measurements. In ACM CHI 2008, 453–456.
Kondacs, A., and Watrous, J. 1997. On the power of quantum finite state automata. In Proceedings of the 38th IEEE
Conference on Foundations of Computer Science, 66–75.
Melucci, M. 2008. A basis for information retrieval in context. ACM Trans. Inf. Syst. 26(3):1–41.
Melucci, M. 2010. An investigation of quantum interference in information retrieval. In IRF 10, in press.
Mizzaro, S. 1996. Relevance: The whole (hi)story. Journal of the American Society for Information Science 48:810–
832.
Piwowarski, B., and Lalmas, M. 2009a. Structured information retrieval and quantum theory. In Proceedings of the
Third Quantum Interaction Symposium, 289–298.
Piwowarski, B., and Lalmas, M. 2009b. A quantum-based model for interactive information retrieval. In Proceedings of the 2nd International Conference on the Theory of Information Retrieval, volume 5766. Springer.
Tzeng, W.-G. 1992. Learning probabilistic automata and
markov chains via queries. Machine Learning 8:151–166.
van Rijsbergen, C. J. 2004. The Geometry of Information Retrieval. New York, NY, USA: Cambridge University
Press.
Vidal, E.; Thollard, F.; de la Higuera, C.; Casacuberta, F.; and Carrasco, R. C. 2005. Probabilistic finite-state machines - part I. IEEE Trans. Pattern Anal. Mach. Intell. 27(7):1013–1025.
Zuccon, G., and Azzopardi, L. 2010. Using the quantum
probability ranking principle to rank interdependent documents. In Proceedings of the 32nd European Conference on
Information Retrieval, 357–369.
Zuccon, G.; Azzopardi, L.; and van Rijsbergen, C. J. 2009. The quantum probability ranking principle for information retrieval. In Proceedings of the 2nd International Conference on the Theory of Information Retrieval, 232–240.