Using Cross-evaluation to evaluate interactive QA systems

Ying Sun
Associate Professor
Department of Library and Information Studies
Cross Evaluation (X-Eval)
• A systematic method focusing on assessing the
differential contribution of systems to the user’s
final results.
• interactive information systems
• Two entities: system and individual
• system effect on users’ end-products
Cross Evaluation - Process
Cross Evaluation - Analysis
General linear model
The measurement score y for task t, done using
system s, by user u, as assessed by judge j, is
given in first approximation by the linear
expression:
B: self-judgment bias variable; b = 0 when u ≠ j, b = 1 when u = j
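A minimal sketch of what such a linear expression could look like, assuming purely additive main effects for task, system, user, and judge plus a self-judgment term. Only y, t, s, u, j, and b come from the slide; the symbols \mu, \tau, \sigma, \upsilon, \gamma, \beta, and \varepsilon are assumptions introduced here for illustration and may differ from the original notation.

```latex
% Assumed additive form; symbol names other than y, t, s, u, j, b are hypothetical.
\[
  y_{tsuj} \;=\; \mu + \tau_t + \sigma_s + \upsilon_u + \gamma_j
             + \beta\, b_{uj} + \varepsilon_{tsuj},
  \qquad
  b_{uj} =
  \begin{cases}
    1 & \text{if } u = j,\\
    0 & \text{if } u \neq j.
  \end{cases}
\]
```

Under this reading, \sigma_s is the system effect of interest and \beta is the size of the self-judgment bias.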
Experimental Design
Cross Evaluation Criteria
Seven characteristics
• Covers the important ground
• Avoids the irrelevant materials
• Avoids redundant information
• Includes selective information
• Is well organized
• Reads clearly and easily
• Overall rating
6/30/2016
Ying Sun
6
Possible Effects
4 systems: S1, S2, S3 and S0
7* analysts (as authors): 1 – 7
8 scenarios: A – H
4 observers: I – IV
7* analysts (as judges): 1 – 7
Self judgment
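The effects listed above can be illustrated with a toy simulation. This is a minimal sketch, assuming an additive model (grand mean plus task, system, user, and judge effects plus a self-judgment term) and a fully crossed design, which is simpler than the design of the actual study. All effect sizes, the reduced numbers of tasks and analysts, and the names `score`, `self_judgment`, and `SELF_BIAS` are hypothetical and chosen only for illustration.

```python
from itertools import product
from statistics import mean

# Hypothetical effect sizes (not from the study); each set sums to zero.
SYSTEM = {"S0": -0.2, "S1": -0.1, "S2": 0.0, "S3": 0.3}
TASK = {"A": 0.5, "B": -0.5}
USER = {1: 0.4, 2: -0.4, 3: 0.0}    # analysts acting as authors
JUDGE = {1: 0.1, 2: -0.1, 3: 0.0}   # the same analysts acting as judges
GRAND_MEAN = 3.0
SELF_BIAS = 0.6


def self_judgment(user, judge):
    """Indicator b: 1 when a judge assesses their own report, else 0."""
    return 1 if user == judge else 0


def score(task, system, user, judge):
    """Additive model: grand mean, main effects, and the bias term."""
    return (GRAND_MEAN + TASK[task] + SYSTEM[system] + USER[user]
            + JUDGE[judge] + SELF_BIAS * self_judgment(user, judge))


# Fully crossed toy design: every task x system x user x judge combination.
rows = [(t, s, u, j, score(t, s, u, j))
        for t, s, u, j in product(TASK, SYSTEM, USER, JUDGE)]

grand = mean(y for *_, y in rows)

# In a balanced design, a system's mean deviation from the grand mean
# recovers its effect, because the other terms average out identically.
system_effect = {
    s: mean(y for _, s2, _, _, y in rows if s2 == s) - grand
    for s in SYSTEM
}

# Self-judgment bias: mean self-assessed score minus mean score from others.
self_scores = [y for _, _, u, j, y in rows if self_judgment(u, j)]
other_scores = [y for _, _, u, j, y in rows if not self_judgment(u, j)]
bias_estimate = mean(self_scores) - mean(other_scores)
```

With noise-free scores and a balanced design, the marginal means recover the assumed effects. Real data would be noisy and unbalanced, so a least-squares fit of the full model would be needed instead of simple averaging.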
Analytical Model - DVs
Leading Factor of 7 characteristics
• If the instrument has a balanced set of
questions that accurately reflect the decision
makers’ concerns, then factor analysis is a
good way to summarize them. The leading factor
accounts for 79% of the variance.
7 characteristics individually
Results - System effect
Post-hoc Scheffe analysis
Pairwise differences between systems (column vs. row):
        s3       s1       s2
s1      .30
s2      .37*     .06
s0      .44**    .14      .07
Results – self judgment bias
Conclusion
The X-Eval method
• can effectively reveal differences as small as
those attributable to systems, even with a very
small number of participants and despite the
very large effects of tasks and users.
• does not rely on pre-determined relevance
judgments
• is a successful model for the “3-realities”
paradigm: real users, real problems and real
systems