Using Cross-Evaluation to Evaluate Interactive QA Systems
Ying Sun
Associate Professor
Department of Library and Information Studies
6/30/2016

Cross Evaluation (X-Eval)
• A systematic method focusing on assessing the differential contribution of systems to the user's final results.
• Interactive information systems
• Two entities: system and individual
• System effect on users' end-products

Cross Evaluation - Process

Cross Evaluation - Analysis
General linear model
• The measurement score y for task t, done using system s, by user u, as assessed by judge j, is given, in first approximation, by a linear expression in the effects of t, s, u and j, plus a self-judgment bias term.
• b: self-judgment bias variable; b = 0 when u ≠ j, b = 1 when u = j

Experimental Design

Cross Evaluation Criteria
Seven characteristics
• Covers the important ground
• Avoids irrelevant material
• Avoids redundant information
• Includes selective information
• Is well organized
• Reads clearly and easily
• Overall rating

Possible Effects
• 4 systems: S1, S2, S3 and S0
• 7* analysts (as authors): 1 – 7
• 8 scenarios: A – H
• 4 observers: I – IV
• 7* analysts (as judges): 1 – 7
• Self judgment

Analytical Model - DVs
• Leading factor of the 7 characteristics
  • If the instrument has a balanced set of questions that accurately reflect the decision makers' concerns, then factor analysis is a good way to summarize them.
  • The leading factor accounts for 79% of the variance.
• 7 characteristics individually

Results - System effect
Post-hoc Scheffé analysis (pairwise differences between systems)

        s3      s1      s2
s1      .30
s2      .37*    .06
s0      .44**   .14     .07

Results – self judgment bias

Conclusion
The X-Eval method
• can reveal differences as small as those attributable to systems, despite the very large effects of tasks and users, even with a very small number of participants.
• does not rely on pre-determined relevance judgments
• is a successful model for the "3-realities" paradigm: real users, real problems and real systems
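
As a rough illustration of the general linear model named on the Cross Evaluation - Analysis slide, the sketch below fits an additive model by least squares. The exact expression is not shown in the slides, so the form used here is an assumption: score = mean + task effect + system effect + user effect + judge effect + bias × b, with b = 1 when the judge is the author (u = j) and 0 otherwise. The factor levels, the effect sizes in `TRUE`, and the helper names `design_row` and `least_squares` are all hypothetical; the scores are synthetic and noise-free, and they are not the study's data.

```python
from itertools import product

# Hypothetical factor levels; the first level of each factor is the reference.
TASKS = ["A", "B"]
SYSTEMS = ["S0", "S1"]
PEOPLE = ["1", "2", "3"]  # the same analysts act as users (authors) and judges
LEVELS = (TASKS, SYSTEMS, PEOPLE, PEOPLE)


def design_row(t, s, u, j):
    """Dummy-code one observation: intercept, factor effects, then b."""
    row = [1.0]
    for value, names in zip((t, s, u, j), LEVELS):
        row += [1.0 if value == name else 0.0 for name in names[1:]]
    row.append(1.0 if u == j else 0.0)  # b = 1 when u = j, else 0
    return row


def least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[c] for r in X) for c in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Made-up effect sizes, used only to generate noise-free synthetic scores.
TRUE = {
    "mean": 3.0,
    "task": {"A": 0.0, "B": 0.8},
    "system": {"S0": 0.0, "S1": 0.4},
    "user": {"1": 0.0, "2": -0.5, "3": 0.3},
    "judge": {"1": 0.0, "2": 0.2, "3": -0.1},
    "bias": 0.6,
}

X, y = [], []
for t, s, u, j in product(TASKS, SYSTEMS, PEOPLE, PEOPLE):
    X.append(design_row(t, s, u, j))
    y.append(TRUE["mean"] + TRUE["task"][t] + TRUE["system"][s]
             + TRUE["user"][u] + TRUE["judge"][j]
             + (TRUE["bias"] if u == j else 0.0))

beta = least_squares(X, y)
names = ["mean", "task B", "system S1", "user 2", "user 3",
         "judge 2", "judge 3", "self-judgment bias"]
for name, value in zip(names, beta):
    print(f"{name:>20}: {value:+.2f}")
```

Because the synthetic scores contain no noise, the fit simply recovers the made-up effects. With real judgments the scores would be noisy and the design would not be fully crossed, so a statistics package that reports standard errors and supports post-hoc comparisons would be the more natural tool.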