Relational Evaluation Techniques
Daniel McEnnis

Outline
- Definition
- Component Overview
- Existing Approaches
- Descriptions of the Components
- Applications and Examples

Relational Evaluation Techniques: Definition
- Experimental setup for evaluating the performance of algorithms that use data that span more than one table or instance vector
- Can use either relational algebra or hypergraph-based descriptions

Components
- Data Acquisition
- Ground Truth Acquisition
- Cross-Validation Technique
- Query Type
- Scoring Metric
- Significance Test

Existing Approaches
- Machine Learning
- Relational Machine Learning
- TREC
- Collaborative Filtering
- ISMIR
- Social Network Analysis

Machine Learning
- Predetermined flat data, no sampling
- Predetermined ground truth
- Typically simple queries
- Sophisticated cross-validation
- Basic set-based metrics
- No significance tests

Relational Machine Learning
- Predetermined relational data
- Predetermined ground truth
- Predefined simple query
- Sophisticated cross-validation
- Basic set-based metrics
- No significance tests

TREC
- Predetermined flat data
- Sophisticated ground truth sampling
- Sophisticated queries
- Machine-learning cross-validation
- Ranked set-of-sets scoring
- Simple significance tests

Collaborative Filtering
- Predetermined flat/relational data
- Predetermined ground truth
- Simple, predefined query
- No cross-validation
- Sophisticated scoring metrics
- No significance tests

ISMIR
- Sampled flat data
- Predetermined ground truth
- Sophisticated queries
- Machine-learning cross-validation
- Simple set-based scoring metrics
- Sophisticated significance tests

Social Network Analysis
- Sophisticated data sampling
- Sophisticated statistical techniques

Sequences of Choices
- Plug 'n' play an experiment
- Different aspects are evaluated
- Some algorithms simply don't work
- Extensive algorithm rewrites sometimes needed

Data Acquisition
- Data structure
- Where is it?
- What sampling technique to use
    - Random Access
    - Snowball
    - Hypergraph Snowball
- How much data is needed?
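
To make the snowball option on the Data Acquisition slide concrete, here is a minimal sketch of breadth-first snowball sampling over an actor/link graph. It is an illustration under stated assumptions, not the Graph-RAT implementation: the function name `snowball_sample`, the dictionary-of-sets graph layout, and the toy actor names are all hypothetical, and the stopping rule (a fixed actor budget) is only one possible answer to "how much data is needed?".

```python
from collections import deque


def snowball_sample(links, seeds, max_actors):
    """Collect up to max_actors actors by following links outward from the seeds.

    links      -- hypothetical layout: dict mapping an actor to the set of actors it links to
    seeds      -- actors to start from (seeds missing from links are skipped)
    max_actors -- sampling budget; the walk stops once this many actors are collected
    """
    sampled = []
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed in links and seed not in seen:
            seen.add(seed)
            queue.append(seed)

    while queue and len(sampled) < max_actors:
        actor = queue.popleft()
        sampled.append(actor)
        # Sorting only makes the toy example deterministic; a real crawl
        # would take neighbours in whatever order the data source returns them.
        for neighbour in sorted(links.get(actor, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sampled


# Toy graph with invented actors; "frank" is isolated and is never reached.
TOY_LINKS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob", "erin"},
    "erin": {"dave"},
    "frank": set(),
}

print(snowball_sample(TOY_LINKS, ["alice"], 4))
```

Unlike random access sampling, the sample is connected to the seed by construction, so isolated actors such as `frank` never appear. That bias is worth keeping in mind when the sampled data later feeds a significance test.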

Ground Truth Acquisition
- What is being tested?
- TREC extended ground truth sampling
- Structure of the output

Cross-Validation
- Actor-based
- Link-based
- Graph-based
- No cross-validation

Graph Notation
- Actor definition
- Link definition
- Graph definition
- Database table / instance vector equivalence
- Foreign key / link equivalence

Actor Cross-Validation
- Traditional machine learning approach
- Divisions by database table
- Folds usually assigned at random
- Works well on flat data
- Trouble with relational data

Link Cross-Validation
- Rare machine learning approach
- Divisions by foreign key reference
- Less statistical independence than actor cross-validation
- Works for collaborative filtering
- Folds usually assigned at random

Graph Cross-Validation
- Relational machine learning
- Divisions by predetermined discrete graphs
- Statistical independence
- Non-learning-based approaches
- Clustering-based fold generation

No Cross-Validation
- Standard overfitting problems
- Useful after implied cross-validation

Query Type
- Information need definition
- Actor-based query
- Set- or list-based query
- Conditional queries

Scoring Metrics
- Comparisons against ground truth
- Set-based metrics
- Rank-based metrics
- List-based metrics

Set-Based Metrics
- Recall and Precision
- F-Measure
- Mean Average Performance

Ranked List Metrics
- Pearson Correlation
- Spearman's Correlation
- Mean Absolute Error
- Linear Algebra Distance Metrics
- Serendipity

Ordered List Metrics
- Half-Life
- Kendall Tau
- NDPM
- Sequence Alignment Algorithms
- Hamming Distance

Significance Tests
- Pairwise Student's t-test
- ANOVA
- ANOVA/Tukey-Kramer statistical test

Evaluation Questions
- Does the data contain time (a global ordered sequence)?
- Actor-, Link-, Graph-, or Set-based queries
- List, Set, or Set-of-Lists output
- Contextual or absolute question
- Statistical purity versus maximum information

Music Recommendation Example - Personalized Dynamic Tag Radio
- LastFM profile data
- LastFM tag data
- Semantic Web data
- Next-week-data ground truth
- Conditional query
- Graph cross-validation
- Kendall Tau scoring metric
- ANOVA/Tukey-Kramer statistical analysis

Conclusions
- No one-size-fits-all
- Data and ground truth set the framework
- Question determines the final structure
- Each discipline has a piece of the answer
- Graph-RAT 0.5

Future Work
- Finish exploring Social Network Analysis significance tests
- Fully explore set-of-sets evaluation metrics
- Debug the Graph-RAT cross-validation schedulers
- Ease-of-use improvements to Graph-RAT
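
As a worked illustration of the Kendall Tau scoring metric named in the music recommendation example, the sketch below compares a predicted ordering against a ground-truth ordering of the same items. It assumes the simplest case (tau-a, both lists contain exactly the same items, no ties); the function name `kendall_tau` and the tag names are hypothetical and are not taken from Graph-RAT or from LastFM data.

```python
from itertools import combinations


def kendall_tau(predicted, truth):
    """Kendall tau-a between two orderings of the same distinct items.

    Returns a value in [-1, 1]: 1 for identical order, -1 for reversed order.
    """
    if len(set(predicted)) != len(predicted) or len(set(truth)) != len(truth):
        raise ValueError("orderings must not contain repeated items")
    if set(predicted) != set(truth):
        raise ValueError("both orderings must contain the same items")
    if len(predicted) < 2:
        raise ValueError("at least two items are needed to compare orderings")

    # Position of each item in the ground-truth ordering.
    truth_rank = {item: position for position, item in enumerate(truth)}

    concordant = 0
    discordant = 0
    # combinations() yields pairs (a, b) where a precedes b in the prediction.
    for a, b in combinations(predicted, 2):
        if truth_rank[a] < truth_rank[b]:
            concordant += 1  # ground truth agrees that a comes before b
        else:
            discordant += 1  # ground truth has the pair the other way round

    total_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / total_pairs


# Invented tags: ground truth stands in for next week's listening order.
truth_order = ["rock", "jazz", "folk", "punk"]
predicted_order = ["rock", "folk", "jazz", "punk"]

# One swapped pair out of six gives (5 - 1) / 6.
print(kendall_tau(predicted_order, truth_order))
```

One score of this kind per fold and per algorithm gives the samples that a significance test such as ANOVA/Tukey-Kramer would then compare.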