Relational Evaluation Techniques - Graph-RAT

Relational Evaluation Techniques
Daniel McEnnis

Outline
  Definition
  Component Overview
  Existing Approaches
  Descriptions of the Components
  Applications and Examples

Relational Evaluation Techniques Definition
  Experimental setup for evaluating the performance of algorithms that use data that span more than one table or instance vector
  Can use either relational algebra or hypergraph-based descriptions

Components
  Data Acquisition
  Ground Truth Acquisition
  Cross-Validation Technique
  Query Type
  Scoring Metric
  Significance Test

Existing Approaches
  Machine Learning
  Relational Machine Learning
  TREC
  Collaborative Filtering
  ISMIR
  Social Network Analysis

Machine Learning
  Predetermined flat data, no sampling
  Predetermined ground truth
  Typically simple queries
  Sophisticated cross-validation
  Basic set-based metrics
  No significance tests

Relational Machine Learning
  Predetermined relational data
  Predetermined ground truth
  Predefined simple query
  Sophisticated cross-validation
  Basic set-based metrics
  No significance tests

TREC
  Predetermined flat data
  Sophisticated ground truth sampling
  Sophisticated queries
  Machine-learning cross-validation
  Ranked set-of-sets scoring
  Simple significance tests

Collaborative Filtering
  Predetermined flat/relational data
  Predetermined ground truth
  Simple, predefined query
  No cross-validation
  Sophisticated scoring metrics
  No significance tests

ISMIR
  Sampled flat data
  Predetermined ground truth
  Sophisticated queries
  Machine-learning cross-validation
  Simple set-based scoring metrics
  Sophisticated significance tests

Social Network Analysis
  Sophisticated data sampling
  Sophisticated statistical techniques

Sequences of Choices
  Plug 'n' play an experiment
  Different aspects are evaluated
  Some algorithms simply don't work
  Extensive algorithm rewrites sometimes needed

Data Acquisition
  Data structure
  Where is it?
  What sampling technique to use?
    Random Access
    Snowball
    Hypergraph Snowball
  How much data is needed?

Ground Truth Acquisition
  What is being tested?
  TREC extended ground truth sampling
  Structure of the output

Cross-Validation
  Actor-Based
  Link-Based
  Graph-Based
  No Cross-Validation

Graph Notation
  Actor definition
  Link definition
  Graph definition
  Database table / instance vector equivalence
  Foreign key / link equivalence

Actor Cross-Validation
  Traditional machine learning approach
  Divisions by database table
  Folds usually assigned at random
  Works well on flat data
  Trouble with relational data

Link Cross-Validation
  Rare machine learning approach
  Divisions by foreign key reference
  Less statistical independence than actor cross-validation
  Works for collaborative filtering
  Folds usually assigned at random

Graph Cross-Validation
  Relational Machine Learning
  Divisions by predetermined discrete graphs
  Statistical independence
  Non-learning-based approaches
  Clustering-based fold generation

No Cross-Validation
  Standard overfitting problems
  Useful after implied cross-validation

Query Type
  Information Need definition
  Actor-based query
  Set- or List-based query
  Conditional queries

Scoring Metrics
  Comparisons against ground truth
  Set-based metrics
  Rank-based metrics
  List-based metrics

Set-Based Metrics
  Recall and Precision
  F-Measure
  Mean Average Precision

Ranked List Metrics
  Pearson Correlation
  Spearman's Correlation
  Mean Absolute Error
  Linear Algebra Distance Metrics
  Serendipity

Ordered List Metrics
  Half Life
  Kendall Tau
  NDPM
  Sequence Alignment Algorithms
  Hamming Distance

Significance Tests
  Pairwise Student's t-test
  ANOVA
  ANOVA/Tukey-Kramer statistical test

Evaluation Questions
  Does the data contain time (global ordered sequence)?
  Actor-, Link-, Graph-, or Set-based queries
  List, Set, or Set-of-Lists output
  Contextual question or absolute
  Statistical purity versus maximum information

Music Recommendation
  Example - Personalized Dynamic Tag Radio
    LastFM profile data
    LastFM tag data
    Semantic Web data
    Next-week-data ground truth
    Conditional query
    Graph cross-validation
    Kendall Tau scoring metric
    ANOVA/Tukey-Kramer statistical analysis

Conclusions
  No one-size-fits-all
  Data and ground truth set the framework
  Question determines the final structure
  Each discipline has a piece of the answer
  Graph-RAT 0.5

Future Work
  Finish exploring Social Network Analysis significance tests
  Fully explore set-of-sets evaluation metrics
  Debugging of Graph-RAT cross-validation schedulers
  Ease-of-use improvements to Graph-RAT
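
Kendall Tau appears among the ordered list metrics and is the scoring metric chosen for the tag radio example. As a rough illustration only, the sketch below computes the tau-a variant for two rankings of the same items with no ties. It is not taken from Graph-RAT; the function name `kendall_tau` and the tag names are hypothetical, and a real evaluation would also need to handle ties and lists of differing membership.

```python
from itertools import combinations


def kendall_tau(predicted, ground_truth):
    """Kendall tau-a between two rankings of the same items (no ties).

    Hypothetical helper for illustration; not part of Graph-RAT.
    Returns +1.0 for identical orderings and -1.0 for reversed ones.
    """
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two items to compare orderings")
    if len(ground_truth) != n or len(set(predicted)) != n \
            or set(predicted) != set(ground_truth):
        raise ValueError("both lists must rank the same distinct items")

    # Position of every item in the ground-truth ordering.
    truth_pos = {item: i for i, item in enumerate(ground_truth)}

    concordant = discordant = 0
    # combinations() yields pairs (a, b) with a ranked before b in `predicted`.
    for a, b in combinations(predicted, 2):
        if truth_pos[a] < truth_pos[b]:
            concordant += 1   # ground truth agrees on the order of this pair
        else:
            discordant += 1   # ground truth orders this pair the other way

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Assumed example data: tags ranked for one listener.
next_week_truth = ["rock", "jazz", "folk", "metal"]
recommended = ["rock", "folk", "jazz", "metal"]
score = kendall_tau(recommended, next_week_truth)
```

Here one of the six item pairs (folk, jazz) is out of order, so the score falls below 1 but stays positive. Per-fold scores of this kind are what a significance test such as ANOVA/Tukey-Kramer would then compare across algorithms.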
