Relational Evaluation Techniques - Graph-RAT

Relational Evaluation Techniques
Daniel McEnnis
Outline
Definition
 Component Overview
 Existing Approaches
 Descriptions of the Components
 Applications and Examples

1/29
Relational Evaluation Techniques
Definition
Experimental setup for evaluating the performance of algorithms that use data spanning more than one table or instance vector
 Can use either relational-algebra or hypergraph-based descriptions

2/29
Components
Data Acquisition
 Ground Truth Acquisition
 Cross-Validation Technique
 Query Type
 Scoring Metric
 Significance Test

3/29
Existing Approaches
Machine Learning
 Relational Machine Learning
 TREC
 Collaborative Filtering
 ISMIR
 Social Network Analysis

4/29
Machine Learning
Predetermined flat data, no sampling
 Predetermined ground truth
 Typically simple queries
 Sophisticated cross-validation
 Basic set-based metrics
 No significance tests

5/29
Relational Machine Learning
Predetermined relational data
 Predetermined ground truth
 Predefined simple query
 Sophisticated cross-validation
 Basic set-based metrics
 No significance tests

6/29
TREC
Predetermined flat data
 Sophisticated ground truth sampling
 Sophisticated queries
 Machine-learning cross-validation
 Ranked set-of-sets scoring
 Simple significance tests

7/29
Collaborative Filtering
Predetermined flat/relational data
 Predetermined ground truth
 Simple, predefined query
 No cross-validation
 Sophisticated scoring metrics
 No significance tests

8/29
ISMIR
Sampled flat data
 Predetermined ground truth
 Sophisticated queries
 Machine-learning cross-validation
 Simple set-based scoring metrics
 Sophisticated significance tests

9/29
Social Network Analysis
Sophisticated data sampling
 Sophisticated statistical techniques

10/29
Sequences of Choices
Plug-and-play assembly of an experiment
 Different aspects are evaluated
 Some algorithms simply don’t work
 Extensive algorithm rewrites are sometimes needed

11/29
Data Acquisition
Data structure
 Where is it?
 What sampling technique to use?
  Random Access
  Snowball
  Hypergraph Snowball
 How much data is needed?
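
Snowball sampling can be sketched as a breadth-first expansion from a set of seed actors, stopping after a fixed number of waves. This is a minimal illustration, not Graph-RAT's implementation; the function name `snowball_sample`, the adjacency-dictionary input, and the toy data are all assumptions made for the example.

```python
from collections import deque


def snowball_sample(neighbors, seeds, max_depth):
    """Return the actors reachable from `seeds` within `max_depth` waves.

    `neighbors` is a hypothetical adjacency dict: actor -> iterable of actors.
    """
    sampled = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        actor, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of waves
        for other in neighbors.get(actor, ()):
            if other not in sampled:
                sampled.add(other)
                frontier.append((other, depth + 1))
    return sampled


# Toy friendship graph (made-up data for illustration only).
toy_graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
one_wave = snowball_sample(toy_graph, ["ann"], max_depth=1)
two_waves = snowball_sample(toy_graph, ["ann"], max_depth=2)
```

The `max_depth` parameter is one simple way to bound how much data is collected; a cap on the number of sampled actors would work as well.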
12/29
Ground Truth Acquisition
What is being tested?
 TREC extended ground truth sampling
 Structure of the output

13/29
Cross-Validation
Actor Based
 Link Based
 Graph Based
 No Cross-Validation

14/29
Graph Notation
Actor definition
 Link definition
 Graph definition
 Database table / instance vector equivalence
 Foreign key / link equivalence
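
The two equivalences above can be illustrated by turning table rows into actors and foreign-key references into links. The sketch below is hypothetical: the `Actor` and `Link` classes, the `rows_to_graph` helper, and the column names are assumptions for illustration and are not Graph-RAT's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    mode: str   # the table the row came from, e.g. "play"
    ident: str  # the row's primary key


@dataclass(frozen=True)
class Link:
    relation: str  # the foreign-key column that induced the link
    source: Actor
    target: Actor


def rows_to_graph(rows, mode, key, foreign_keys):
    """Map table rows to actors and their foreign keys to links.

    `foreign_keys` maps a column name to the mode (table) it references.
    """
    actors, links = set(), set()
    for row in rows:
        actor = Actor(mode, row[key])
        actors.add(actor)
        for column, target_mode in foreign_keys.items():
            if row.get(column) is not None:
                target = Actor(target_mode, row[column])
                actors.add(target)
                links.add(Link(column, actor, target))
    return actors, links


# Made-up rows from a hypothetical "play" table.
plays = [
    {"id": "p1", "user_id": "u1", "artist_id": "a1"},
    {"id": "p2", "user_id": "u1", "artist_id": "a2"},
]
actors, links = rows_to_graph(
    plays, mode="play", key="id",
    foreign_keys={"user_id": "user", "artist_id": "artist"},
)
```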

15/29
Actor Cross-Validation
Traditional Machine Learning approach
 Divisions by database table
 Folds usually assigned at random
 Works well on flat data
 Trouble with relational data
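
A minimal sketch of actor-based folds, assuming actors are plain identifiers and links are pairs of actors (both assumptions for illustration). The helper `crossing_links` shows where the trouble with relational data comes from: any link whose endpoints land in different folds carries information between training and test data.

```python
import random


def actor_folds(actors, k, seed=0):
    """Randomly assign each actor to one of k folds of near-equal size."""
    shuffled = sorted(actors)
    random.Random(seed).shuffle(shuffled)
    return {actor: index % k for index, actor in enumerate(shuffled)}


def crossing_links(links, fold_of):
    """Return the links whose endpoints fall in different folds."""
    return [(a, b) for a, b in links if fold_of[a] != fold_of[b]]


# Made-up chain of six actors.
actors = ["u1", "u2", "u3", "u4", "u5", "u6"]
links = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "u5"), ("u5", "u6")]
fold_of = actor_folds(actors, k=3)
leaky = crossing_links(links, fold_of)
```

On flat data there are no links, so `leaky` is always empty and the folds are independent.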

16/29
Link Cross-Validation
Rare machine learning approach
 Divisions by foreign key reference
 Less statistical independence than actor-based division
 Works for collaborative filtering
 Folds usually assigned at random
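
Link-based division can be sketched by partitioning the links themselves, as in collaborative filtering where some user-item ratings are held out. The function name `link_folds` and the toy ratings are assumptions for illustration.

```python
import random


def link_folds(links, k, seed=0):
    """Yield (train_links, test_links), holding out one random fold at a time."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            link
            for index, fold in enumerate(folds)
            if index != held_out
            for link in fold
        ]
        yield train, test


# Made-up (user, artist) links.
ratings = [
    ("u1", "a1"), ("u1", "a2"), ("u2", "a1"),
    ("u2", "a3"), ("u3", "a2"), ("u3", "a3"),
]
splits = list(link_folds(ratings, k=3))
```

Every actor can appear in both the training and test links, which is why this division offers less statistical independence than dividing by actor.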

17/29
Graph Cross-Validation
Relational Machine Learning
 Divisions by predetermined discrete graphs
 Statistical independence
 Non-learning based approaches
 Clustering based fold generation
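
One way to sketch graph-based folds is to treat connected components as the discrete graphs and assign whole components to folds, so that no link crosses a fold boundary. Using connected components as the clustering step is an assumption made for this example; the slides do not say which clustering is used.

```python
def connected_components(actors, links):
    """Group actors into discrete graphs (connected components)."""
    neighbors = {actor: set() for actor in actors}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in actors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            actor = stack.pop()
            if actor in component:
                continue
            component.add(actor)
            stack.extend(neighbors[actor] - component)
        seen |= component
        components.append(component)
    return components


def graph_folds(components, k):
    """Assign whole components to folds, largest first, smallest fold first."""
    folds = [set() for _ in range(k)]
    for component in sorted(components, key=len, reverse=True):
        min(folds, key=len).update(component)
    return folds


# Made-up data: three disconnected subgraphs.
actors = ["a", "b", "c", "d", "e", "f", "g"]
links = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]
components = connected_components(actors, links)
folds = graph_folds(components, k=2)
```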

18/29
No Cross-Validation
Standard overfitting problems
 Useful after implied cross-validation

19/29
Query Type
Information Need definition
 Actor-based query
 Set- or list-based query
 Conditional queries

20/29
Scoring Metrics
Comparisons against ground truth
 Set-based metrics
 Ranked list metrics
 Ordered list metrics

21/29
Set-Based Metrics
Recall and Precision
 F-Measure
 Mean Average Precision
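
Recall, precision, and the balanced F-measure compare a predicted set against the ground-truth set. A minimal sketch using the standard definitions; the function name and the toy sets are assumptions for illustration.

```python
def precision_recall_f1(predicted, relevant):
    """Return (precision, recall, F1) for a predicted set against ground truth."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up recommendation and ground-truth sets.
p, r, f = precision_recall_f1({"a1", "a2", "a3", "a4"}, {"a2", "a4", "a5"})
```

Here two of the four predictions are correct and two of the three relevant items are found.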

22/29
Ranked List Metrics
Pearson Correlation
 Spearman's Correlation
 Mean Absolute Error
 Linear Algebra Distance Metrics
 Serendipity
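
Three of the metrics above can be sketched with the standard library alone: Pearson correlation on raw scores, Spearman correlation as Pearson on ranks, and mean absolute error. The function names and the toy ratings are assumptions for illustration; `pearson` is undefined when either input is constant.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up ratings for four items.
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.5, 3.5, 3.0, 1.0]
rho = spearman(predicted, actual)
mae = mean_absolute_error(predicted, actual)
```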

23/29
Ordered List Metrics
Half Life
 Kendall Tau
 NDPM
 Sequence Alignment Algorithms
 Hamming Distance
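
Kendall tau and Hamming distance both compare two orderings directly. The sketch below implements the tau-a variant, which assumes both lists contain the same items with no ties and at least two items; the function names and toy lists are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau-a between two orderings of the same items (no ties)."""
    position_b = {item: index for index, item in enumerate(ranking_b)}
    concordant = discordant = 0
    for first, second in combinations(ranking_a, 2):
        # `first` precedes `second` in ranking_a by construction.
        if position_b[first] < position_b[second]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def hamming_distance(list_a, list_b):
    """Number of positions at which two equal-length lists differ."""
    if len(list_a) != len(list_b):
        raise ValueError("lists must have the same length")
    return sum(a != b for a, b in zip(list_a, list_b))


# Made-up ground-truth and predicted orderings.
truth = ["a", "b", "c", "d"]
guess = ["a", "c", "b", "d"]
tau = kendall_tau(truth, guess)
distance = hamming_distance(truth, guess)
```

With one swapped pair out of six, five pairs agree and one disagrees, so tau is (5 - 1) / 6.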

24/29
Significance Tests
Pairwise Student's t-test
 ANOVA
 ANOVA/Tukey-Kramer statistical test
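
For a single pair of algorithms scored on the same cross-validation folds, the t statistic of a paired Student's t-test can be computed as below. This is a sketch with made-up per-fold scores; it returns only the statistic and degrees of freedom, leaving the lookup of the p-value to a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired Student's t-test."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(differences)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical per-fold scores for two algorithms (illustration only).
fold_scores_a = [0.62, 0.58, 0.65, 0.60, 0.63]
fold_scores_b = [0.55, 0.57, 0.59, 0.56, 0.58]
t, df = paired_t_statistic(fold_scores_a, fold_scores_b)
```

With 4 degrees of freedom, the two-sided critical value at the 0.05 level is 2.776, so a larger absolute `t` would be judged significant at that level. When more than two algorithms are compared, repeated pairwise tests inflate the error rate, which is the motivation for ANOVA followed by Tukey-Kramer.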

25/29
Evaluation Questions
Does the data contain time (a global ordered sequence)?
 Actor-, link-, graph-, or set-based queries?
 List, set, or set-of-lists output?
 Contextual or absolute question?
 Statistical purity versus maximum information?

26/29
Music Recommendation
Example - Personalized Dynamic Tag Radio
 LastFM profile data
 LastFM tag data
 Semantic Web data
 Next-week-data ground truth
 Conditional query
 Graph cross-validation
 Kendall Tau scoring metric
 ANOVA/Tukey-Kramer statistical analysis
27/29
Conclusions
No one-size-fits-all
 Data and ground truth set the framework
 Question determines the final structure
 Each discipline has a piece of the answer
 Graph-RAT 0.5

28/29
Future Work
Finish exploring Social Network Analysis significance tests
 Fully explore set-of-sets evaluation metrics
 Debugging of Graph-RAT cross-validation schedulers
 Ease-of-use improvements to Graph-RAT

29/29