Approximate l-fold Cross-Validation with Least Squares SVM and Kernel Ridge Regression
Richard E. Edwards¹, Hao Zhang¹, Lynne E. Parker¹, Joshua R. New²
¹ Distributed Intelligence Lab, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville TN, USA
² Whole Building and Community Integration Group, Oak Ridge National Lab, Oak Ridge TN, USA
December 7, 2013
Funded by the United States Department of Energy
Outline
Introduction
Related Work
Preliminaries
Approach
Experiments
Conclusion
Applying Kernel Methods to Large Datasets
- Direct kernel application scales poorly
  - Requires O(n^2) memory
  - Model solve time increases
  - Model selection time increases
- Scaling improvements
  - Faster model solvers
  - Problem decompositions
  - Low-rank kernel approximations
- Most scaling improvements apply to standard SVMs
Applying Kernel Methods to Large Datasets
- Least Squares Support Vector Machine (LS-SVM)
  - Naive cross-validation model calibration complexity: O(ln^3)
  - Best exact leave-one-out (LOO) cross-validation complexity: O(n^2)
  - Best approximate cross-validation complexity: O(m^2 n)
- We can do better!
  - Approximate cross-validation complexity: O(n log n)
  - Applies to LOO as well
Previous LS-SVM Model Selection
- T. Pahikkala et al. (2006) and Cawley et al. (2004) obtained O(n^2) LOO cross-validation
  - utilizes matrix inverse properties
- An et al. (2007) obtained O(m^2 n) l-fold cross-validation
  - uses a low-rank kernel approximation
  - removes redundancy from the validation process
  - introduces a new cross-validation algorithm
- L. Ding et al. (2011) obtained O(ln log n) l-fold cross-validation
  - O(n^2 log n) LOO cross-validation
Multi-Level Matrices
- Matrices indexed by factors
- Example 3-level matrix with factors 2x2, 4x4, 2x2
  - |M| = (2 × 4 × 2) × (2 × 4 × 2)
  - Level 1:
    M = \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}
  - Level 2:
    A_{00} = \begin{pmatrix} B_{00} & B_{01} & B_{02} & B_{03} \\ B_{10} & B_{11} & B_{12} & B_{13} \\ B_{20} & B_{21} & B_{22} & B_{23} \\ B_{30} & B_{31} & B_{32} & B_{33} \end{pmatrix}
  - Level 3:
    B_{00} = \begin{pmatrix} 6 & 2 \\ 5 & 1 \end{pmatrix}
Circulant Matrices
- A special Toeplitz matrix
- Its inverse can be computed in O(n log n) via the Fast Fourier Transform
- Example:
  \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \\ 3 & 4 & 1 & 2 \\ 2 & 3 & 4 & 1 \end{pmatrix}
- General definition:
  \begin{pmatrix} c_0 & c_1 & \cdots & c_n \\ c_n & c_0 & \cdots & c_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{pmatrix}
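The O(n log n) inverse above follows from the FFT diagonalization of circulant matrices. A minimal NumPy sketch (our own illustration; the function name and test values are not from the paper):

```python
import numpy as np

def circulant_solve(c, b):
    """Solve C x = b for a circulant C with first column c.
    C is diagonalized by the DFT, so its eigenvalues are fft(c)
    and the solve costs O(n log n)."""
    eig = np.fft.fft(c)                      # eigenvalues of C
    return np.fft.ifft(np.fft.fft(b) / eig).real

# First column (1, 4, 3, 2) reproduces the 4x4 example above:
# entry (i, j) of a circulant matrix is c[(i - j) mod n].
c = np.array([1.0, 4.0, 3.0, 2.0])
C = np.array([[c[(i - j) % 4] for j in range(4)] for i in range(4)])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = circulant_solve(c, b)
assert np.allclose(C @ x, b)
```

SciPy exposes the same operation as `scipy.linalg.solve_circulant`.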
P-Level Circulant Matrices
- Combines circulant matrices and multi-level matrices
  - Each level is a circulant matrix
  - All factors are now one-dimensional
- Example 3-level with factors 2, 4, 2:
  - Level 1:
    M = \begin{pmatrix} A_0 & A_1 \\ A_1 & A_0 \end{pmatrix}
  - Level 2:
    A_0 = \begin{pmatrix} B_0 & B_1 & B_2 & B_3 \\ B_3 & B_0 & B_1 & B_2 \\ B_2 & B_3 & B_0 & B_1 \\ B_1 & B_2 & B_3 & B_0 \end{pmatrix}
  - Level 3:
    B_0 = \begin{pmatrix} 5 & 6 \\ 6 & 5 \end{pmatrix}
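To make the nesting concrete, a small NumPy sketch (ours, not code from the paper) that materializes a p-level circulant matrix from its defining tensor, using the level-wise index rule M[i, j] = c[(i_k − j_k) mod n_k]:

```python
import numpy as np

def p_level_circulant(c, factors):
    """Build the dense p-level circulant matrix with defining tensor c
    (shape = factors): entry (i, j) is c at the level-wise difference
    index (i_k - j_k) mod n_k.  For illustration only; real use keeps
    the matrix implicit and works through p-dimensional FFTs."""
    n = int(np.prod(factors))
    c = np.asarray(c).reshape(factors)
    M = np.empty((n, n))
    for i in range(n):
        ii = np.unravel_index(i, factors)
        for j in range(n):
            jj = np.unravel_index(j, factors)
            M[i, j] = c[tuple((a - b) % f for a, b, f in zip(ii, jj, factors))]
    return M

# 2-level example with factors (2, 2): the result is a block-circulant
# matrix whose blocks are themselves circulant.
M = p_level_circulant([0.0, 1.0, 2.0, 3.0], (2, 2))
```

Its eigenvalues are the p-dimensional FFT of the defining tensor, which is what yields the O(n log n) solves used later.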
Overview
- We use the same approximation method as L. Ding et al. (2011)
- We remove inefficiencies from the cross-validation process
- Result: O(n log n) LOO cross-validation
  - L. Ding et al.'s LOO cross-validation: O(n^2 log n)
Kernel Approximation via P-Level Circulant Matrices
- Song et al. (2010) introduced a P-level circulant RBF kernel approximation
  - the approximation converges as the matrix level factors approach infinity
  - result: O(n + n2^p) complexity
    - allows n log n model solve time
    - allows fast model selection
  - however, 2 to 3 factors work well in practice
    - L. Ding et al. (2011) and our work
- One caveat: this approximation method only applies to RBF kernels
Kernel Approximation via P-Level Circulant Matrices
Algorithm 1 Kernel Approximation with P-level Circulant Matrix
Input: M (Kernel's size), F = {n_0, n_1, ..., n_{p-1}}, k (Kernel function)
1: N ← {all multi-level indices defined by F}
2: T ← zeros(M), U ← zeros(M)
3: H_n ← {x_0, x_1, ..., x_{p-1}} ∈ R^p s.t. ∀ x_i ∈ H_n, x_i > 0
4: for all j ∈ N do
5:   T_j ← k(||j H_n||_2)
6: end for
7: for all j ∈ N do
8:   D_j ← D_{j,0} × D_{j,1} × · · · × D_{j,p-1}
9:   U_j ← Σ_{l ∈ D_j} T_l
10: end for
11: K̃ ← U
Output: K̃
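Algorithm 1 leaves the sets D_{j,i} to the kernel-approximation construction; the sketch below (ours, in NumPy) assumes the common choice D_{j,i} = {j_i, n_i − j_i}, pairing each level index with its reflection so that the resulting tensor generates a valid circulant matrix:

```python
import itertools
import numpy as np

def approx_kernel_tensor(factors, Hn, k):
    """Sketch of Algorithm 1: build the defining tensor U of the
    p-level circulant kernel approximation K~.
    Assumption (not spelled out on the slide): D_{j,i} = {j_i, n_i - j_i},
    i.e. each level index is paired with its reflection."""
    grid = np.indices(factors).astype(float)
    # T_j = k(||j H_n||_2): kernel of the scaled multi-level index
    T = k(np.sqrt(sum((g * h) ** 2 for g, h in zip(grid, Hn))))
    U = np.zeros(factors)
    for j in np.ndindex(*factors):
        D_j = itertools.product(*[{ji, (ni - ji) % ni}
                                  for ji, ni in zip(j, factors)])
        U[j] = sum(T[l] for l in D_j)
    return U

# RBF kernel as a function of the radius; factors 2, 4, 2 as in the example.
rbf = lambda r: np.exp(-r ** 2)
U = approx_kernel_tensor((2, 4, 2), Hn=(0.5, 0.5, 0.5), k=rbf)
```

U then generates K̃ level by level, and solves with K̃ go through the p-dimensional FFT of U.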
Efficient Cross-Validation
Theorem
Let y^(k) = sign[g_k(x)] denote the classifier formulated by leaving the kth group out, and let β_{k,i} = y_{k,i} − g_k(x_{k,i}). Then β_(k) = C_{kk}^{-1} α_(k).

- proven by An et al. (2007)
- Take-aways:
  - Allows computing a single kernel matrix inverse for all folds
  - Perform smaller inverses to compute the hold-out result
Efficient Cross-Validation
Algorithm 2 Efficient Cross-Validation
Input: K (Kernel matrix), l (number of folds), y (response)
1: K_γ^{-1} ← inv(K + (1/γ)I), d ← 1_n^T K_γ^{-1} 1_n
2: C ← K_γ^{-1} − (1/d) K_γ^{-1} 1_n 1_n^T K_γ^{-1}
3: α ← K_γ^{-1} y − (1/d) K_γ^{-1} 1_n 1_n^T K_γ^{-1} y
4: n_k ← size(y)/l, y^(k) ← zeros(l, n_k)
5: for k ← 1, k ≤ l do
6:   Solve C_{kk} β_(k) = α_(k)
7:   y^(k) ← sign[y_(k) − β_(k)]
8:   k ← k + 1
9: end for
10: error ← (1/2) Σ_{k=1}^{l} Σ_{i=1}^{n_k} |y_{k,i} − y_i^(k)|
Output: error
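Algorithm 2 translates directly into NumPy. A hedged sketch (function name and test data are ours): the kernel inverse is computed once, and each fold only pays for an n_k × n_k solve. The dense inverse here is O(n^3); the paper's point is that a p-level circulant K makes that step O(n log n).

```python
import numpy as np

def efficient_lfold_cv(K, y, l, gamma):
    """Sketch of Algorithm 2 for LS-SVM classification with labels in
    {-1, +1}; returns the l-fold cross-validation error rate."""
    n = len(y)
    Kg_inv = np.linalg.inv(K + np.eye(n) / gamma)
    one = np.ones(n)
    d = one @ Kg_inv @ one
    C = Kg_inv - np.outer(Kg_inv @ one, one @ Kg_inv) / d
    alpha = C @ y
    nk = n // l
    errors = 0.0
    for k in range(l):
        idx = slice(k * nk, (k + 1) * nk)
        beta = np.linalg.solve(C[idx, idx], alpha[idx])   # C_kk β = α_(k)
        y_pred = np.sign(y[idx] - beta)                    # hold-out prediction
        errors += 0.5 * np.abs(y[idx] - y_pred).sum()
    return errors / n
```

Solving C_{kk} β = α_(k) recovers exactly the leave-fold-out residuals, so no model is ever retrained.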
Approximate l-fold Cross-Validation
Theorem
If K is a p-level circulant matrix with factorization n = n_0 n_1 · · · n_{p-1} and l = n_0 n_1 · · · n_s s.t. s ≤ p − 1, then the computational complexity of An et al.'s cross-validation algorithm is O(n log n).

- Take-aways:
  - This combination produces an O(n log n) runtime
  - Works for any l-fold, provided the factorizations align
Extension to Kernel Ridge Regression
- An et al.'s changes to their algorithm:
  - Change C's value to K_γ^{-1}
  - Change α's value to K_γ^{-1} y
- Our theorem still holds under these settings
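In code, the two substitutions amount to a two-line change of the classification sketch. A hedged NumPy illustration (names are ours), returning the cross-validated mean squared error instead of a misclassification count:

```python
import numpy as np

def efficient_lfold_cv_krr(K, y, l, gamma):
    """KRR variant of the efficient cross-validation: C <- Kγ^{-1} and
    α <- Kγ^{-1} y (no bias term).  β_(k) = C_kk^{-1} α_(k) is then
    exactly the vector of hold-out residuals y_k - g_k(x_k)."""
    n = len(y)
    C = np.linalg.inv(K + np.eye(n) / gamma)   # C <- Kγ^{-1}
    alpha = C @ y                              # α <- Kγ^{-1} y
    nk = n // l
    sse = 0.0
    for k in range(l):
        idx = slice(k * nk, (k + 1) * nk)
        beta = np.linalg.solve(C[idx, idx], alpha[idx])  # hold-out residuals
        sse += (beta ** 2).sum()
    return sse / n
```

The fold loop is unchanged; only the definitions of C and α differ from the LS-SVM case.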
Experimental Setup
- Scaling
  - measured with randomly generated data
  - dataset sizes range from 2^13 to 2^20 samples
- Approximation Quality
  - measured on benchmark datasets
- Hyperparameter Selection Quality
  - test exact models on real-world datasets
Single CPU Scaling Test
| # Examples  | 2^13  | 2^14   | 2^15    | 2^16   | 2^17   | 2^18   | 2^19    | 2^20    |
|-------------|-------|--------|---------|--------|--------|--------|---------|---------|
| E-LOO       | 4.43s | 35.25s | 281.11s | –      | –      | –      | –       | –       |
| A-LOO-LSSVM | 1.3s  | 2.6s   | 5.32s   | 10.88s | 22.45s | 47.41s | 101.36s | 235.83s |
| A-LOO-KRR   | 0.54s | 1.06s  | 2.14s   | 4.3s   | 8.55s  | 17.28s | 35.39s  | 68.22s  |
Runtime Scaling Comparison

[Figure: log-log plot of runtime (s) vs. dataset size for E-LOO-LSSVM, A-LOO-LSSVM, and A-LOO-KRR]

- A-LOO scales the same for LSSVM and KRR (same slopes)
Runtime Scaling Comparison

[Figure: log-log plot of runtime (s) vs. dataset size for An et al., A-LOO-LSSVM, and A-LOO-KRR]

- We scale no worse than An et al.'s low-rank approximation
- We are assumption-free; An et al. requires m ≪ n
Benchmark Dataset Performance

| Data set     | #Train | #Test | A-Error (L. Ding et al.) | A-Error Hn ∈ (1, 2) | A-Error Hn ∈ (10, 11) | E-Error    |
|--------------|--------|-------|--------------------------|---------------------|-----------------------|------------|
| 1) Titanic   | 150    | 2051  | 22.897±1.427             | 23.82±1.44          | 22.80±0.68            | 22.92±0.43 |
| 2) B. Cancer | 200    | 77    | 27.831±5.569             | 29.87±5.59          | 26.75±5.92            | 25.97±4.40 |
| 3) Diabetes  | 468    | 300   | 26.386±4.501             | 25.67±1.13          | 25.27±2.07            | 23.00±1.27 |
| 4) F. Solar  | 666    | 400   | 36.440±2.752             | 35.65±2.78          | 36.65±2.47            | 33.75±1.44 |
| 5) Banana    | 400    | 4900  | 11.283±0.992             | 14.10±1.74          | 18.98±1.76            | 10.97±0.57 |
| 6) Image     | 1300   | 1010  | 4.391±0.631              | 17.64±1.52          | 6.89±0.73             | 2.47±0.53  |
| 7) Twonorm   | 400    | 7000  | 2.791±0.566              | 15.64±25.71         | 6.85±8.86             | 2.35±0.07  |
| 8) German    | 700    | 300   | 25.080±2.375             | 29.93±1.61          | 27.40±1.79            | 21.87±1.77 |
| 9) Waveform  | 400    | 4600  | Not Reported             | 19.85±3.87          | 17.57±1.93            | 9.77±0.31  |
| 10) Thyroid  | 140    | 75    | 4.773±2.291              | 29.33±4.07          | 17.33±3.89            | 4.17±3.23  |

- The real values selected affect approximation quality
- Hyperparameter selection is now over R^{p+2}, rather than R^2
Real World Dataset

| Data set | CoV(%)    | MAPE(%)   | CoV(%)    | MAPE(%)   |
|----------|-----------|-----------|-----------|-----------|
| House 1  | 19.6±1.69 | 15.3±0.47 | 20.1±0.81 | 16.1±0.85 |
| Sensor A | 1.3±0.05  | 1.0±0.05  | –         | –         |
| Sensor B | 17.2±4.89 | 10.8±0.25 | –         | –         |
| Sensor C | 12.0±2.31 | 7.8±0.68  | –         | –         |
| Sensor D | 1.4±0.09  | 0.9±0.03  | –         | –         |
| S1       | 13.1±0.00 | 10.0±0.00 | 13.7±0.00 | 11.2±0.00 |
| S2       | 3.1±0.00  | 4.7±0.00  | 6.4±0.00  | 4.5±0.00  |

- Selected hyperparameters work well with exact models
Conclusion
- The approach provides an O(n log n) l-fold cross-validation method
- The approach scales well
- The approach selects hyperparameters that perform well with the exact model
- Hyperparameter selection is now over R^{p+2}, rather than R^2